| Progressive Pretext Task Learning for Human Trajectory Prediction | Jul 16, 2024 | Knowledge Distillation, Prediction | Code Available | 2 |
| Scientific QA System with Verifiable Answers | Jul 16, 2024 | Articles, Information Retrieval | Code Available | 2 |
| Digital Twin Vehicular Edge Computing Network: Task Offloading and Resource Allocation | Jul 16, 2024 | Edge-computing, Multi-agent Reinforcement Learning | Code Available | 2 |
| Crowd-SAM: SAM as a Smart Annotator for Object Detection in Crowded Scenes | Jul 16, 2024 | Human Instance Segmentation, Instance Segmentation | Code Available | 2 |
| Does Refusal Training in LLMs Generalize to the Past Tense? | Jul 16, 2024 | | Code Available | 2 |
| SPINACH: SPARQL-Based Information Navigation for Challenging Real-World Questions | Jul 16, 2024 | In-Context Learning, Knowledge Base Question Answering | Code Available | 2 |
| Monocular Occupancy Prediction for Scalable Indoor Scenes | Jul 16, 2024 | 3D Semantic Scene Completion from a single RGB image, Prediction | Code Available | 2 |
| TeethDreamer: 3D Teeth Reconstruction from Five Intra-oral Photographs | Jul 16, 2024 | Surface Reconstruction | Code Available | 2 |
| Towards High-Quality 3D Motion Transfer with Realistic Apparel Animation | Jul 15, 2024 | | Code Available | 2 |
| Deep Diffusion Image Prior for Efficient OOD Adaptation in 3D Inverse Problems | Jul 15, 2024 | 3D Reconstruction, Meta-Learning | Code Available | 2 |
| IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation | Jul 15, 2024 | Denoising, Depth Estimation | Code Available | 2 |
| OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection | Jul 15, 2024 | 3D Object Detection, Depth Estimation | Code Available | 2 |
| DOCBENCH: A Benchmark for Evaluating LLM-based Document Reading Systems | Jul 15, 2024 | Language Modelling, Large Language Model | Code Available | 2 |
| Differentiable Voxelization and Mesh Morphing | Jul 15, 2024 | GPU | Code Available | 2 |
| Think-on-Graph 2.0: Deep and Faithful Large Language Model Reasoning with Knowledge-guided Retrieval Augmented Generation | Jul 15, 2024 | Information Retrieval, Knowledge Graphs | Code Available | 2 |
| FinDKG: Dynamic Knowledge Graphs with Large Language Models for Detecting Global Trends in Financial Markets | Jul 15, 2024 | Articles, Knowledge Graphs | Code Available | 2 |
| Representation Learning and Identity Adversarial Training for Facial Behavior Understanding | Jul 15, 2024 | Facial Action Unit Detection, Facial Expression Recognition (FER) | Code Available | 2 |
| DataDream: Few-shot Guided Dataset Generation | Jul 15, 2024 | Classification, Dataset Generation | Code Available | 2 |
| AccDiffusion: An Accurate Method for Higher-Resolution Image Generation | Jul 15, 2024 | Image Generation, Object | Code Available | 2 |
| iHuman: Instant Animatable Digital Humans From Monocular Videos | Jul 15, 2024 | 3D geometry, 3D Reconstruction | Code Available | 2 |
| From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients | Jul 15, 2024 | GPU | Code Available | 2 |
| Accessing Vision Foundation Models at ImageNet-level Costs | Jul 15, 2024 | Knowledge Distillation, Transfer Learning | Code Available | 2 |
| Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? | Jul 15, 2024 | Code Generation | Code Available | 2 |
| PolyRoom: Room-aware Transformer for Floorplan Reconstruction | Jul 15, 2024 | | Code Available | 2 |
| Target conversation extraction: Source separation using turn-taking dynamics | Jul 15, 2024 | | Code Available | 2 |
| Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning | Jul 15, 2024 | In-Context Learning | Code Available | 2 |
| SEED: A Simple and Effective 3D DETR in Point Clouds | Jul 15, 2024 | | Code Available | 2 |
| TokenSHAP: Interpreting Large Language Models with Monte Carlo Shapley Value Estimation | Jul 14, 2024 | Computational Efficiency, Prompt Engineering | Code Available | 2 |
| xLSTMTime : Long-term Time Series Forecasting With xLSTM | Jul 14, 2024 | Time Series, Time Series Forecasting | Code Available | 2 |
| AutoGRAMS: Autonomous Graphical Agent Modeling Software | Jul 14, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset | Jul 14, 2024 | 3D Object Detection, Multispectral Object Detection | Code Available | 2 |
| Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models | Jul 14, 2024 | Anomaly Detection, Video Anomaly Detection | Code Available | 2 |
| Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models | Jul 14, 2024 | Denoising, Video Enhancement | Code Available | 2 |
| PARE-Net: Position-Aware Rotation-Equivariant Networks for Robust Point Cloud Registration | Jul 14, 2024 | Inductive Bias, Point Cloud Registration | Code Available | 2 |
| Restore-RWKV: Efficient and Effective Medical Image Restoration with RWKV | Jul 14, 2024 | Denoising, Image Denoising | Code Available | 2 |
| Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers | Jul 13, 2024 | Mamba, State Space Models | Code Available | 2 |
| Arbitrary-Scale Video Super-Resolution with Structural and Textural Priors | Jul 13, 2024 | Super-Resolution, Video Super-Resolution | Code Available | 2 |
| Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis | Jul 13, 2024 | Mamba, speech-recognition | Code Available | 2 |
| An Autonomous GIS Agent Framework for Geospatial Data Retrieval | Jul 13, 2024 | Retrieval | Code Available | 2 |
| DiffRect: Latent Diffusion Label Rectification for Semi-supervised Medical Image Segmentation | Jul 13, 2024 | Denoising, Image Segmentation | Code Available | 2 |
| Image Compression for Machine and Human Vision with Spatial-Frequency Adaptation | Jul 13, 2024 | Image Compression | Code Available | 2 |
| SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers | Jul 12, 2024 | Articles, Question Answering | Code Available | 2 |
| Flash normalization: fast RMSNorm for LLMs | Jul 12, 2024 | | Code Available | 2 |
| GOFA: A Generative One-For-All Model for Joint Graph Language Modeling | Jul 12, 2024 | All, Language Modeling | Code Available | 2 |
| Hamba: Single-view 3D Hand Reconstruction with Graph-guided Bi-Scanning Mamba | Jul 12, 2024 | 3D Hand Pose Estimation, Mamba | Code Available | 2 |
| Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training | Jul 12, 2024 | Position | Code Available | 2 |
| PersonaRAG: Enhancing Retrieval-Augmented Generation Systems with User-Centric Agents | Jul 12, 2024 | Information Retrieval, Question Answering | Code Available | 2 |
| SpreadsheetLLM: Encoding Spreadsheets for Large Language Models | Jul 12, 2024 | In-Context Learning, Table Detection | Code Available | 2 |
| PID: Physics-Informed Diffusion Model for Infrared Image Generation | Jul 12, 2024 | Image Generation | Code Available | 2 |
| GTA: A Benchmark for General Tool Agents | Jul 11, 2024 | | Code Available | 2 |