| Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models | Oct 4, 2024 | Decoder, Hallucination | Code Available | 2 |
| TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention | Oct 7, 2024 | Position | Code Available | 2 |
| Hammer: Robust Function-Calling for On-Device Language Models via Function Masking | Oct 6, 2024 | | Code Available | 2 |
| SyllableLM: Learning Coarse Semantic Units for Speech Language Models | Oct 5, 2024 | Clustering, Language Modeling | Code Available | 2 |
| UniMuMo: Unified Text, Music and Motion Generation | Oct 6, 2024 | Decoder, Motion Generation | Code Available | 2 |
| TimeBridge: Non-Stationarity Matters for Long-term Time Series Forecasting | Oct 6, 2024 | Multivariate Time Series Forecasting, Time Series | Code Available | 2 |
| A Simple yet Effective Training-free Prompt-free Approach to Chinese Spelling Correction Based on Large Language Models | Oct 5, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| nnSAM: Plug-and-play Segment Anything Model Improves nnUNet Performance | Sep 29, 2023 | Few-Shot Learning, Heart Segmentation | Code Available | 2 |
| CursorCore: Assist Programming through Aligning Anything | Oct 9, 2024 | Code Completion | Code Available | 2 |
| Compositional Entailment Learning for Hyperbolic Vision-Language Models | Oct 9, 2024 | Language Modelling, Representation Learning | Code Available | 2 |
| Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate | Oct 9, 2024 | cross-modal alignment, Visual Question Answering | Code Available | 2 |
| EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models | Oct 9, 2024 | Image Generation, Text to Image Generation | Code Available | 2 |
| MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More | Oct 8, 2024 | Mixture-of-Experts, Quantization | Code Available | 2 |
| SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration | Oct 9, 2024 | | Code Available | 2 |
| Res-VMamba: Fine-Grained Food Category Visual Classification Using Selective State Space Models with Deep Residual Learning | Feb 24, 2024 | Classification, Fine-Grained Image Recognition | Code Available | 2 |
| LS-EEND: Long-Form Streaming End-to-End Neural Diarization with Online Attractor Extraction | Oct 9, 2024 | Decoder, Form | Code Available | 2 |
| Progressive Autoregressive Video Diffusion Models | Oct 10, 2024 | Denoising, Video Denoising | Code Available | 2 |
| IncEventGS: Pose-Free Gaussian Splatting from a Single Event Camera | Oct 10, 2024 | Motion Estimation, NeRF | Code Available | 2 |
| From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions | Oct 10, 2024 | Diversity | Code Available | 2 |
| An Undetectable Watermark for Generative Image Models | Oct 9, 2024 | | Code Available | 2 |
| From Cognition to Precognition: A Future-Aware Framework for Social Navigation | Sep 20, 2024 | Future prediction, Navigate | Code Available | 2 |
| VideoAgent: Self-Improving Video Generation | Oct 14, 2024 | Hallucination, Video Generation | Code Available | 2 |
| Derail Yourself: Multi-turn LLM Jailbreak Attack through Self-discovered Clues | Oct 14, 2024 | LLM Jailbreak, Safety Alignment | Code Available | 2 |
| MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding | Oct 15, 2024 | Visual Question Answering | Code Available | 2 |
| Evaluating Morphological Compositional Generalization in Large Language Models | Oct 16, 2024 | Text Generation | Code Available | 2 |
| IntersectionZoo: Eco-driving for Benchmarking Multi-Agent Contextual Reinforcement Learning | Oct 19, 2024 | Benchmarking, Multi-agent Reinforcement Learning | Code Available | 2 |
| DM-Codec: Distilling Multimodal Representations for Speech Tokenization | Oct 19, 2024 | Self-Supervised Learning, Speech Tokenization | Code Available | 2 |
| GPT or BERT: why not both? | Oct 31, 2024 | Causal Language Modeling, Language Modeling | Code Available | 2 |
| Model merging with SVD to tie the Knots | Oct 25, 2024 | model | Code Available | 2 |
| SciPIP: An LLM-based Scientific Paper Idea Proposer | Oct 30, 2024 | Retrieval | Code Available | 2 |
| Ada-MSHyper: Adaptive Multi-Scale Hypergraph Transformer for Time Series Forecasting | Oct 31, 2024 | Time Series, Time Series Forecasting | Code Available | 2 |
| DPU: Dynamic Prototype Updating for Multimodal Out-of-Distribution Detection | Nov 12, 2024 | Optical Flow Estimation, Out-of-Distribution Detection | Code Available | 2 |
| MetaOpenFOAM: an LLM-based multi-agent framework for CFD | Jul 31, 2024 | RAG, Retrieval-augmented Generation | Code Available | 2 |
| PyGen: A Collaborative Human-AI Approach to Python Package Creation | Nov 13, 2024 | Code Generation | Code Available | 2 |
| Disentangling Memory and Reasoning Ability in Large Language Models | Nov 20, 2024 | Decision Making, Retrieval | Code Available | 2 |
| MMGenBench: Evaluating the Limits of LMMs from the Text-to-Image Generation Perspective | Nov 21, 2024 | Image Comprehension, Image Generation | Code Available | 2 |
| vesselFM: A Foundation Model for Universal 3D Blood Vessel Segmentation | Nov 26, 2024 | Image Segmentation, Medical Image Analysis | Code Available | 2 |
| TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models | Nov 27, 2024 | Garment Reconstruction, Image Generation | Code Available | 2 |
| TexGaussian: Generating High-quality PBR Material via Octree-based 3D Gaussian Splatting | Nov 29, 2024 | Denoising, Image Generation | Code Available | 2 |
| Lost & Found: Tracking Changes from Egocentric Observations in 3D Dynamic Scene Graphs | Nov 28, 2024 | Object | Code Available | 2 |
| X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models | Dec 2, 2024 | Image Generation, In-Context Learning | Code Available | 2 |
| CoRNStack: High-Quality Contrastive Data for Better Code Retrieval and Reranking | Dec 1, 2024 | Bug fixing, Code Generation | Code Available | 2 |
| FLAIR: VLM with Fine-grained Language-informed Image Representations | Dec 4, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario | Jan 17, 2025 | | Code Available | 2 |
| SoRA: Singular Value Decomposed Low-Rank Adaptation for Domain Generalizable Representation Learning | Dec 5, 2024 | Domain Adaptation, Domain Generalization | Code Available | 2 |
| Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation | Dec 5, 2024 | Image Comprehension, Representation Learning | Code Available | 2 |
| JPC: Flexible Inference for Predictive Coding Networks in JAX | Dec 4, 2024 | | Code Available | 2 |
| MESA: Effective Matching Redundancy Reduction by Semantic Area Segmentation | Aug 1, 2024 | Patch Matching | Code Available | 2 |
| DriveMM: All-in-One Large Multimodal Model for Autonomous Driving | Dec 10, 2024 | All, Autonomous Driving | Code Available | 2 |
| MAC-Ego3D: Multi-Agent Gaussian Consensus for Real-Time Collaborative Ego-Motion and Photorealistic 3D Reconstruction | Dec 12, 2024 | 3D Reconstruction, Motion Estimation | Code Available | 2 |