| VISA: Reasoning Video Object Segmentation via Large Language Models | Jul 16, 2024 | Decoder, Object | Code Available | 3 | 5 |
| Scaling Retrieval-Based Language Models with a Trillion-Token Datastore | Jul 9, 2024 | Language Modeling, Language Modelling | Code Available | 3 | 5 |
| Compact Language Models via Pruning and Knowledge Distillation | Jul 19, 2024 | Knowledge Distillation, Language Modeling | Code Available | 3 | 5 |
| PyABSA: A Modularized Framework for Reproducible Aspect-based Sentiment Analysis | Aug 2, 2022 | Aspect-Based Sentiment Analysis, Aspect-Based Sentiment Analysis (ABSA) | Code Available | 3 | 5 |
| Integer-Valued Training and Spike-Driven Inference Spiking Neural Network for High-performance and Energy-efficient Object Detection | Jul 30, 2024 | object-detection, Object Detection | Code Available | 3 | 5 |
| MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine | Aug 6, 2024 | Medical Visual Question Answering, Organ Detection | Code Available | 3 | 5 |
| 1.5-Pints Technical Report: Pretraining in Days, Not Months -- Your Language Model Thrives on Quality Data | Aug 7, 2024 | 16k, 2k | Code Available | 3 | 5 |
| NeuFlow v2: High-Efficiency Optical Flow Estimation on Edge Devices | Aug 19, 2024 | Optical Flow Estimation | Code Available | 3 | 5 |
| ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models | Aug 16, 2024 | GPU, Model Compression | Code Available | 3 | 5 |
| LoopSplat: Loop Closure by Registering 3D Gaussian Splats | Aug 19, 2024 | 3DGS, Point Cloud Registration | Code Available | 3 | 5 |
| Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation | Aug 19, 2024 | Image Generation, Video Generation | Code Available | 3 | 5 |
| AnyGraph: Graph Foundation Model in the Wild | Aug 20, 2024 | Graph Learning, Mixture-of-Experts | Code Available | 3 | 5 |
| LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs | Aug 24, 2024 | Language Modeling, Language Modelling | Code Available | 3 | 5 |
| 3DGS-LM: Faster Gaussian-Splatting Optimization with Levenberg-Marquardt | Sep 19, 2024 | 3DGS, GPU | Code Available | 3 | 5 |
| PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions | Sep 23, 2024 | Image Generation, Image Restoration | Code Available | 3 | 5 |
| TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control | Sep 24, 2024 | Clustering, Language Modelling | Code Available | 3 | 5 |
| Text2CAD: Generating Sequential CAD Models from Beginner-to-Expert Level Text Prompts | Sep 25, 2024 | CAD Reconstruction, Text to 3D | Code Available | 3 | 5 |
| ReMEmbR: Building and Reasoning Over Long-Horizon Spatio-Temporal Memory for Robot Navigation | Sep 20, 2024 | Descriptive, Question Answering | Code Available | 3 | 5 |
| Results of the Big ANN: NeurIPS'23 competition | Sep 25, 2024 | Diversity | Code Available | 3 | 5 |
| Diffusion Models are Evolutionary Algorithms | Oct 3, 2024 | Denoising, Evolutionary Algorithms | Code Available | 3 | 5 |
| ControlAR: Controllable Image Generation with Autoregressive Models | Oct 3, 2024 | Image Generation | Code Available | 3 | 5 |
| CLoSD: Closing the Loop between Simulation and Diffusion for multi-task character control | Oct 4, 2024 | Motion Generation, Reinforcement Learning (RL) | Code Available | 3 | 5 |
| DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation | Oct 17, 2024 | Talking Head Generation, Video Generation | Code Available | 3 | 5 |
| Scaling Diffusion Language Models via Adaptation from Autoregressive Models | Oct 23, 2024 | In-Context Learning, Language Modeling | Code Available | 3 | 5 |
| ZipNN: Lossless Compression for AI Models | Nov 7, 2024 | Model Compression | Code Available | 3 | 5 |
| TEXGen: a Generative Diffusion Model for Mesh Textures | Nov 22, 2024 | model, Texture Synthesis | Code Available | 3 | 5 |
| BIP3D: Bridging 2D Images and 3D Perception for Embodied Intelligence | Nov 22, 2024 | 3D visual grounding, Visual Grounding | Code Available | 3 | 5 |
| Auto-RAG: Autonomous Retrieval-Augmented Generation for Large Language Models | Nov 29, 2024 | Decision Making, RAG | Code Available | 3 | 5 |
| TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation | Dec 4, 2024 | Image Generation, Image Reconstruction | Code Available | 3 | 5 |
| TryOffAnyone: Tiled Cloth Generation from a Dressed Person | Dec 11, 2024 | Image-to-Image Translation, Virtual Try-Off | Code Available | 3 | 5 |
| InterPLM: Discovering Interpretable Features in Protein Language Models via Sparse Autoencoders | Nov 13, 2024 | | Code Available | 3 | 5 |
| Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey | Dec 16, 2024 | Language Modeling, Language Modelling | Code Available | 3 | 5 |
| VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM | Dec 31, 2024 | Object, Video Understanding | Code Available | 3 | 5 |
| Lifelong Learning of Large Language Model based Agents: A Roadmap | Jan 13, 2025 | Incremental Learning, Language Modeling | Code Available | 3 | 5 |
| Learning Getting-Up Policies for Real-World Humanoid Robots | Feb 17, 2025 | | Code Available | 3 | 5 |
| TokenSkip: Controllable Chain-of-Thought Compression in LLMs | Feb 17, 2025 | GSM8K | Code Available | 3 | 5 |
| Attention Distillation: A Unified Approach to Visual Characteristics Transfer | Feb 27, 2025 | Denoising, Image Generation | Code Available | 3 | 5 |
| Kiss3DGen: Repurposing Image Diffusion Models for 3D Asset Generation | Mar 3, 2025 | 3D Generation, 3D Reconstruction | Code Available | 3 | 5 |
| SurveyForge: On the Outline Heuristics, Memory-Driven Generation, and Multi-dimensional Evaluation for Automated Survey Writing | Mar 6, 2025 | Articles, Survey | Code Available | 3 | 5 |
| Director3D: Real-world Camera Trajectory and 3D Scene Generation from Text | Jun 25, 2024 | 3D Generation, Denoising | Code Available | 3 | 5 |
| Unleashing Vecset Diffusion Model for Fast Shape Generation | Mar 20, 2025 | 3D Generation, 3D Shape Generation | Code Available | 3 | 5 |
| HyperGraphRAG: Retrieval-Augmented Generation with Hypergraph-Structured Knowledge Representation | Mar 27, 2025 | RAG, Retrieval | Code Available | 3 | 5 |
| End-to-End Driving with Online Trajectory Evaluation via BEV World Model | Apr 2, 2025 | Autonomous Driving, Bench2Drive | Code Available | 3 | 5 |
| Motion Representations for Articulated Animation | Apr 22, 2021 | Object, Video Reconstruction | Code Available | 3 | 5 |
| Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models | May 1, 2025 | Large Language Model | Code Available | 3 | 5 |
| Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions | May 1, 2025 | Survey | Code Available | 3 | 5 |
| Parallel Scaling Law for Language Models | May 15, 2025 | | Code Available | 3 | 5 |
| Visual Planning: Let's Think Only with Images | May 16, 2025 | reinforcement-learning, Reinforcement Learning | Code Available | 3 | 5 |
| VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction | May 26, 2025 | 3D Reconstruction, Spatial Reasoning | Code Available | 3 | 5 |
| HydraLoRA: An Asymmetric LoRA Architecture for Efficient Fine-Tuning | Apr 30, 2024 | parameter-efficient fine-tuning | Code Available | 3 | 5 |