| PowerPM: Foundation Model for Power Systems | Aug 7, 2024 | Contrastive Learning, model | Code Available | 7 |
| CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases | Aug 7, 2024 | HumanEval, mbpp | Code Available | 7 |
| Segment Anything in Medical Images and Videos: Benchmark and Deployment | Aug 6, 2024 | Benchmarking, Segmentation | Code Available | 7 |
| Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining | Aug 5, 2024 | Decoder, Depth Estimation | Code Available | 7 |
| Global Structure-from-Motion Revisited | Jul 29, 2024 | 16k | Code Available | 7 |
| RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer | Jul 24, 2024 | Data Augmentation, Decoder | Code Available | 7 |
| ECCO: Can We Improve Model-Generated Code Efficiency Without Sacrificing Functional Correctness? | Jul 19, 2024 | Benchmarking, Code Generation | Code Available | 7 |
| Stable Audio Open | Jul 19, 2024 | Audio Generation, Text-to-Music Generation | Code Available | 7 |
| MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains | Jul 18, 2024 | | Code Available | 7 |
| Qwen2-Audio Technical Report | Jul 15, 2024 | Instruction Following, Language Modelling | Code Available | 7 |
| EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions | Jul 11, 2024 | Image Animation | Code Available | 7 |
| LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models | Jul 10, 2024 | Video Question Answering, Zero-Shot Video Question Answer | Code Available | 7 |
| MambaVision: A Hybrid Mamba-Transformer Vision Backbone | Jul 10, 2024 | Image Classification, Instance Segmentation | Code Available | 7 |
| PEER: Expertizing Domain-Specific Tasks with a Multi-Agent Framework and Tuning Methods | Jul 9, 2024 | Information Retrieval, LEMMA | Code Available | 7 |
| Agentless: Demystifying LLM-based Software Engineering Agents | Jul 1, 2024 | Program Repair | Code Available | 7 |
| ColPali: Efficient Document Retrieval with Vision Language Models | Jun 27, 2024 | document understanding, RAG | Code Available | 7 |
| RouteLLM: Learning to Route LLMs with Preference Data | Jun 26, 2024 | Data Augmentation, Transfer Learning | Code Available | 7 |
| BricksRL: A Platform for Democratizing Robotics and Reinforcement Learning Research and Education with LEGO | Jun 25, 2024 | reinforcement-learning, Reinforcement Learning | Code Available | 7 |
| Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving | Jun 24, 2024 | CPU, GPU | Code Available | 7 |
| EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees | Jun 24, 2024 | | Code Available | 7 |
| Segment Any Text: A Universal Approach for Robust, Efficient and Adaptable Sentence Segmentation | Jun 24, 2024 | parameter-efficient fine-tuning, Sentence | Code Available | 7 |
| NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking | Jun 21, 2024 | Autonomous Driving, Benchmarking | Code Available | 7 |
| Grants4Companies: Applying Declarative Methods for Recommending and Reasoning About Business Grants in the Austrian Public Administration (System Description) | Jun 21, 2024 | | Code Available | 7 |
| DataComp-LM: In search of the next generation of training sets for language models | Jun 17, 2024 | Language Modelling, MMLU | Code Available | 7 |
| Grounding Image Matching in 3D with MASt3R | Jun 14, 2024 | 3D Reconstruction | Code Available | 7 |
| MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers | Jun 14, 2024 | Decoder | Code Available | 7 |
| Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback | Jun 13, 2024 | Instruction Following, Math | Code Available | 7 |
| TextGrad: Automatic "Differentiation" via Text | Jun 11, 2024 | Question Answering, Specificity | Code Available | 7 |
| Mixture-of-Agents Enhances Large Language Model Capabilities | Jun 7, 2024 | Language Modeling, Language Modelling | Code Available | 7 |
| M&M VTO: Multi-Garment Virtual Try-On and Editing | Jun 6, 2024 | Denoising, Super-Resolution | Code Available | 7 |
| The Prompt Report: A Systematic Survey of Prompting Techniques | Jun 6, 2024 | Prompt Engineering, Survey | Code Available | 7 |
| Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT | Jun 5, 2024 | Image Generation, Point Cloud Generation | Code Available | 7 |
| Scalable MatMul-free Language Modeling | Jun 4, 2024 | GPU, Language Modeling | Code Available | 7 |
| Seed-TTS: A Family of High-Quality Versatile Speech Generation Models | Jun 4, 2024 | In-Context Learning, Language Modelling | Code Available | 7 |
| The Scandinavian Embedding Benchmarks: Comprehensive Assessment of Multilingual and Monolingual Text Embedding | Jun 4, 2024 | | Code Available | 7 |
| Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image | May 30, 2024 | Image to 3D, Single-View 3D Reconstruction | Code Available | 7 |
| TotalSegmentator MRI: Robust Sequence-independent Segmentation of Multiple Anatomic Structures in MRI | May 29, 2024 | MRI segmentation | Code Available | 7 |
| Adaptive In-conversation Team Building for Language Model Agents | May 29, 2024 | Diversity, Language Modeling | Code Available | 7 |
| EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture | May 29, 2024 | Image Generation, Video Generation | Code Available | 7 |
| PromptWizard: Task-Aware Prompt Optimization Framework | May 28, 2024 | Computational Efficiency, Diversity | Code Available | 7 |
| Efficient multi-prompt evaluation of LLMs | May 27, 2024 | MMLU | Code Available | 7 |
| Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability | May 27, 2024 | Autonomous Driving, Video Generation | Code Available | 7 |
| The Road Less Scheduled | May 24, 2024 | Scheduling | Code Available | 7 |
| HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models | May 23, 2024 | Hippocampus, Knowledge Graphs | Code Available | 7 |
| Revisiting MoE and Dense Speed-Accuracy Comparisons for LLM Training | May 23, 2024 | GSM8K, Mixture-of-Experts | Code Available | 7 |
| Learning Multi-dimensional Human Preference for Text-to-Image Generation | May 23, 2024 | Image Generation, Text to Image Generation | Code Available | 7 |
| Dynamic data sampler for cross-language transfer learning in large language models | May 17, 2024 | Language Modeling, Language Modelling | Code Available | 7 |
| Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection | May 16, 2024 | Edge-computing, Few-Shot Object Detection | Code Available | 7 |
| Chameleon: Mixed-Modal Early-Fusion Foundation Models | May 16, 2024 | Image Captioning, Image Generation | Code Available | 7 |
| When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models | May 16, 2024 | In-Context Learning, Question Answering | Code Available | 7 |