| Flow Matching Guide and Code | Dec 9, 2024 | Text Generation | Code Available | 7 |
| NVILA: Efficient Frontier Visual Language Models | Dec 5, 2024 | Video Question Answering | Code Available | 7 |
| GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbot | Dec 3, 2024 | Automatic Speech Recognition (ASR) | Code Available | 7 |
| The Well: a Large-Scale Collection of Diverse Physics Simulations for Machine Learning | Nov 30, 2024 | | Code Available | 7 |
| Efficient Track Anything | Nov 28, 2024 | Object, Segmentation | Code Available | 7 |
| FastSwitch: Optimizing Context Switching Efficiency in Fairness-aware Large Language Model Serving | Nov 27, 2024 | Fairness, GPU | Code Available | 7 |
| Scaling Speech-Text Pre-training with Synthetic Interleaved Data | Nov 26, 2024 | Automatic Speech Recognition (ASR) | Code Available | 7 |
| X-MeshGraphNet: Scalable Multi-Scale Graph Neural Networks for Physics Simulation | Nov 26, 2024 | | Code Available | 7 |
| O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson? | Nov 25, 2024 | Hallucination, Knowledge Distillation | Code Available | 7 |
| Tulu 3: Pushing Frontiers in Open Language Model Post-Training | Nov 22, 2024 | Language Modeling | Code Available | 7 |
| RedPajama: an Open Dataset for Training Large Language Models | Nov 19, 2024 | | Code Available | 7 |
| OASIS: Open Agent Social Interaction Simulations with One Million Agents | Nov 18, 2024 | Large Language Model, Recommendation Systems | Code Available | 7 |
| SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization | Nov 17, 2024 | Image Generation, Quantization | Code Available | 7 |
| LLaVA-CoT: Let Vision Language Models Reason Step-by-Step | Nov 15, 2024 | Logical Reasoning, Multimodal Reasoning | Code Available | 7 |
| Zero-shot Voice Conversion with Diffusion Transformers | Nov 15, 2024 | In-Context Learning, Voice Conversion | Code Available | 7 |
| EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation | Nov 15, 2024 | Audio-Driven Body Animation, Human Animation | Code Available | 7 |
| MagicQuill: An Intelligent Interactive Image Editing System | Nov 14, 2024 | Language Modeling | Code Available | 7 |
| Measuring short-form factuality in large language models | Nov 7, 2024 | Form | Code Available | 7 |
| xDiT: an Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism | Nov 4, 2024 | GPU | Code Available | 7 |
| CALE: Continuous Arcade Learning Environment | Oct 31, 2024 | Atari Games, Benchmarking | Code Available | 7 |
| In-Context LoRA for Diffusion Transformers | Oct 31, 2024 | Image Generation | Code Available | 7 |
| AutoRAG: Automated Framework for optimization of Retrieval Augmented Generation Pipeline | Oct 28, 2024 | RAG, Retrieval | Code Available | 7 |
| ThunderKittens: Simple, Fast, and Adorable AI Kernels | Oct 27, 2024 | GPU, State Space Models | Code Available | 7 |
| Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data | Oct 24, 2024 | Image Generation, Question Generation | Code Available | 7 |
| AutoTrain: No-code training for state-of-the-art models | Oct 21, 2024 | Classification, Image Classification | Code Available | 7 |
| Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant | Oct 20, 2024 | Question Answering, Speech Recognition | Code Available | 7 |
| D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement | Oct 17, 2024 | GPU, Real-Time Object Detection | Code Available | 7 |
| aiXcoder-7B: A Lightweight and Effective Large Language Model for Code Processing | Oct 17, 2024 | Attribute, Code Completion | Code Available | 7 |
| Gravity-aligned Rotation Averaging with Circular Regression | Oct 16, 2024 | Mixed Reality, Regression | Code Available | 7 |
| DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing | Oct 16, 2024 | | Code Available | 7 |
| CoTracker3: Simpler and Better Point Tracking by Pseudo-Labelling Real Videos | Oct 15, 2024 | Point Tracking | Code Available | 7 |
| AFlow: Automating Agentic Workflow Generation | Oct 14, 2024 | Code Generation | Code Available | 7 |
| Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation | Oct 10, 2024 | 4k, Image Animation | Code Available | 7 |
| O1 Replication Journey: A Strategic Progress Report -- Part 1 | Oct 8, 2024 | Math, Scientific Discovery | Code Available | 7 |
| Pyramidal Flow Matching for Efficient Video Generative Modeling | Oct 8, 2024 | GPU, Text-to-Video Generation | Code Available | 7 |
| SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains? | Oct 4, 2024 | Data Visualization | Code Available | 7 |
| SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration | Oct 3, 2024 | Image Generation, Quantization | Code Available | 7 |
| Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models | Oct 3, 2024 | | Code Available | 7 |
| PyRIT: A Framework for Security Risk Identification and Red Teaming in Generative AI System | Oct 1, 2024 | Red Teaming | Code Available | 7 |
| ManiSkill3: GPU Parallelized Robotics Simulation and Rendering for Generalizable Embodied AI | Oct 1, 2024 | GPU, Imitation Learning | Code Available | 7 |
| OmniGen: Unified Image Generation | Sep 17, 2024 | Edge Detection, Image Generation | Code Available | 7 |
| LLaMA-Omni: Seamless Speech Interaction with Large Language Models | Sep 10, 2024 | | Code Available | 7 |
| gsplat: An Open-Source Library for Gaussian Splatting | Sep 10, 2024 | | Code Available | 7 |
| MemoRAG: Moving towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery | Sep 9, 2024 | Memorization, Question Answering | Code Available | 7 |
| Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming | Aug 29, 2024 | Speech Synthesis | Code Available | 7 |
| FAST-LIVO2: Fast, Direct LiDAR-Inertial-Visual Odometry | Aug 26, 2024 | NeRF, State Estimation | Code Available | 7 |
| Real-Time Video Generation with Pyramid Attention Broadcast | Aug 22, 2024 | Video Generation | Code Available | 7 |
| FourierKAN outperforms MLP on Text Classification Head Fine-tuning | Aug 16, 2024 | Classification, Kolmogorov-Arnold Networks | Code Available | 7 |
| VITA: Towards Open-Source Interactive Omni Multimodal LLM | Aug 9, 2024 | Language Modeling | Code Available | 7 |
| mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models | Aug 9, 2024 | Language Modeling | Code Available | 7 |