| MuseTalk: Real-Time High-Fidelity Video Dubbing via Spatio-Temporal Sampling | Oct 14, 2024 | Audio-Visual Synchronization, GPU | Code Available | 9 |
| SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers | Oct 14, 2024 | Decoder, GPU | Code Available | 9 |
| Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment | Oct 12, 2024 | Language Modelling, Philosophy | Code Available | 9 |
| TorchTitan: One-stop PyTorch native solution for production ready LLM pre-training | Oct 9, 2024 | GPU | Code Available | 9 |
| Depth Pro: Sharp Monocular Metric Depth in Less Than a Second | Oct 2, 2024 | Depth Estimation, GPU | Code Available | 9 |
| Moshi: a speech-text foundation model for real-time dialogue | Sep 17, 2024 | Action Detection, Activity Detection | Code Available | 9 |
| Do Large Language Models Need a Content Delivery Network? | Sep 16, 2024 | In-Context Learning | Code Available | 9 |
| Language agents achieve superhuman synthesis of scientific knowledge | Sep 10, 2024 | Articles, Information Retrieval | Code Available | 9 |
| KAG: Boosting LLMs in Professional Domains via Knowledge Augmented Generation | Sep 10, 2024 | Knowledge Graphs, Question Answering | Code Available | 9 |
| General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model | Sep 3, 2024 | Decoder, Math | Code Available | 9 |
| MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer | Sep 1, 2024 | Self-Supervised Learning, text-to-speech | Code Available | 9 |
| CogVLM2: Visual Language Models for Image and Video Understanding | Aug 29, 2024 | MM-Vet, MVBench | Code Available | 9 |
| Sapiens: Foundation for Human Vision Models | Aug 22, 2024 | 2D Human Pose Estimation, 2D Pose Estimation | Code Available | 9 |
| Transformer Explainer: Interactive Learning of Text-Generative Models | Aug 8, 2024 | | Code Available | 9 |
| SuperSimpleNet: Unifying Unsupervised and Supervised Learning for Fast and Reliable Surface Defect Detection | Aug 6, 2024 | Anomaly Detection, Defect Detection | Code Available | 9 |
| MindSearch: Mimicking Human Minds Elicits Deep AI Searcher | Jul 29, 2024 | 2D Semantic Segmentation task 1 (8 classes), graph construction | Code Available | 9 |
| NeedleBench: Can LLMs Do Retrieval and Reasoning in Information-Dense Context? | Jul 16, 2024 | 4k, 8k | Code Available | 9 |
| MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention | Jul 2, 2024 | GPU, Language Modelling | Code Available | 9 |
| Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion | Jul 1, 2024 | Decision Making, Prediction | Code Available | 9 |
| Symbolic Learning Enables Self-Evolving Agents | Jun 26, 2024 | | Code Available | 9 |
| DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence | Jun 17, 2024 | 16k, Language Modeling | Code Available | 9 |
| Infinigen Indoors: Photorealistic Indoor Scenes using Procedural Generation | Jun 17, 2024 | | Code Available | 9 |
| garak: A Framework for Security Probing Large Language Models | Jun 16, 2024 | Red Teaming | Code Available | 9 |
| BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack | Jun 14, 2024 | Question Answering, Retrieval-augmented Generation | Code Available | 9 |
| Depth Anything V2 | Jun 13, 2024 | Depth Estimation, Diversity | Code Available | 9 |
| Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation | Jun 13, 2024 | Diversity, Image Animation | Code Available | 9 |
| OpenVLA: An Open-Source Vision-Language-Action Model | Jun 13, 2024 | Imitation Learning, Language Modelling | Code Available | 9 |
| Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters | Jun 10, 2024 | Mixture-of-Experts | Code Available | 9 |
| PowerInfer-2: Fast Large Language Model Inference on a Smartphone | Jun 10, 2024 | CPU, Language Modeling | Code Available | 9 |
| LawGPT: A Chinese Legal Knowledge-Enhanced Large Language Model | Jun 7, 2024 | Language Modeling, Language Modelling | Code Available | 9 |
| LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection | Jun 5, 2024 | Decoder, object-detection | Code Available | 9 |
| Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration | Jun 3, 2024 | | Code Available | 9 |
| CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion | May 26, 2024 | Language Modeling, Language Modelling | Code Available | 9 |
| FinRobot: An Open-Source AI Agent Platform for Financial Applications using Large Language Models | May 23, 2024 | AI Agent, Decision Making | Code Available | 9 |
| (Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts | May 20, 2024 | Machine Translation, Translation | Code Available | 9 |
| DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model | May 7, 2024 | Language Modeling, Language Modelling | Code Available | 9 |
| StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation | May 2, 2024 | motion prediction, Story Generation | Code Available | 9 |
| OpenELM: An Efficient Language Model Family with Open Training and Inference Framework | Apr 22, 2024 | Language Modeling, Language Modelling | Code Available | 9 |
| Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models | Apr 11, 2024 | Language Modeling, Language Modelling | Code Available | 9 |
| Visually Descriptive Language Model for Vector Graphics Reasoning | Apr 9, 2024 | Descriptive, Language Modeling | Code Available | 9 |
| MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies | Apr 9, 2024 | Domain Adaptation | Code Available | 9 |
| RULER: What's the Real Context Size of Your Long-Context Language Models? | Apr 9, 2024 | Long-Context Understanding | Code Available | 9 |
| Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction | Apr 3, 2024 | Image Generation, Image Reconstruction | Code Available | 9 |
| Model Stock: All we need is just a few fine-tuned models | Mar 28, 2024 | All | Code Available | 9 |
| AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation | Mar 26, 2024 | Diversity, Face Reenactment | Code Available | 9 |
| InternLM2 Technical Report | Mar 26, 2024 | 4k, Long-Context Understanding | Code Available | 9 |
| LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning | Mar 26, 2024 | GPU, GSM8K | Code Available | 9 |
| VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild | Mar 25, 2024 | Decoder, Language Modeling | Code Available | 9 |
| Arcee's MergeKit: A Toolkit for Merging Large Language Models | Mar 20, 2024 | Language Modeling, Language Modelling | Code Available | 9 |
| When Do We Not Need Larger Vision Models? | Mar 19, 2024 | Depth Estimation | Code Available | 9 |