| Improving Data Augmentation-based Cross-Speaker Style Transfer for TTS with Singing Voice, Style Filtering, and F0 Matching | Oct 8, 2024 | Data Augmentation, Style Transfer | Code Available | 4 |
| Fully Open Source Moxin-7B Technical Report | Dec 8, 2024 | | Code Available | 4 |
| The Thousand Brains Project: A New Paradigm for Sensorimotor Intelligence | Dec 24, 2024 | Continual Learning | Code Available | 4 |
| GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models | Jan 2, 2025 | Scene Understanding, text annotation | Code Available | 4 |
| VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary | Mar 12, 2025 | EgoSchema, Retrieval | Code Available | 4 |
| Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond | Mar 13, 2025 | Domain Generalization, Math | Code Available | 4 |
| Kornia-rs: A Low-Level 3D Computer Vision Library In Rust | May 18, 2025 | | Code Available | 4 |
| Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image | Jul 20, 2023 | Depth Estimation, Image Reconstruction | Code Available | 4 |
| DeepFaceLab: Integrated, flexible and extensible face-swapping framework | May 12, 2020 | Face Swapping | Code Available | 4 |
| PromptFix: You Prompt and We Fix the Photo | May 27, 2024 | Denoising, Image Generation | Code Available | 4 |
| Tiny Time Mixers (TTMs): Fast Pre-trained Models for Enhanced Zero/Few-Shot Forecasting of Multivariate Time Series | Jan 8, 2024 | CPU, Few-Shot Learning | Code Available | 4 |
| A Survey on Deep Stereo Matching in the Twenties | Jul 10, 2024 | Stereo Matching, Survey | Code Available | 4 |
| EdgeTAM: On-Device Track Anything Model | Jan 13, 2025 | model, Video Segmentation | Code Available | 4 |
| Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions | Aug 1, 2024 | Medical Question Answering, MedQA | Code Available | 4 |
| Towards Real-World Blind Face Restoration with Generative Facial Prior | Jan 11, 2021 | Blind Face Restoration, Video Super-Resolution | Code Available | 4 |
| HuaTuo: Tuning LLaMA Model with Chinese Medical Knowledge | Apr 14, 2023 | model | Code Available | 4 |
| DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors | Oct 18, 2023 | Image Animation | Code Available | 4 |
| "Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models | Aug 7, 2023 | Community Detection | Code Available | 4 |
| Benchopt: Reproducible, efficient and collaborative optimization benchmarks | Jun 27, 2022 | Benchmarking, image-classification | Code Available | 4 |
| Couler: Unified Machine Learning Workflow Optimization in Cloud | Mar 12, 2024 | CPU | Code Available | 4 |
| N-Grammer: Augmenting Transformers with latent n-grams | Jul 13, 2022 | Common Sense Reasoning, Coreference Resolution | Code Available | 4 |
| Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving | Oct 29, 2024 | Autonomous Driving, Scene Understanding | Code Available | 4 |
| SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition | Aug 20, 2024 | Emotion Recognition, Multimodal Emotion Recognition | Code Available | 4 |
| SpeechColab Leaderboard: An Open-Source Platform for Automatic Speech Recognition Evaluation | Mar 13, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 4 |
| AgentLite: A Lightweight Library for Building and Advancing Task-Oriented LLM Agent System | Feb 23, 2024 | AI Agent | Code Available | 4 |
| HLLM: Enhancing Sequential Recommendations via Hierarchical Large Language Models for Item and User Modeling | Sep 19, 2024 | Large Language Model, Recommendation Systems | Code Available | 4 |
| EasyRAG: Efficient Retrieval-Augmented Generation Framework for Automated Network Operations | Oct 14, 2024 | Answer Generation, Question Answering | Code Available | 4 |
| AutoWebGLM: A Large Language Model-based Web Navigating Agent | Apr 4, 2024 | Decision Making, Language Modeling | Code Available | 4 |
| Prompt-to-Prompt Image Editing with Cross Attention Control | Aug 2, 2022 | Image Generation, Text-based Image Editing | Code Available | 4 |
| QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks | Feb 6, 2024 | Quantization | Code Available | 4 |
| Differential Privacy: What is all the noise about? | May 19, 2022 | All, Federated Learning | Code Available | 4 |
| Gated Delta Networks: Improving Mamba2 with Delta Rule | Dec 9, 2024 | Common Sense Reasoning, Language Modeling | Code Available | 4 |
| Diffusion Models: A Comprehensive Survey of Methods and Applications | Sep 2, 2022 | Image Generation, Image Super-Resolution | Code Available | 4 |
| mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video | Feb 1, 2023 | Action Classification, Image Classification | Code Available | 4 |
| Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks | May 22, 2020 | Fact Verification, Question Answering | Code Available | 4 |
| LettuceDetect: A Hallucination Detection Framework for RAG Applications | Feb 24, 2025 | 8k, GPU | Code Available | 4 |
| InternVideo: General Video Foundation Models via Generative and Discriminative Learning | Dec 6, 2022 | Action Classification, Action Recognition | Code Available | 4 |
| Optimizing Prompts for Text-to-Image Generation | Dec 19, 2022 | Language Modeling, Language Modelling | Code Available | 4 |
| DAMO-YOLO : A Report on Real-Time Object Detection Design | Nov 23, 2022 | CPU, Neural Architecture Search | Code Available | 4 |
| ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers | May 24, 2023 | Image Matting | Code Available | 4 |
| DeepInverse: A Python package for solving imaging inverse problems with deep learning | May 26, 2025 | Image Reconstruction | Code Available | 4 |
| Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution | May 26, 2025 | | Code Available | 4 |
| Holistic Evaluation of Language Models | Nov 16, 2022 | Fairness, Question Answering | Code Available | 4 |
| Seed-Coder: Let the Code Model Curate Data for Itself | Jun 4, 2025 | Code Completion, Code Generation | Code Available | 4 |
| FullStack Bench: Evaluating LLMs as Full Stack Coders | Nov 30, 2024 | | Code Available | 4 |
| Motion Capture Dataset for Practical Use of AI-based Motion Editing and Stylization | Jun 15, 2023 | Motion Style Transfer, Style Transfer | Code Available | 4 |
| The Platonic Representation Hypothesis | May 13, 2024 | | Code Available | 4 |
| MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts | Oct 9, 2024 | GPU, Mixture-of-Experts | Code Available | 4 |
| ArchiSound: Audio Generation with Diffusion | Jan 30, 2023 | Audio Generation, GPU | Code Available | 4 |
| Aligning benchmark datasets for table structure recognition | Mar 1, 2023 | Table Detection, Table Recognition | Code Available | 4 |