| FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration | Jan 24, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 5 |
| TxAgent: An AI Agent for Therapeutic Reasoning Across a Universe of Tools | Mar 14, 2025 | AI Agent, Decision Making | Code Available | 5 |
| Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models | Mar 9, 2025 | Math, Multimodal Reasoning | Code Available | 5 |
| OS-Copilot: Towards Generalist Computer Agents with Self-Improvement | Feb 12, 2024 | | Code Available | 5 |
| Time-series attribution maps with regularized contrastive learning | Feb 17, 2025 | Contrastive Learning, Time Series | Code Available | 5 |
| Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives | Jan 7, 2025 | Autonomous Driving, General Knowledge | Code Available | 5 |
| Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs | Jan 22, 2024 | Diffusion Personalization Tuning Free, Image Generation | Code Available | 5 |
| GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control | Mar 5, 2025 | Novel View Synthesis, Video Generation | Code Available | 5 |
| MobileSAMv2: Faster Segment Anything to Everything | Dec 15, 2023 | Decoder, Knowledge Distillation | Code Available | 5 |
| Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer | Dec 1, 2024 | Image Animation, Portrait Animation | Code Available | 5 |
| BlackJAX: Composable Bayesian inference in JAX | Feb 16, 2024 | Bayesian Inference, Probabilistic Programming | Code Available | 5 |
| CodeGen2: Lessons for Training LLMs on Programming and Natural Languages | May 3, 2023 | Causal Language Modeling, Decoder | Code Available | 5 |
| Multimodal Autoregressive Pre-training of Large Vision Encoders | Nov 21, 2024 | Decoder, Image Classification | Code Available | 5 |
| Active Learning for Neural PDE Solvers | Aug 2, 2024 | Active Learning | Code Available | 5 |
| Cosmos World Foundation Model Platform for Physical AI | Jan 7, 2025 | model, Position | Code Available | 5 |
| PyramidMamba: Rethinking Pyramid Feature Fusion with Selective Space State Model for Semantic Segmentation of Remote Sensing Imagery | Jun 16, 2024 | Decoder, Earth Observation | Code Available | 5 |
| Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding | Apr 14, 2025 | Question Answering | Code Available | 5 |
| LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks | Dec 19, 2024 | 8k, In-Context Learning | Code Available | 5 |
| Information Flow Routes: Automatically Interpreting Language Models at Scale | Feb 27, 2024 | | Code Available | 5 |
| Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation | Mar 12, 2024 | Image Generation, Language Modelling | Code Available | 5 |
| UniDepth: Universal Monocular Metric Depth Estimation | Mar 27, 2024 | Depth Estimation, Monocular Depth Estimation | Code Available | 5 |
| Unleashing the Potential of SAM2 for Biomedical Images and Videos: A Survey | Aug 23, 2024 | Image Segmentation, Segmentation | Code Available | 5 |
| AssetOpsBench: Benchmarking AI Agents for Task Automation in Industrial Asset Operations and Maintenance | Jun 4, 2025 | Benchmarking, Scheduling | Code Available | 5 |
| DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ | May 24, 2024 | Language Modeling, Language Modelling | Code Available | 5 |
| Noisereduce: Domain General Noise Reduction for Time Series Signals | Dec 19, 2024 | Time Series | Code Available | 5 |
| Evaluating Real-World Robot Manipulation Policies in Simulation | May 9, 2024 | Robotic Grasping, Robot Manipulation | Code Available | 5 |
| LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model | Apr 28, 2023 | Instruction Following, model | Code Available | 5 |
| Orbit: A Unified Simulation Framework for Interactive Robot Learning Environments | Jan 10, 2023 | GPU, Imitation Learning | Code Available | 5 |
| ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models | May 30, 2025 | Reinforcement Learning (RL) | Code Available | 5 |
| WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct | Aug 18, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 5 |
| Break the Sequential Dependency of LLM Inference Using Lookahead Decoding | Feb 3, 2024 | Code Completion | Code Available | 5 |
| Allegro: Open the Black Box of Commercial-Level Video Generation Model | Oct 20, 2024 | Video Generation | Code Available | 5 |
| Show-o: One Single Transformer to Unify Multimodal Understanding and Generation | Aug 22, 2024 | 10-shot image generation | Code Available | 5 |
| VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild | Nov 27, 2022 | Video Editing, Video Generation | Code Available | 5 |
| XFeat: Accelerated Features for Lightweight Image Matching | Apr 30, 2024 | CPU, Keypoint detection and image matching | Code Available | 5 |
| Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent | Nov 4, 2024 | Logical Reasoning, Mathematical Problem-Solving | Code Available | 5 |
| ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment | Mar 8, 2024 | Denoising, Image Generation | Code Available | 5 |
| ShareGPT4Video: Improving Video Understanding and Generation with Better Captions | Jun 6, 2024 | Video Captioning, Video Generation | Code Available | 5 |
| Video Depth Anything: Consistent Depth Estimation for Super-Long Videos | Jan 21, 2025 | Computational Efficiency, Depth Estimation | Code Available | 5 |
| Fast Inference from Transformers via Speculative Decoding | Nov 30, 2022 | Language Modeling, Language Modelling | Code Available | 5 |
| TrimTail: Low-Latency Streaming ASR with Simple but Effective Spectrogram-Level Length Penalty | Nov 1, 2022 | | Code Available | 5 |
| Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine | Nov 28, 2023 | Electrical Engineering, Experimental Design | Code Available | 5 |
| NotaGen: Advancing Musicality in Symbolic Music Generation with Large Language Model Training Paradigms | Feb 25, 2025 | Language Modeling, Language Modelling | Code Available | 5 |
| OmniRe: Omni Urban Scene Reconstruction | Aug 29, 2024 | 3DGS | Code Available | 5 |
| CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion | Mar 8, 2024 | Computational Efficiency, Image Generation | Code Available | 5 |
| QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models | Sep 26, 2023 | Quantization | Code Available | 5 |
| GenCast: Diffusion-based ensemble forecasting for medium-range weather | Dec 25, 2023 | Decision Making, Weather Forecasting | Code Available | 5 |
| Gaussian Opacity Fields: Efficient Adaptive Surface Reconstruction in Unbounded Scenes | Apr 16, 2024 | 3DGS, Novel View Synthesis | Code Available | 5 |
| Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts | Oct 14, 2024 | Mixture-of-Experts, Time Series | Code Available | 5 |
| How to Design Translation Prompts for ChatGPT: An Empirical Study | Apr 5, 2023 | Machine Translation, Natural Language Understanding | Code Available | 5 |