| Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling | Jan 29, 2025 | Image Generation | Code Available | 11 |
| Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation | Jan 21, 2025 | Texture Synthesis | Code Available | 11 |
| WebWalker: Benchmarking LLMs in Web Traversal | Jan 13, 2025 | Benchmarking, Open-Domain Question Answering | Code Available | 11 |
| Eliza: A Web3 friendly AI Agent Operating System | Jan 12, 2025 | AI Agent, RAG | Code Available | 11 |
| WebLLM: A High-Performance In-Browser LLM Inference Engine | Dec 20, 2024 | CPU, GPU | Code Available | 11 |
| ROMAS: A Role-Based Multi-Agent System for Database monitoring and Planning | Dec 18, 2024 | | Code Available | 11 |
| CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models | Dec 13, 2024 | In-Context Learning, Quantization | Code Available | 11 |
| HunyuanVideo: A Systematic Framework For Large Video Generative Models | Dec 3, 2024 | Video Alignment, Video Generation | Code Available | 11 |
| Structured 3D Latents for Scalable and Versatile 3D Generation | Dec 2, 2024 | 3D Generation | Code Available | 11 |
| Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis | Nov 29, 2024 | Disentanglement, Motion Generation | Code Available | 11 |
| Open-Sora Plan: Open-Source Large Video Generation Model | Nov 28, 2024 | Video Generation | Code Available | 11 |
| WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model | Nov 26, 2024 | | Code Available | 11 |
| JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation | Nov 12, 2024 | Language Modeling, Language Modelling | Code Available | 11 |
| EAP4EMSIG -- Experiment Automation Pipeline for Event-Driven Microscopy to Smart Microfluidic Single-Cells Analysis | Nov 6, 2024 | | Code Available | 11 |
| Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation | Oct 17, 2024 | Visual Question Answering | Code Available | 11 |
| Agent S: An Open Agentic Framework that Uses Computers Like a Human | Oct 10, 2024 | AI Agent, Task Planning | Code Available | 11 |
| F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching | Oct 9, 2024 | Denoising, text-to-speech | Code Available | 11 |
| Pixtral 12B | Oct 9, 2024 | Language Modeling, Language Modelling | Code Available | 11 |
| HybridFlow: A Flexible and Efficient RLHF Framework | Sep 28, 2024 | Large Language Model | Code Available | 11 |
| Qwen2.5-Coder Technical Report | Sep 18, 2024 | Code Generation | Code Available | 11 |
| Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | Sep 18, 2024 | Natural Language Visual Grounding | Code Available | 11 |
| Magika: AI-Powered Content-Type Detection | Sep 18, 2024 | CPU, Malware Analysis | Code Available | 11 |
| Data Formulator 2: Iterative Creation of Data Visualizations, with AI Transforming Data Along the Way | Aug 28, 2024 | Code Generation, Navigate | Code Available | 11 |
| KAN 2.0: Kolmogorov-Arnold Networks Meet Science | Aug 19, 2024 | Kolmogorov-Arnold Networks, scientific discovery | Code Available | 11 |
| Introduction to Reinforcement Learning | Aug 13, 2024 | reinforcement-learning, Reinforcement Learning | Code Available | 11 |
| CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer | Aug 12, 2024 | Text-to-Video Generation, Video Alignment | Code Available | 11 |
| The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery | Aug 12, 2024 | Language Modeling, Language Modelling | Code Available | 11 |
| SWIFT:A Scalable lightWeight Infrastructure for Fine-Tuning | Aug 10, 2024 | Hallucination, Optical Character Recognition | Code Available | 11 |
| SAM 2: Segment Anything in Images and Videos | Aug 1, 2024 | Image Segmentation, Robot Manipulation Generalization | Code Available | 11 |
| Very Large-Scale Multi-Agent Simulation in AgentScope | Jul 25, 2024 | | Code Available | 11 |
| Gymnasium: A Standard Interface for Reinforcement Learning Environments | Jul 24, 2024 | reinforcement-learning, Reinforcement Learning | Code Available | 11 |
| Deep Time Series Models: A Comprehensive Survey and Benchmark | Jul 18, 2024 | Survey, Time Series | Code Available | 11 |
| FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision | Jul 11, 2024 | GPU, Quantization | Code Available | 11 |
| CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens | Jul 7, 2024 | Language Modelling, Large Language Model | Code Available | 11 |
| FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs | Jul 4, 2024 | Emotion Recognition, Event Detection | Code Available | 11 |
| LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control | Jul 3, 2024 | Computational Efficiency, Face Reenactment | Code Available | 11 |
| Scaling Synthetic Data Creation with 1,000,000,000 Personas | Jun 28, 2024 | Language Modeling, Language Modelling | Code Available | 11 |
| NYU CTF Bench: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security | Jun 8, 2024 | Task Planning, Vulnerability Detection | Code Available | 11 |
| Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality | May 31, 2024 | Language Modeling, Language Modelling | Code Available | 11 |
| RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness | May 27, 2024 | Hallucination, Image Captioning | Code Available | 11 |
| Bridging The Gap between Low-rank and Orthogonal Adaptation via Householder Reflection Adaptation | May 24, 2024 | | Code Available | 11 |
| YOLOv10: Real-Time End-to-End Object Detection | May 23, 2024 | 2D Object Detection, Data Augmentation | Code Available | 11 |
| USP: A Unified Sequence Parallelism Approach for Long Context Generative AI | May 13, 2024 | | Code Available | 11 |
| SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering | May 6, 2024 | Bug fixing, Language Modeling | Code Available | 11 |
| KAN: Kolmogorov-Arnold Networks | Apr 30, 2024 | Kolmogorov-Arnold Networks | Code Available | 11 |
| Demonstration of DB-GPT: Next Generation Data Interaction System Empowered by Large Language Models | Apr 16, 2024 | Data Interaction, Text to SQL | Code Available | 11 |
| Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence | Apr 8, 2024 | | Code Available | 11 |
| AutoDev: Automated AI-Driven Development | Mar 13, 2024 | Code Generation, HumanEval | Code Available | 11 |
| LangGPT: Rethinking Structured Reusable Prompt Design Framework for LLMs from the Programming Language | Feb 26, 2024 | Prompt Engineering | Code Available | 11 |
| AgentScope: A Flexible yet Robust Multi-Agent Platform | Feb 21, 2024 | Multi-agent Integration | Code Available | 11 |