| KAN 2.0: Kolmogorov-Arnold Networks Meet Science | Aug 19, 2024 | Kolmogorov-Arnold Networks, scientific discovery | Code Available | 11 | 5 |
| Absolute Zero: Reinforced Self-play Reasoning with Zero Data | May 6, 2025 | Mathematical Reasoning | Code Available | 11 | 5 |
| USP: A Unified Sequence Parallelism Approach for Long Context Generative AI | May 13, 2024 | | Code Available | 11 | 5 |
| Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence | Apr 8, 2024 | | Code Available | 11 | 5 |
| InstantID: Zero-shot Identity-Preserving Generation in Seconds | Jan 15, 2024 | Diffusion Personalization, Diffusion Personalization Tuning Free | Code Available | 11 | 5 |
| CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training | May 23, 2025 | Automatic Speech Recognition, Emotion Recognition | Code Available | 11 | 5 |
| CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer | Aug 12, 2024 | Text-to-Video Generation, Video Alignment | Code Available | 11 | 5 |
| Eliza: A Web3 friendly AI Agent Operating System | Jan 12, 2025 | AI Agent, RAG | Code Available | 11 | 5 |
| Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation | Oct 17, 2024 | Visual Question Answering | Code Available | 11 | 5 |
| SWIFT:A Scalable lightWeight Infrastructure for Fine-Tuning | Aug 10, 2024 | Hallucination, Optical Character Recognition | Code Available | 11 | 5 |
| LangGPT: Rethinking Structured Reusable Prompt Design Framework for LLMs from the Programming Language | Feb 26, 2024 | Prompt Engineering | Code Available | 11 | 5 |
| Pixtral 12B | Oct 9, 2024 | Language Modeling, Language Modelling | Code Available | 11 | 5 |
| Structured 3D Latents for Scalable and Versatile 3D Generation | Dec 2, 2024 | 3D Generation | Code Available | 11 | 5 |
| RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness | May 27, 2024 | Hallucination, Image Captioning | Code Available | 11 | 5 |
| Qwen2.5-VL Technical Report | Feb 19, 2025 | document understanding | Code Available | 11 | 5 |
| WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model | Nov 26, 2024 | | Code Available | 11 | 5 |
| Demonstration of DB-GPT: Next Generation Data Interaction System Empowered by Large Language Models | Apr 16, 2024 | Data Interaction, Text to SQL | Code Available | 11 | 5 |
| ROMAS: A Role-Based Multi-Agent System for Database monitoring and Planning | Dec 18, 2024 | | Code Available | 11 | 5 |
| Agent S: An Open Agentic Framework that Uses Computers Like a Human | Oct 10, 2024 | AI Agent, Task Planning | Code Available | 11 | 5 |
| The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery | Aug 12, 2024 | Language Modeling, Language Modelling | Code Available | 11 | 5 |
| WebLLM: A High-Performance In-Browser LLM Inference Engine | Dec 20, 2024 | CPU, GPU | Code Available | 11 | 5 |
| Deep Time Series Models: A Comprehensive Survey and Benchmark | Jul 18, 2024 | Survey, Time Series | Code Available | 11 | 5 |
| Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling | Jan 29, 2025 | Image Generation | Code Available | 11 | 5 |
| LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control | Jul 3, 2024 | Computational Efficiency, Face Reenactment | Code Available | 11 | 5 |
| OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation | May 29, 2025 | Large Language Model | Code Available | 11 | 5 |
| Wan: Open and Advanced Large-Scale Video Generative Models | Mar 26, 2025 | Video Editing, Video Generation | Code Available | 11 | 5 |
| Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation | Jan 21, 2025 | Texture Synthesis | Code Available | 11 | 5 |
| SCORE: Systematic COnsistency and Robustness Evaluation for Large Language Models | Feb 28, 2025 | MMLU | Code Available | 11 | 5 |
| Attentive Reasoning Queries: A Systematic Method for Optimizing Instruction-Following in Large Language Models | Mar 5, 2025 | Hallucination, Instruction Following | Code Available | 11 | 5 |
| CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models | Dec 13, 2024 | In-Context Learning, Quantization | Code Available | 11 | 5 |
| FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs | Jul 4, 2024 | Emotion Recognition, Event Detection | Code Available | 11 | 5 |
| Data Formulator 2: Iterative Creation of Data Visualizations, with AI Transforming Data Along the Way | Aug 28, 2024 | Code Generation, Navigate | Code Available | 11 | 5 |
| WebDancer: Towards Autonomous Information Seeking Agency | May 28, 2025 | | Code Available | 11 | 5 |
| YOLOE: Real-Time Seeing Anything | Mar 10, 2025 | 10-shot image generation | Code Available | 11 | 5 |
| VGGT: Visual Geometry Grounded Transformer | Mar 14, 2025 | Depth Estimation, Novel View Synthesis | Code Available | 11 | 5 |
| Bridging The Gap between Low-rank and Orthogonal Adaptation via Householder Reflection Adaptation | May 24, 2024 | | Code Available | 11 | 5 |
| Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality | May 31, 2024 | Language Modeling, Language Modelling | Code Available | 11 | 5 |
| Packing Input Frame Context in Next-Frame Prediction Models for Video Generation | Apr 17, 2025 | | Code Available | 11 | 5 |
| Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens | Mar 3, 2025 | Attribute, text-to-speech | Code Available | 11 | 5 |
| olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models | Feb 25, 2025 | Diversity, Language Modeling | Code Available | 11 | 5 |
| Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents | Apr 1, 2025 | AI Agent, Task Planning | Code Available | 11 | 5 |
| TinyLlama: An Open-Source Small Language Model | Jan 4, 2024 | Computational Efficiency, Language Modeling | Code Available | 11 | 5 |
| Open-Sora Plan: Open-Source Large Video Generation Model | Nov 28, 2024 | Video Generation | Code Available | 11 | 5 |
| YOLOv10: Real-Time End-to-End Object Detection | May 23, 2024 | 2D Object Detection, Data Augmentation | Code Available | 11 | 5 |
| Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis | Nov 29, 2024 | Disentanglement, Motion Generation | Code Available | 11 | 5 |
| Very Large-Scale Multi-Agent Simulation in AgentScope | Jul 25, 2024 | | Code Available | 11 | 5 |
| CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens | Jul 7, 2024 | Language Modelling, Large Language Model | Code Available | 11 | 5 |
| JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation | Nov 12, 2024 | Language Modeling, Language Modelling | Code Available | 11 | 5 |
| WebSailor: Navigating Super-human Reasoning for Web Agent | Jul 3, 2025 | | Code Available | 11 | 5 |
| Magika: AI-Powered Content-Type Detection | Sep 18, 2024 | CPU, Malware Analysis | Code Available | 11 | 5 |