| SealQA: Raising the Bar for Reasoning in Search-Augmented Language Models | Jun 1, 2025 | | Code Available | 3 |
| EXP-Bench: Can AI Conduct AI Research Experiments? | May 30, 2025 | | Code Available | 3 |
| MathArena: Evaluating LLMs on Uncontaminated Math Competitions | May 29, 2025 | Math, Mathematical Reasoning | Code Available | 3 |
| BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model | May 29, 2025 | Large Language Model, scientific discovery | Code Available | 3 |
| MAGREF: Masked Guidance for Any-Reference Video Generation | May 29, 2025 | Human-Domain Subject-to-Video, Open-Domain Subject-to-Video | Code Available | 3 |
| EmergentTTS-Eval: Evaluating TTS Models on Complex Prosodic, Expressiveness, and Linguistic Challenges Using Model-as-a-Judge | May 29, 2025 | text-to-speech, Text to Speech | Code Available | 3 |
| KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction | May 29, 2025 | Question Answering | Code Available | 3 |
| Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models | May 29, 2025 | Autonomous Driving, Diagnostic | Code Available | 3 |
| TiRex: Zero-Shot Forecasting Across Long and Short Horizons with Enhanced In-Context Learning | May 29, 2025 | In-Context Learning, State Space Models | Code Available | 3 |
| VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning | May 28, 2025 | RAG | Code Available | 3 |
| NeuralOM: Neural Ocean Model for Subseasonal-to-Seasonal Simulation | May 27, 2025 | Computational Efficiency, Graph Neural Network | Code Available | 3 |
| Iterative Self-Incentivization Empowers Large Language Models as Agentic Searchers | May 26, 2025 | Information Retrieval | Code Available | 3 |
| Learning to Reason without External Rewards | May 26, 2025 | Code Generation, reinforcement-learning | Code Available | 3 |
| PCDCNet: A Surrogate Model for Air Quality Forecasting with Physical-Chemical Dynamics and Constraints | May 26, 2025 | Deep Learning | Code Available | 3 |
| syftr: Pareto-Optimal Generative AI | May 26, 2025 | Bayesian Optimization, RAG | Code Available | 3 |
| VoiceStar: Robust Zero-Shot Autoregressive TTS with Duration Control and Extrapolation | May 26, 2025 | Decoder, Language Modeling | Code Available | 3 |
| VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction | May 26, 2025 | 3D Reconstruction, Spatial Reasoning | Code Available | 3 |
| FruitNeRF++: A Generalized Multi-Fruit Counting Method Utilizing Contrastive Learning and Neural Radiance Fields | May 26, 2025 | Contrastive Learning | Code Available | 3 |
| SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline | May 25, 2025 | Speech Extraction, Speech Separation | Code Available | 3 |
| InfoChartQA: A Benchmark for Multimodal Question Answering on Infographic Charts | May 25, 2025 | Chart Understanding, Question Answering | Code Available | 3 |
| OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data | May 24, 2025 | Image Stylization | Code Available | 3 |
| ChartGalaxy: A Dataset for Infographic Chart Understanding and Generation | May 24, 2025 | Benchmarking, Chart Understanding | Code Available | 3 |
| VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning | May 24, 2025 | GPU, Reinforcement Learning (RL) | Code Available | 3 |
| RemoteSAM: Towards Segment Anything for Earth Observation | May 23, 2025 | Attribute, Earth Observation | Code Available | 3 |
| Token Reduction Should Go Beyond Efficiency in Generative Models -- From Vision, Language to Multimodality | May 23, 2025 | In-Context Learning, Token Reduction | Code Available | 3 |
| CLIMB: Class-imbalanced Learning Benchmark on Tabular Data | May 23, 2025 | | Code Available | 3 |
| Distilling LLM Agent into Small Models with Retrieval and Code Tools | May 23, 2025 | Action Generation, Domain Generalization | Code Available | 3 |
| OrionBench: A Benchmark for Chart and Human-Recognizable Object Detection in Infographics | May 23, 2025 | Chart Understanding, object-detection | Code Available | 3 |
| Training-Free Efficient Video Generation via Dynamic Token Carving | May 22, 2025 | Denoising, Video Generation | Code Available | 3 |
| MASLab: A Unified and Comprehensive Codebase for LLM-based Multi-Agent Systems | May 22, 2025 | | Code Available | 3 |
| AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models | May 22, 2025 | Benchmarking, Fairness | Code Available | 3 |
| Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning | May 22, 2025 | Reinforcement Learning (RL) | Code Available | 3 |
| R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO | May 22, 2025 | Reinforcement Learning (RL) | Code Available | 3 |
| LaViDa: A Large Diffusion Language Model for Multimodal Understanding | May 22, 2025 | Instruction Following, Language Modeling | Code Available | 3 |
| Arctic-Text2SQL-R1: Simple Rewards, Strong Reasoning in Text-to-SQL | May 22, 2025 | Natural Language Understanding, Reinforcement Learning (RL) | Code Available | 3 |
| Reasoning Beyond Language: A Comprehensive Survey on Latent Chain-of-Thought Reasoning | May 22, 2025 | | Code Available | 3 |
| IFEval-Audio: Benchmarking Instruction-Following Capability in Audio-based Large Language Models | May 22, 2025 | Benchmarking, Instruction Following | Code Available | 3 |
| Distance Adaptive Beam Search for Provably Accurate Graph-Based Nearest Neighbor Search | May 21, 2025 | Information Retrieval | Code Available | 3 |
| Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space | May 21, 2025 | | Code Available | 3 |
| MM-Agent: LLM as Agents for Real-world Mathematical Modeling Problem | May 20, 2025 | Mathematical Reasoning, scientific discovery | Code Available | 3 |
| Efficient Agent Training for Computer Use | May 20, 2025 | | Code Available | 3 |
| OmniGenBench: A Modular Platform for Reproducible Genomic Foundation Models Benchmarking | May 20, 2025 | Benchmarking | Code Available | 3 |
| General-Reasoner: Advancing LLM Reasoning Across All Domains | May 20, 2025 | All, Math | Code Available | 3 |
| RLVR-World: Training World Models with Reinforcement Learning | May 20, 2025 | reinforcement-learning, Reinforcement Learning | Code Available | 3 |
| MLZero: A Multi-Agent System for End-to-end Machine Learning Automation | May 20, 2025 | AutoML, Code Generation | Code Available | 3 |
| This Time is Different: An Observability Perspective on Time Series Foundation Models | May 20, 2025 | Decoder, Multivariate Time Series Forecasting | Code Available | 3 |
| From Automation to Autonomy: A Survey on Large Language Models in Scientific Discovery | May 19, 2025 | Navigate, scientific discovery | Code Available | 3 |
| Thinkless: LLM Learns When to Think | May 19, 2025 | GSM8K, Math | Code Available | 3 |
| ExTrans: Multilingual Deep Reasoning Translation via Exemplar-Enhanced Reinforcement Learning | May 19, 2025 | Machine Translation, reinforcement-learning | Code Available | 3 |
| Harnessing the Universal Geometry of Embeddings | May 18, 2025 | Attribute | Code Available | 3 |