| SealQA: Raising the Bar for Reasoning in Search-Augmented Language Models | Jun 1, 2025 | | Code Available | 3 |
| EXP-Bench: Can AI Conduct AI Research Experiments? | May 30, 2025 | | Code Available | 3 |
| MathArena: Evaluating LLMs on Uncontaminated Math Competitions | May 29, 2025 | Math, Mathematical Reasoning | Code Available | 3 |
| TiRex: Zero-Shot Forecasting Across Long and Short Horizons with Enhanced In-Context Learning | May 29, 2025 | In-Context Learning, State Space Models | Code Available | 3 |
| BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model | May 29, 2025 | Large Language Model, scientific discovery | Code Available | 3 |
| MAGREF: Masked Guidance for Any-Reference Video Generation | May 29, 2025 | Human-Domain Subject-to-Video, Open-Domain Subject-to-Video | Code Available | 3 |
| EmergentTTS-Eval: Evaluating TTS Models on Complex Prosodic, Expressiveness, and Linguistic Challenges Using Model-as-a-Judge | May 29, 2025 | text-to-speech, Text to Speech | Code Available | 3 |
| KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction | May 29, 2025 | Question Answering | Code Available | 3 |
| Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models | May 29, 2025 | Autonomous Driving, Diagnostic | Code Available | 3 |
| VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning | May 28, 2025 | RAG | Code Available | 3 |
| NeuralOM: Neural Ocean Model for Subseasonal-to-Seasonal Simulation | May 27, 2025 | Computational Efficiency, Graph Neural Network | Code Available | 3 |
| syftr: Pareto-Optimal Generative AI | May 26, 2025 | Bayesian Optimization, RAG | Code Available | 3 |
| Iterative Self-Incentivization Empowers Large Language Models as Agentic Searchers | May 26, 2025 | Information Retrieval | Code Available | 3 |
| Learning to Reason without External Rewards | May 26, 2025 | Code Generation, reinforcement-learning | Code Available | 3 |
| PCDCNet: A Surrogate Model for Air Quality Forecasting with Physical-Chemical Dynamics and Constraints | May 26, 2025 | Deep Learning | Code Available | 3 |
| VoiceStar: Robust Zero-Shot Autoregressive TTS with Duration Control and Extrapolation | May 26, 2025 | Decoder, Language Modeling | Code Available | 3 |
| VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction | May 26, 2025 | 3D Reconstruction, Spatial Reasoning | Code Available | 3 |
| FruitNeRF++: A Generalized Multi-Fruit Counting Method Utilizing Contrastive Learning and Neural Radiance Fields | May 26, 2025 | Contrastive Learning | Code Available | 3 |
| SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline | May 25, 2025 | Speech Extraction, Speech Separation | Code Available | 3 |
| InfoChartQA: A Benchmark for Multimodal Question Answering on Infographic Charts | May 25, 2025 | Chart Understanding, Question Answering | Code Available | 3 |
| OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data | May 24, 2025 | Image Stylization | Code Available | 3 |
| VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning | May 24, 2025 | GPU, Reinforcement Learning (RL) | Code Available | 3 |
| ChartGalaxy: A Dataset for Infographic Chart Understanding and Generation | May 24, 2025 | Benchmarking, Chart Understanding | Code Available | 3 |
| Distilling LLM Agent into Small Models with Retrieval and Code Tools | May 23, 2025 | Action Generation, Domain Generalization | Code Available | 3 |
| OrionBench: A Benchmark for Chart and Human-Recognizable Object Detection in Infographics | May 23, 2025 | Chart Understanding, object-detection | Code Available | 3 |