| CGVQM+D: Computer Graphics Video Quality Metric and Dataset | Jun 13, 2025 | Denoising, Novel View Synthesis | Code Available | 2 |
| Efficient Speech Enhancement via Embeddings from Pre-trained Generative Audioencoders | Jun 13, 2025 | Speech Enhancement | Code Available | 2 |
| SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes | Jun 13, 2025 | Linear evaluation, Self-Supervised Learning | Code Available | 2 |
| Execution Guided Line-by-Line Code Generation | Jun 12, 2025 | Code Generation | Code Available | 2 |
| SDialog: A Python Toolkit for Synthetic Dialogue Generation and Analysis | Jun 12, 2025 | Benchmarking, Dialogue Generation | Code Available | 2 |
| ConTextTab: A Semantics-Aware Tabular In-Context Learner | Jun 12, 2025 | In-Context Learning, World Knowledge | Code Available | 2 |
| AutoMind: Adaptive Knowledgeable Agent for Automated Data Science | Jun 12, 2025 | Code Generation, Large Language Model | Code Available | 2 |
| Time Series Forecasting as Reasoning: A Slow-Thinking Approach with Reinforced LLMs | Jun 12, 2025 | Philosophy, Prompt Engineering | Code Available | 2 |
| CreatiPoster: Towards Editable and Controllable Multi-Layer Graphic Design Generation | Jun 12, 2025 | | Code Available | 2 |
| VideoDeepResearch: Long Video Understanding With Agentic Tool Using | Jun 12, 2025 | MME, Video MME | Code Available | 2 |
| Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs | Jun 12, 2025 | Diversity | Code Available | 2 |
| TableRAG: A Retrieval Augmented Generation Framework for Heterogeneous Document Reasoning | Jun 12, 2025 | Answer Generation, Chunking | Code Available | 2 |
| QuadricFormer: Scene as Superquadrics for 3D Semantic Occupancy Prediction | Jun 12, 2025 | 3D Semantic Occupancy Prediction, Autonomous Driving | Code Available | 2 |
| SWE-Factory: Your Automated Factory for Issue Resolution Training Data and Evaluation Benchmarks | Jun 12, 2025 | GitHub issue resolution, valid | Code Available | 2 |
| OPT-BENCH: Evaluating LLM Agent on Large-Scale Search Spaces Optimization Problems | Jun 12, 2025 | | Code Available | 2 |
| ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark | Jun 12, 2025 | | Code Available | 2 |
| GLAP: General contrastive audio-text pretraining across domains and languages | Jun 12, 2025 | AudioCaps, Keyword Spotting | Code Available | 2 |
| SkillBlender: Towards Versatile Humanoid Whole-Body Loco-Manipulation via Skill Blending | Jun 11, 2025 | Hierarchical Reinforcement Learning, Humanoid Control | Code Available | 2 |
| A Call for Collaborative Intelligence: Why Human-Agent Systems Should Precede AI Autonomy | Jun 11, 2025 | | Code Available | 2 |
| ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning | Jun 11, 2025 | Medical Question Answering, Question Answering | Code Available | 2 |
| TaskCraft: Automated Generation of Agentic Tasks | Jun 11, 2025 | | Code Available | 2 |
| ReID5o: Achieving Omni Multi-modal Person Re-identification in a Single Model | Jun 11, 2025 | cross-modal alignment, Descriptive | Code Available | 2 |
| Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing | Jun 11, 2025 | Multimodal Reasoning, Spatial Reasoning | Code Available | 2 |
| Marrying Autoregressive Transformer and Diffusion with Multi-Reference Autoregression | Jun 11, 2025 | Image Generation | Code Available | 2 |
| VerIF: Verification Engineering for Reinforcement Learning in Instruction Following | Jun 11, 2025 | Instruction Following, reinforcement-learning | Code Available | 2 |
| IntPhys 2: Benchmarking Intuitive Physics Understanding In Complex Synthetic Environments | Jun 11, 2025 | Benchmarking | Code Available | 2 |
| Tightly-Coupled LiDAR-IMU-Leg Odometry with Online Learned Leg Kinematics Incorporating Foot Tactile Information | Jun 11, 2025 | | Code Available | 2 |
| Urban1960SatSeg: Unsupervised Semantic Segmentation of Mid-20th century Urban Landscapes with Satellite Imageries | Jun 11, 2025 | Segmentation, Self-Supervised Learning | Code Available | 2 |
| UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting | Jun 11, 2025 | Diversity, Representation Learning | Code Available | 2 |
| CoRT: Code-integrated Reasoning within Thinking | Jun 11, 2025 | Mathematical Reasoning | Code Available | 2 |
| Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning | Jun 11, 2025 | Image Captioning, Math | Code Available | 2 |
| CausalVQA: A Physically Grounded Causal Reasoning Benchmark for Video Models | Jun 11, 2025 | counterfactual, Descriptive | Code Available | 2 |
| Do MIL Models Transfer? | Jun 10, 2025 | Multiple Instance Learning, Transfer Learning | Code Available | 2 |
| Institutional Books 1.0: A 242B token dataset from Harvard Library's collections, refined for accuracy and usability | Jun 10, 2025 | Optical Character Recognition (OCR) | Code Available | 2 |
| Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning | Jun 10, 2025 | Model Selection, Reinforcement Learning (RL) | Code Available | 2 |
| Solving the Job Shop Scheduling Problem with Graph Neural Networks: A Customizable Reinforcement Learning Environment | Jun 10, 2025 | Combinatorial Optimization, Imitation Learning | Code Available | 2 |
| AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions | Jun 10, 2025 | Math | Code Available | 2 |
| Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better | Jun 10, 2025 | Image Generation | Code Available | 2 |
| StreamSplat: Towards Online Dynamic 3D Reconstruction from Uncalibrated Video Streams | Jun 10, 2025 | 3DGS, 3D Reconstruction | Code Available | 2 |
| FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation | Jun 10, 2025 | Image-text Retrieval, Question Answering | Code Available | 2 |
| Segment This Thing: Foveated Tokenization for Efficient Point-Prompted Segmentation | Jun 10, 2025 | Foveation, Image Segmentation | Code Available | 2 |
| SeerAttention-R: Sparse Attention Adaptation for Long Reasoning | Jun 10, 2025 | 4k, GPU | Code Available | 2 |
| ALE-Bench: A Benchmark for Long-Horizon Objective-Driven Algorithm Engineering | Jun 10, 2025 | Scheduling | Code Available | 2 |
| FedRAG: A Framework for Fine-Tuning Retrieval-Augmented Generation Systems | Jun 10, 2025 | RAG, Retrieval | Code Available | 2 |
| Snap-and-tune: combining deep learning and test-time optimization for high-fidelity cardiovascular volumetric meshing | Jun 9, 2025 | | Code Available | 2 |
| Open World Scene Graph Generation using Vision Language Models | Jun 9, 2025 | Graph Generation, Scene Graph Generation | Code Available | 2 |
| CausalPFN: Amortized Causal Effect Estimation via In-Context Learning | Jun 9, 2025 | Decision Making, Heterogeneous Treatment Effect Estimation | Code Available | 2 |
| Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions | Jun 9, 2025 | Large Language Model, Reinforcement Learning (RL) | Code Available | 2 |
| FunDiff: Diffusion Models over Function Spaces for Physics-Informed Generative Modeling | Jun 9, 2025 | Density Estimation | Code Available | 2 |
| Play to Generalize: Learning to Reason Through Game Play | Jun 9, 2025 | Domain Generalization, Math | Code Available | 2 |