| Right Question is Already Half the Answer: Fully Unsupervised LLM Reasoning Incentivization | Apr 8, 2025 | Math, Mathematical Reasoning | Code Available | 2 |
| POMATO: Marrying Pointmap Matching with Temporal Motion for Dynamic 3D Reconstruction | Apr 8, 2025 | 3D Reconstruction, Depth Estimation | Code Available | 2 |
| PyTopo3D: A Python Framework for 3D SIMP-based Topology Optimization | Apr 8, 2025 | | Code Available | 2 |
| HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance | Apr 8, 2025 | Image Generation | Code Available | 2 |
| Hogwild! Inference: Parallel LLM Generation via Concurrent Attention | Apr 8, 2025 | | Code Available | 2 |
| Earth-Adapter: Bridge the Geospatial Domain Gaps with Mixture of Frequency Adaptation | Apr 8, 2025 | Domain Adaptation, Domain Generalization | Code Available | 2 |
| Nes2Net: A Lightweight Nested Architecture for Foundation Model Driven Speech Anti-spoofing | Apr 8, 2025 | DeepFake Detection, Dimensionality Reduction | Code Available | 2 |
| Human Activity Recognition using RGB-Event based Sensors: A Multi-modal Heat Conduction Model and A Benchmark Dataset | Apr 8, 2025 | Activity Recognition, Human Activity Recognition | Code Available | 2 |
| Holistic Fusion: Task- and Setup-Agnostic Robot Localization and State Estimation with Factor Graphs | Apr 8, 2025 | Motion Estimation, Sensor Fusion | Code Available | 2 |
| HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference | Apr 8, 2025 | CPU, GPU | Code Available | 2 |
| InteractVLM: 3D Interaction Reasoning from 2D Foundational Models | Apr 7, 2025 | 3D Reconstruction, Object | Code Available | 2 |
| Machine learning interatomic potential can infer electrical response | Apr 7, 2025 | | Code Available | 2 |
| Gaussian Mixture Flow Matching Models | Apr 7, 2025 | Denoising, Image Generation | Code Available | 2 |
| SlicerNNInteractive: A 3D Slicer extension for nnInteractive | Apr 7, 2025 | Image Segmentation, Semantic Segmentation | Code Available | 2 |
| Weak-for-Strong: Training Weak Meta-Agent to Harness Strong Executors | Apr 7, 2025 | GPU | Code Available | 2 |
| Efficient Reinforcement Finetuning via Adaptive Curriculum Learning | Apr 7, 2025 | Math, Mathematical Reasoning | Code Available | 2 |
| Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting | Apr 7, 2025 | Boundary Detection, Object | Code Available | 2 |
| Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language Models | Apr 7, 2025 | Dialogue Evaluation, Fairness | Code Available | 2 |
| Regional Tiny Stories: Using Small Models to Compare Language Learning and Tokenizer Performance | Apr 7, 2025 | | Code Available | 2 |
| Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models | Apr 7, 2025 | Math, Quantization | Code Available | 2 |
| SEAL: Steerable Reasoning Calibration of Large Language Models for Free | Apr 7, 2025 | GSM8K | Code Available | 2 |
| Content-Aware Transformer for All-in-one Image Restoration | Apr 7, 2025 | All, Image Restoration | Code Available | 2 |
| One Quantizer is Enough: Toward a Lightweight Audio Codec | Apr 7, 2025 | | Code Available | 2 |
| MedM-VL: What Makes a Good Medical LVLM? | Apr 6, 2025 | Medical Image Analysis, Question Answering | Code Available | 2 |
| UniToken: Harmonizing Multimodal Understanding and Generation through Unified Visual Encoding | Apr 6, 2025 | Image Generation | Code Available | 2 |
| Enhance Then Search: An Augmentation-Search Strategy with Foundation Models for Cross-Domain Few-Shot Object Detection | Apr 6, 2025 | Cross-Domain Few-Shot, Cross-Domain Few-Shot Object Detection | Code Available | 2 |
| SAM2MOT: A Novel Paradigm of Multi-Object Tracking by Segmentation | Apr 6, 2025 | Multi-Object Tracking, Object | Code Available | 2 |
| VocalNet: Speech LLM with Multi-Token Prediction for Faster and High-Quality Generation | Apr 5, 2025 | | Code Available | 2 |
| MultiMed-ST: Large-scale Many-to-many Multilingual Medical Speech Translation | Apr 4, 2025 | Machine Translation, Translation | Code Available | 2 |
| Agentic Knowledgeable Self-awareness | Apr 4, 2025 | Decision Making | Code Available | 2 |
| RWKVTTS: Yet another TTS based on RWKV-7 | Apr 4, 2025 | Computational Efficiency, text-to-speech | Code Available | 2 |
| Mamba as a Bridge: Where Vision Foundation Models Meet Vision Language Models for Domain-Generalized Semantic Segmentation | Apr 4, 2025 | Domain Generalization, Mamba | Code Available | 2 |
| Investigating Affective Use and Emotional Well-being on ChatGPT | Apr 4, 2025 | Privacy Preserving | Code Available | 2 |
| Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme | Apr 3, 2025 | Reinforcement Learning (RL), Visual Reasoning | Code Available | 2 |
| MegaMath: Pushing the Limits of Open Math Corpora | Apr 3, 2025 | Diversity, Math | Code Available | 2 |
| Re-thinking Temporal Search for Long-Form Video Understanding | Apr 3, 2025 | Computational Efficiency, Form | Code Available | 2 |
| Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models | Apr 3, 2025 | | Code Available | 2 |
| Exploration-Driven Generative Interactive Environments | Apr 3, 2025 | | Code Available | 2 |
| GPTAQ: Efficient Finetuning-Free Quantization for Asymmetric Calibration | Apr 3, 2025 | GPU, Quantization | Code Available | 2 |
| GPG: A Simple and Strong Reinforcement Learning Baseline for Model Reasoning | Apr 3, 2025 | Reinforcement Learning (RL) | Code Available | 2 |
| CrystalFormer-RL: Reinforcement Fine-Tuning for Materials Design | Apr 3, 2025 | Band Gap, Dielectric Constant | Code Available | 2 |
| Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation | Apr 3, 2025 | Computational Efficiency, GPU | Code Available | 2 |
| Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing | Apr 3, 2025 | Benchmarking, Logical Reasoning | Code Available | 2 |
| ZClip: Adaptive Spike Mitigation for LLM Pre-Training | Apr 3, 2025 | Anomaly Detection, Large Language Model | Code Available | 2 |
| Delineate Anything: Resolution-Agnostic Field Boundary Delineation on Satellite Imagery | Apr 3, 2025 | Field Boundary Delineation, Instance Segmentation | Code Available | 2 |
| Benchmarking Synthetic Tabular Data: A Multi-Dimensional Evaluation Framework | Apr 2, 2025 | Benchmarking, Synthetic Data Generation | Code Available | 2 |
| Scene-Centric Unsupervised Panoptic Segmentation | Apr 2, 2025 | Instance Segmentation, Panoptic Segmentation | Code Available | 2 |
| MCP Safety Audit: LLMs with the Model Context Protocol Allow Major Security Exploits | Apr 2, 2025 | | Code Available | 2 |
| AI-Newton: A Concept-Driven Physical Law Discovery System without Prior Physical Knowledge | Apr 2, 2025 | scientific discovery | Code Available | 2 |
| SpaceR: Reinforcing MLLMs in Video Spatial Reasoning | Apr 2, 2025 | MME, Spatial Reasoning | Code Available | 2 |