| Towards High-Resolution 3D Anomaly Detection: A Scalable Dataset and Real-Time Framework for Subtle Industrial Defects | Jul 10, 2025 | 3D Anomaly Detection, Anomaly Detection | Code Available | 2 |
| MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization | Jul 10, 2025 | 2k, Quantization | Code Available | 2 |
| Open Source Planning & Control System with Language Agents for Autonomous Scientific Discovery | Jul 9, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| MoFE-Time: Mixture of Frequency Domain Experts for Time-Series Forecasting Models | Jul 9, 2025 | Mixture-of-Experts, Time Series | Code Available | 2 |
| AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs | Jul 8, 2025 | GPU, reinforcement-learning | Code Available | 2 |
| Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion | Jul 8, 2025 | 3D geometry, Domain Generalization | Code Available | 2 |
| Omni-Video: Democratizing Unified Video Understanding and Generation | Jul 8, 2025 | Video Generation, Video Understanding | Code Available | 2 |
| High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning | Jul 8, 2025 | MME, Reinforcement Learning (RL) | Code Available | 2 |
| Modern Methods in Associative Memory | Jul 8, 2025 | | Code Available | 2 |
| Differentiable Reward Optimization for LLM based TTS system | Jul 8, 2025 | text-to-speech, Text to Speech | Code Available | 2 |
| GTA1: GUI Test-time Scaling Agent | Jul 8, 2025 | Reinforcement Learning (RL), Task Planning | Code Available | 2 |
| T-LoRA: Single Image Diffusion Model Customization Without Overfitting | Jul 8, 2025 | | Code Available | 2 |
| RIPE: Reinforcement Learning on Unlabeled Image Pairs for Robust Keypoint Extraction | Jul 7, 2025 | | Code Available | 2 |
| Neural-Driven Image Editing | Jul 7, 2025 | Contrastive Learning, Multimodel-guided image editing | Code Available | 2 |
| any4: Learned 4-bit Numeric Representation for LLMs | Jul 7, 2025 | GPU, GSM8K | Code Available | 2 |
| Reviving Cultural Heritage: A Novel Approach for Comprehensive Historical Document Restoration | Jul 7, 2025 | Optical Character Recognition (OCR) | Code Available | 2 |
| Learning Robust Stereo Matching in the Wild with Selective Mixture-of-Experts | Jul 7, 2025 | Inductive Bias, Mixture-of-Experts | Code Available | 2 |
| BackFed: An Efficient & Standardized Benchmark Suite for Backdoor Attacks in Federated Learning | Jul 7, 2025 | Federated Learning | Code Available | 2 |
| MambaFusion: Height-Fidelity Dense Global Fusion for Multi-modal 3D Object Detection | Jul 6, 2025 | 3D Object Detection, Attribute | Code Available | 2 |
| PresentAgent: Multimodal Agent for Presentation Video Generation | Jul 5, 2025 | text-to-speech, Text to Speech | Code Available | 2 |
| GDGB: A Benchmark for Generative Dynamic Text-Attributed Graph Learning | Jul 4, 2025 | Benchmarking, Graph Generation | Code Available | 2 |
| Flow-Anchored Consistency Models | Jul 4, 2025 | Image Generation | Code Available | 2 |
| Meta SecAlign: A Secure Foundation LLM Against Prompt Injection Attacks | Jul 3, 2025 | Instruction Following | Code Available | 2 |
| DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment | Jul 3, 2025 | cross-modal alignment, Instruction Following | Code Available | 2 |
| SIU3R: Simultaneous Scene Understanding and 3D Reconstruction Beyond Feature Alignment | Jul 3, 2025 | 3D Reconstruction, Scene Understanding | Code Available | 2 |
| AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench | Jul 3, 2025 | Navigate | Code Available | 2 |
| Lost in Latent Space: An Empirical Study of Latent Diffusion Models for Physics Emulation | Jul 3, 2025 | Diversity, Video Generation | Code Available | 2 |
| MathOptAI.jl: Embed trained machine learning predictors into JuMP models | Jul 3, 2025 | CPU, Gaussian Processes | Code Available | 2 |
| MambAttention: Mamba with Multi-Head Attention for Generalizable Single-Channel Speech Enhancement | Jul 1, 2025 | Automatic Speech Recognition, Mamba | Code Available | 2 |
| NavMorph: A Self-Evolving World Model for Vision-and-Language Navigation in Continuous Environments | Jun 30, 2025 | Decision Making, Vision and Language Navigation | Code Available | 2 |
| Advancing Learnable Multi-Agent Pathfinding Solvers with Active Fine-Tuning | Jun 30, 2025 | Imitation Learning, Trajectory Planning | Code Available | 2 |
| DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World | Jun 30, 2025 | Caption Generation, Object | Code Available | 2 |
| SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning | Jun 30, 2025 | Math, Multi-agent Reinforcement Learning | Code Available | 2 |
| VolumetricSMPL: A Neural Volumetric Body Model for Efficient Interactions, Contacts, and Collisions | Jun 29, 2025 | Computational Efficiency, GPU | Code Available | 2 |
| MEMFOF: High-Resolution Training for Memory-Efficient Multi-Frame Optical Flow Estimation | Jun 29, 2025 | GPU, Optical Flow Estimation | Code Available | 2 |
| R1-Track: Direct Application of MLLMs to Visual Object Tracking via Reinforcement Learning | Jun 27, 2025 | Object Tracking, Template Matching | Code Available | 2 |
| EAMamba: Efficient All-Around Vision State Space Model for Image Restoration | Jun 27, 2025 | All, Deblurring | Code Available | 2 |
| The Automated LLM Speedrunning Benchmark: Reproducing NanoGPT Improvements | Jun 27, 2025 | | Code Available | 2 |
| LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs | Jun 27, 2025 | Question Answering, Video Question Answering | Code Available | 2 |
| Seg-R1: Segmentation Can Be Surprisingly Simple with Reinforcement Learning | Jun 27, 2025 | Foreground Segmentation, object-detection | Code Available | 2 |
| WAFT: Warping-Alone Field Transforms for Optical Flow | Jun 26, 2025 | Optical Flow Estimation, Zero-shot Generalization | Code Available | 2 |
| ESMStereo: Enhanced ShuffleMixer Disparity Upsampling for Real-Time and Accurate Stereo Matching | Jun 26, 2025 | Disparity Estimation, Stereo Matching | Code Available | 2 |
| Spatial Mental Modeling from Limited Views | Jun 26, 2025 | | Code Available | 2 |
| EraRAG: Efficient and Incremental Retrieval Augmented Generation for Growing Corpora | Jun 26, 2025 | Graph Reconstruction, RAG | Code Available | 2 |
| Learning to See in the Extremely Dark | Jun 26, 2025 | Denoising, Exposure Correction | Code Available | 2 |
| BMFM-DNA: A SNP-aware DNA foundation model to capture variant effects | Jun 26, 2025 | Imputation, Promoter Detection | Code Available | 2 |
| KaLM-Embedding-V2: Superior Training Techniques and Data Inspire A Versatile Embedding Model | Jun 26, 2025 | Representation Learning, Retrieval | Code Available | 2 |
| DBConformer: Dual-Branch Convolutional Transformer for EEG Decoding | Jun 26, 2025 | EEG, Eeg Decoding | Code Available | 2 |
| HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context | Jun 26, 2025 | Large Language Model, Multimodal Reasoning | Code Available | 2 |
| Parallels Between VLA Model Post-Training and Human Motor Learning: Progress, Challenges, and Trends | Jun 26, 2025 | Action Generation, Vision-Language-Action | Code Available | 2 |