| ForesightNav: Learning Scene Imagination for Efficient Exploration | Apr 22, 2025 | Efficient Exploration, Navigate | Code Available | 2 |
| Text-based Animatable 3D Avatars with Morphable Model Alignment | Apr 22, 2025 | 3D Generation, 3DGS | Code Available | 2 |
| WASP: Benchmarking Web Agent Security Against Prompt Injection Attacks | Apr 22, 2025 | Benchmarking | Code Available | 2 |
| WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents | Apr 22, 2025 | Knowledge Graphs, Minecraft | Code Available | 2 |
| DyFo: A Training-Free Dynamic Focus Visual Search for Enhancing LMMs in Fine-Grained Visual Understanding | Apr 21, 2025 | Hallucination | Code Available | 2 |
| MARFT: Multi-Agent Reinforcement Fine-Tuning | Apr 21, 2025 | | Code Available | 2 |
| Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs | Apr 21, 2025 | Attribute, Camera Pose Estimation | Code Available | 2 |
| Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning | Apr 21, 2025 | All, Form | Code Available | 2 |
| FlowReasoner: Reinforcing Query-Level Meta-Agents | Apr 21, 2025 | Reinforcement Learning (RL) | Code Available | 2 |
| Learning Adaptive Parallel Reasoning with Language Models | Apr 21, 2025 | 4k | Code Available | 2 |
| Retrieval Augmented Generation Evaluation in the Era of Large Language Models: A Comprehensive Survey | Apr 21, 2025 | Computational Efficiency, Information Retrieval | Code Available | 2 |
| Vision6D: 3D-to-2D Interactive Visualization and Annotation Tool for 6D Pose Estimation | Apr 21, 2025 | 6D Pose Estimation, Pose Estimation | Code Available | 2 |
| Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction | Apr 21, 2025 | Math | Code Available | 2 |
| Seurat: From Moving Points to Depth | Apr 20, 2025 | Depth Estimation, Point Tracking | Code Available | 2 |
| Generative Auto-Bidding with Value-Guided Explorations | Apr 20, 2025 | Reinforcement Learning (RL) | Code Available | 2 |
| NTIRE 2025 Challenge on Image Super-Resolution (×4): Methods and Results | Apr 20, 2025 | Image Super-Resolution, Super-Resolution | Code Available | 2 |
| DreamID: High-Fidelity and Fast diffusion-based Face Swapping via Triplet ID Group Learning | Apr 20, 2025 | Attribute, Face Swapping | Code Available | 2 |
| SG-Reg: Generalizable and Efficient Scene Graph Registration | Apr 20, 2025 | GPU | Code Available | 2 |
| SphereDiff: Tuning-free Omnidirectional Panoramic Image and Video Generation via Spherical Latent Representation | Apr 19, 2025 | ERP, Video Generation | Code Available | 2 |
| InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners | Apr 19, 2025 | Action Generation, Logical Reasoning | Code Available | 2 |
| CLIP-Powered Domain Generalization and Domain Adaptation: A Comprehensive Survey | Apr 19, 2025 | Computational Efficiency, Domain Adaptation | Code Available | 2 |
| Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale | Apr 19, 2025 | Benchmarking | Code Available | 2 |
| LangCoop: Collaborative Driving with Language | Apr 18, 2025 | Autonomous Driving | Code Available | 2 |
| EyecareGPT: Boosting Comprehensive Ophthalmology Understanding with Tailored Dataset, Benchmark and Model | Apr 18, 2025 | Diagnostic | Code Available | 2 |
| NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation | Apr 17, 2025 | Data Augmentation, Diversity | Code Available | 2 |
| Taccel: Scaling Up Vision-based Tactile Robotics via High-performance GPU Simulation | Apr 17, 2025 | GPU, Object Recognition | Code Available | 2 |
| GraphOmni: A Comprehensive and Extendable Benchmark Framework for Large Language Models on Graph-theoretic Tasks | Apr 17, 2025 | | Code Available | 2 |
| Digital Twin Generation from Visual Data: A Survey | Apr 17, 2025 | Semantic Segmentation, Survey | Code Available | 2 |
| An All-Atom Generative Model for Designing Protein Complexes | Apr 17, 2025 | All | Code Available | 2 |
| Embodied-R: Collaborative Framework for Activating Embodied Spatial Reasoning in Foundation Models via Reinforcement Learning | Apr 17, 2025 | Multimodal Reasoning, Reinforcement Learning (RL) | Code Available | 2 |
| TongUI: Building Generalized GUI Agents by Learning from Multimodal Web Tutorials | Apr 17, 2025 | Articles | Code Available | 2 |
| NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results | Apr 17, 2025 | Raindrop Removal, Rain Removal | Code Available | 2 |
| Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling | Apr 17, 2025 | Hallucination | Code Available | 2 |
| Real-time High-fidelity Gaussian Human Avatars with Position-based Interpolation of Spatially Distributed MLPs | Apr 17, 2025 | Position | Code Available | 2 |
| Enhancing Person-to-Person Virtual Try-On with Multi-Garment Virtual Try-Off | Apr 17, 2025 | Garment Reconstruction, Image Generation | Code Available | 2 |
| Sleep-time Compute: Beyond Inference Scaling at Test-time | Apr 17, 2025 | | Code Available | 2 |
| Representation Learning for Tabular Data: A Comprehensive Survey | Apr 17, 2025 | Representation Learning, Survey | Code Available | 2 |
| MobilePoser: Real-Time Full-Body Pose Estimation and 3D Human Translation from IMUs in Mobile Consumer Devices | Apr 16, 2025 | Pose Estimation, Translation | Code Available | 2 |
| Logits DeConfusion with CLIP for Few-Shot Learning | Apr 16, 2025 | Few-Shot Learning | Code Available | 2 |
| Autoregressive Distillation of Diffusion Transformers | Apr 15, 2025 | | Code Available | 2 |
| Enhancing Autonomous Driving Systems with On-Board Deployed Large Language Models | Apr 15, 2025 | Autonomous Driving, Computational Efficiency | Code Available | 2 |
| TransST: Transfer Learning Embedded Spatial Factor Modeling of Spatial Transcriptomics Data | Apr 15, 2025 | Transfer Learning | Code Available | 2 |
| Distillation-Supervised Convolutional Low-Rank Adaptation for Efficient Image Super-Resolution | Apr 15, 2025 | Image Super-Resolution, Knowledge Distillation | Code Available | 2 |
| Multi-scale convolutional transformer network for motor imagery brain-computer interface | Apr 15, 2025 | 4-task Classification, Brain Computer Interface | Code Available | 2 |
| HypoBench: Towards Systematic and Principled Benchmarking for Hypothesis Generation | Apr 15, 2025 | Benchmarking, scientific discovery | Code Available | 2 |
| 3DAffordSplat: Efficient Affordance Reasoning with 3D Gaussians | Apr 15, 2025 | 3DGS, Affordance Recognition | Code Available | 2 |
| An Efficient and Mixed Heterogeneous Model for Image Restoration | Apr 15, 2025 | Image Restoration, Mamba | Code Available | 2 |
| MT-R1-Zero: Advancing LLM-based Machine Translation via R1-Zero-like Reinforcement Learning | Apr 14, 2025 | Machine Translation, Reinforcement Learning (RL) | Code Available | 2 |
| A Survey of Personalization: From RAG to Agent | Apr 14, 2025 | RAG, Retrieval | Code Available | 2 |
| FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding | Apr 14, 2025 | | Code Available | 2 |