| OctGPT: Octree-based Multiscale Autoregressive Models for 3D Shape Generation | Apr 14, 2025 | 3D Shape Generation | Code Available | 2 |
| FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding | Apr 14, 2025 | | Code Available | 2 |
| Prior Does Matter: Visual Navigation via Denoising Diffusion Bridge Models | Apr 14, 2025 | Action Generation, Denoising | Code Available | 2 |
| NTIRE 2025 Challenge on Cross-Domain Few-Shot Object Detection: Methods and Results | Apr 14, 2025 | Cross-Domain Few-Shot, Cross-Domain Few-Shot Object Detection | Code Available | 2 |
| MT-R1-Zero: Advancing LLM-based Machine Translation via R1-Zero-like Reinforcement Learning | Apr 14, 2025 | Machine Translation, Reinforcement Learning (RL) | Code Available | 2 |
| SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users | Apr 14, 2025 | Diversity, Face Alignment | Code Available | 2 |
| How Instruction and Reasoning Data shape Post-Training: Data Quality through the Lens of Layer-wise Gradients | Apr 14, 2025 | Instruction Following | Code Available | 2 |
| LLaVA-ReID: Selective Multi-image Questioner for Interactive Person Re-Identification | Apr 14, 2025 | Person Re-Identification | Code Available | 2 |
| FLOWR: Flow Matching for Structure-Aware De Novo, Interaction- and Fragment-Based Ligand Generation | Apr 14, 2025 | | Code Available | 2 |
| Software package for simulations using the coarse-grained CALVADOS model | Apr 14, 2025 | | Code Available | 2 |
| The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer | Apr 14, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning | Apr 14, 2025 | Mathematical Reasoning, mbpp | Code Available | 2 |
| NaviDiffusor: Cost-Guided Diffusion Model for Visual Navigation | Apr 14, 2025 | Visual Navigation | Code Available | 2 |
| HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation | Apr 13, 2025 | Multimodal Reasoning, RAG | Code Available | 2 |
| SegEarth-R1: Geospatial Pixel Reasoning via Large Language Model | Apr 13, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| Vision-Language Model for Object Detection and Segmentation: A Review and Evaluation | Apr 13, 2025 | Domain Adaptation, Language Modeling | Code Available | 2 |
| Tokenize Image Patches: Global Context Fusion for Effective Haze Removal in Large Images | Apr 13, 2025 | GPU | Code Available | 2 |
| Leveraging Reasoning Model Answers to Enhance Non-Reasoning Model Capability | Apr 13, 2025 | model | Code Available | 2 |
| TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning | Apr 13, 2025 | Question Answering, reinforcement-learning | Code Available | 2 |
| ClinicalGPT-R1: Pushing reasoning capability of generalist disease diagnosis with large language model | Apr 13, 2025 | Diagnostic, Language Modeling | Code Available | 2 |
| Flux Already Knows -- Activating Subject-Driven Image Generation without Training | Apr 12, 2025 | Image Generation, Virtual Try-on | Code Available | 2 |
| A Comprehensive Survey of Reward Models: Taxonomy, Applications, Challenges, and Future | Apr 12, 2025 | | Code Available | 2 |
| Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking | Apr 12, 2025 | Knowledge Distillation | Code Available | 2 |
| BlockGaussian: Efficient Large-Scale Scene Novel View Synthesis via Adaptive Block-Based Gaussian Splatting | Apr 12, 2025 | 3DGS, Novel View Synthesis | Code Available | 2 |
| DataMap: A Portable Application for Visualizing High-Dimensional Data | Apr 11, 2025 | | Code Available | 2 |