| LandMarkSystem Technical Report | Mar 27, 2025 | 3DGS, 3D Reconstruction | Code Available | 2 |
| Datasets for Depression Modeling in Social Media: An Overview | Mar 27, 2025 | | Code Available | 2 |
| UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning | Mar 27, 2025 | Model Optimization, Reinforcement Learning (RL) | Code Available | 2 |
| Harmonizing Visual Representations for Unified Multimodal Understanding and Generation | Mar 27, 2025 | Image Generation, Quantization | Code Available | 2 |
| Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model | Mar 27, 2025 | EgoSchema, Language Modeling | Code Available | 2 |
| Progressive Rendering Distillation: Adapting Stable Diffusion for Instant Text-to-Mesh Generation without 3D Data | Mar 27, 2025 | Text to 3D | Code Available | 2 |
| MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search | Mar 26, 2025 | Decision Making, RAG | Code Available | 2 |
| Progressive Focused Transformer for Single Image Super-Resolution | Mar 26, 2025 | Image Super-Resolution, Super-Resolution | Code Available | 2 |
| Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging | Mar 26, 2025 | Prompt Engineering, Reinforcement Learning (RL) | Code Available | 2 |
| SURGEON: Memory-Adaptive Fully Test-Time Adaptation via Dynamic Activation Sparsity | Mar 26, 2025 | Test-time Adaptation | Code Available | 2 |
| Exploring CLIP's Dense Knowledge for Weakly Supervised Semantic Segmentation | Mar 26, 2025 | Attribute, Semantic Segmentation | Code Available | 2 |
| Riemannian Optimization on Relaxed Indicator Matrix Manifold | Mar 26, 2025 | Denoising, global-optimization | Code Available | 2 |
| Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector | Mar 26, 2025 | Binary Classification, DeepFake Detection | Code Available | 2 |
| Unified Multimodal Discrete Diffusion | Mar 26, 2025 | Image Captioning, Image Generation | Code Available | 2 |
| Correcting Deviations from Normality: A Reformulated Diffusion Model for Multi-Class Unsupervised Anomaly Detection | Mar 25, 2025 | Anomaly Detection, Unsupervised Anomaly Detection | Code Available | 2 |
| Surg-3M: A Dataset and Foundation Model for Perception in Surgical Settings | Mar 25, 2025 | 4k, Action Recognition | Code Available | 2 |
| Med3DVLM: An Efficient Vision-Language Model for 3D Medical Image Analysis | Mar 25, 2025 | Contrastive Learning, Image-text Retrieval | Code Available | 2 |
| Scaling Down Text Encoders of Text-to-Image Diffusion Models | Mar 25, 2025 | GPU, Image Generation | Code Available | 2 |
| SuperFlow++: Enhanced Spatiotemporal Consistency for Cross-Modal Data Pretraining | Mar 25, 2025 | Autonomous Driving, Computational Efficiency | Code Available | 2 |
| RGL: A Graph-Centric, Modular Framework for Efficient Retrieval-Augmented Generation on Graphs | Mar 25, 2025 | Abstract generation | Code Available | 2 |
| GENIUS: A Generative Framework for Universal Multimodal Search | Mar 25, 2025 | Information Retrieval, Quantization | Code Available | 2 |
| Unlocking the Hidden Potential of CLIP in Generalizable Deepfake Detection | Mar 25, 2025 | DeepFake Detection, Face Swapping | Code Available | 2 |
| UniMoMo: Unified Generative Modeling of 3D Molecules for De Novo Binder Design | Mar 25, 2025 | Drug Discovery, Latent Diffusion Model for 3D | Code Available | 2 |
| HoGS: Unified Near and Far Object Reconstruction via Homogeneous Gaussian Splatting | Mar 25, 2025 | 3DGS, Novel View Synthesis | Code Available | 2 |
| COB-GS: Clear Object Boundaries in 3DGS Segmentation Based on Boundary-Adaptive Gaussian Splitting | Mar 25, 2025 | 3DGS, Object | Code Available | 2 |
| Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy | Mar 25, 2025 | Denoising, Robot Manipulation | Code Available | 2 |
| Learning Hazing to Dehazing: Towards Realistic Haze Generation for Real-World Image Dehazing | Mar 25, 2025 | Image Dehazing, Image Generation | Code Available | 2 |
| Cross-Tokenizer Distillation via Approximate Likelihood Matching | Mar 25, 2025 | Large Language Model | Code Available | 2 |
| Change3D: Revisiting Change Detection and Captioning from A Video Modeling Perspective | Mar 24, 2025 | Building Damage Assessment, Change Detection | Code Available | 2 |
| Towards Training-free Anomaly Detection with Vision and Language Foundation Models | Mar 24, 2025 | Anomaly Detection | Code Available | 2 |
| UniPCGC: Towards Practical Point Cloud Geometry Compression via an Efficient Unified Approach | Mar 24, 2025 | Data Compression | Code Available | 2 |
| Reasoning to Learn from Latent Thoughts | Mar 24, 2025 | Math, Text Generation | Code Available | 2 |
| BitDecoding: Unlocking Tensor Cores for Long-Context LLMs Decoding with Low-Bit KV Cache | Mar 24, 2025 | Computational Efficiency, GPU | Code Available | 2 |
| LLaVAction: evaluating and training multi-modal large language models for action recognition | Mar 24, 2025 | Action Recognition, Action Understanding | Code Available | 2 |
| MaSS13K: A Matting-level Semantic Segmentation Benchmark | Mar 24, 2025 | 4k, Image Matting | Code Available | 2 |
| I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders | Mar 24, 2025 | All | Code Available | 2 |
| Hardware-Rasterized Ray-Based Gaussian Splatting | Mar 24, 2025 | Mixed Reality, Novel View Synthesis | Code Available | 2 |
| LinkAlign: Scalable Schema Linking for Real-World Large-Scale Multi-Database Text-to-SQL | Mar 24, 2025 | Retrieval, Text to SQL | Code Available | 2 |
| DINO in the Room: Leveraging 2D Foundation Models for 3D Segmentation | Mar 24, 2025 | 3D Semantic Segmentation, LIDAR Semantic Segmentation | Code Available | 2 |
| MC-LLaVA: Multi-Concept Personalized Vision-Language Model | Mar 24, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| FG^2: Fine-Grained Cross-View Localization by Fine-Grained Feature Matching | Mar 24, 2025 | Weakly-supervised Learning | Code Available | 2 |
| PolarFree: Polarization-based Reflection-free Imaging | Mar 23, 2025 | Reflection Removal, Scene Understanding | Code Available | 2 |
| Surrogate Learning in Meta-Black-Box Optimization: A Preliminary Study | Mar 23, 2025 | Kolmogorov-Arnold Networks, Reinforcement Learning (RL) | Code Available | 2 |
| MUST: The First Dataset and Unified Framework for Multispectral UAV Single Object Tracking | Mar 22, 2025 | Object Tracking | Code Available | 2 |
| DCEvo: Discriminative Cross-Dimensional Evolutionary Learning for Infrared and Visible Image Fusion | Mar 22, 2025 | Infrared And Visible Image Fusion | Code Available | 2 |
| LightLoc: Learning Outdoor LiDAR Localization at Light Speed | Mar 22, 2025 | Autonomous Driving, regression | Code Available | 2 |
| CODA: Repurposing Continuous VAEs for Discrete Tokenization | Mar 22, 2025 | | Code Available | 2 |
| RAW-Adapter: Adapting Pre-trained Visual Model to Camera RAW Images and A Benchmark | Mar 21, 2025 | Data Augmentation | Code Available | 2 |
| OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement | Mar 21, 2025 | Multimodal Reasoning, Reinforcement Learning (RL) | Code Available | 2 |
| Modifying Large Language Model Post-Training for Diverse Creative Writing | Mar 21, 2025 | Diversity, Language Modeling | Code Available | 2 |