| NavMorph: A Self-Evolving World Model for Vision-and-Language Navigation in Continuous Environments | Jun 30, 2025 | Decision Making, Vision and Language Navigation | Code Available | 2 |
| L0: Reinforcement Learning to Become General Agents | Jun 30, 2025 | Question Answering, reinforcement-learning | Code Available | 3 |
| Improve Underwater Object Detection through YOLOv12 Architecture and Physics-informed Augmentation | Jun 30, 2025 | Autonomous Navigation, Computational Efficiency | Code Available | 1 |
| RAG-R1 : Incentivize the Search and Reasoning Capabilities of LLMs through Multi-query Parallelism | Jun 30, 2025 | Question Answering, RAG | Code Available | 5 |
| Flexibility-Conditioned Protein Structure Design with Flow Matching | Jun 29, 2025 | | Code Available | 0 |
| Accurate Parameter-Efficient Test-Time Adaptation for Time Series Forecasting | Jun 29, 2025 | | Code Available | 0 |
| Endo-4DGX: Robust Endoscopic Scene Reconstruction and Illumination Correction with Gaussian Splatting | Jun 29, 2025 | | Code Available | 0 |
| Learning Counterfactually Decoupled Attention for Open-World Model Attribution | Jun 29, 2025 | | Code Available | 0 |
| MoCa: Modality-aware Continual Pre-training Makes Better Bidirectional Multimodal Embeddings | Jun 29, 2025 | | Unverified | 0 |
| Token Activation Map to Visually Explain Multimodal LLMs | Jun 29, 2025 | | Unverified | 0 |
| IR3D-Bench: Evaluating Vision-Language Model Scene Understanding as Agentic Inverse Rendering | Jun 29, 2025 | | Unverified | 0 |
| Teaching a Language Model to Speak the Language of Tools | Jun 29, 2025 | | Unverified | 0 |
| Frequency-enhanced Multi-granularity Context Network for Efficient Vertebrae Segmentation | Jun 29, 2025 | | Code Available | 0 |
| Forget-MI: Machine Unlearning for Forgetting Multimodal Information in Healthcare Settings | Jun 29, 2025 | | Code Available | 0 |
| External Data-Enhanced Meta-Representation for Adaptive Probabilistic Load Forecasting | Jun 29, 2025 | | Code Available | 0 |
| UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence with Spatial Reasoning and Understanding | Jun 29, 2025 | | Code Available | 0 |
| High-quality Pseudo-labeling for Point Cloud Segmentation with Scene-level Annotation | Jun 29, 2025 | | Code Available | 0 |
| Boosting LLM's Molecular Structure Elucidation with Knowledge Enhanced Tree Search Reasoning | Jun 29, 2025 | | Code Available | 0 |
| Dynamic Contrastive Learning for Hierarchical Retrieval: A Case Study of Distance-Aware Cross-View Geo-Localization | Jun 29, 2025 | | Code Available | 0 |
| Are Large Language Models Capable of Deep Relational Reasoning? Insights from DeepSeek-R1 and Benchmark Comparisons | Jun 29, 2025 | | Code Available | 0 |
| RiverText: A Python Library for Training and Evaluating Incremental Word Embeddings from Text Data Streams | Jun 29, 2025 | | Code Available | 0 |
| SIEDD: Shared-Implicit Encoder with Discrete Decoders | Jun 29, 2025 | | Code Available | 0 |
| Sub-MoE: Efficient Mixture-of-Expert LLMs Compression via Subspace Expert Merging | Jun 29, 2025 | Inference Optimization, Mixture-of-Experts | Code Available | 0 |
| TVG-SLAM: Robust Gaussian Splatting SLAM with Tri-view Geometric Constraints | Jun 29, 2025 | 3DGS, Pose Estimation | Unverified | 0 |
| DDL: A Dataset for Interpretable Deepfake Detection and Localization in Real-World Scenarios | Jun 29, 2025 | Binary Classification, DeepFake Detection | Unverified | 0 |
| DC-TTA: Divide-and-Conquer Framework for Test-Time Adaptation of Interactive Segmentation | Jun 29, 2025 | Camouflaged Object Segmentation, Interactive Segmentation | Unverified | 0 |
| TOMI: Transforming and Organizing Music Ideas for Multi-Track Compositions with Full-Song Structure | Jun 29, 2025 | Music Generation | Code Available | 1 |
| CycleVAR: Repurposing Autoregressive Model for Unsupervised One-Step Image Translation | Jun 29, 2025 | Image Generation, Image-to-Image Translation | Code Available | 1 |
| Datasets for Fairness in Language Models: An In-Depth Survey | Jun 29, 2025 | Fairness | Code Available | 1 |
| Double-Diffusion: Diffusion Conditioned Diffusion Probabilistic Model For Air Quality Prediction | Jun 29, 2025 | Image Restoration | Unverified | 0 |
| Where, What, Why: Towards Explainable Driver Attention Prediction | Jun 29, 2025 | Autonomous Driving, Autonomous Vehicles | Code Available | 1 |
| SurgTPGS: Semantic 3D Surgical Scene Understanding with Text Promptable Gaussian Splatting | Jun 29, 2025 | 3D Reconstruction, Scene Understanding | Code Available | 1 |
| Ovis-U1 Technical Report | Jun 29, 2025 | Image Generation, Text to Image Generation | Code Available | 3 |
| Revisiting Z Transform Laplace Inversion: To Correct flaws in Signal and System Theory | Jun 29, 2025 | ARC | Unverified | 0 |
| Dare to Plagiarize? Plagiarized Painting Recognition and Retrieval | Jun 29, 2025 | Metric Learning, Retrieval | Unverified | 0 |
| Context-Driven Knowledge Graph Completion with Semantic-Aware Relational Message Passing | Jun 29, 2025 | Knowledge Graph Completion, Knowledge Graphs | Unverified | 0 |
| MEMFOF: High-Resolution Training for Memory-Efficient Multi-Frame Optical Flow Estimation | Jun 29, 2025 | GPU, Optical Flow Estimation | Code Available | 2 |
| ANN-Based Grid Impedance Estimation for Adaptive Gain Scheduling in VSG Under Dynamic Grid Conditions | Jun 29, 2025 | Scheduling | Code Available | 0 |
| Computer-Aided Multi-Stroke Character Simplification by Stroke Removal | Jun 29, 2025 | | Code Available | 0 |
| Advanced Financial Reasoning at Scale: A Comprehensive Evaluation of Large Language Models on CFA Level III | Jun 29, 2025 | Model Selection, Multiple-choice | Unverified | 0 |
| FinAI-BERT: A Transformer-Based Model for Sentence-Level Detection of AI Disclosures in Financial Reports | Jun 29, 2025 | Sentence | Code Available | 0 |
| FedRef: Communication-Efficient Bayesian Fine Tuning with Reference Model | Jun 29, 2025 | Brain Tumor Segmentation, Federated Learning | Code Available | 0 |
| VolumetricSMPL: A Neural Volumetric Body Model for Efficient Interactions, Contacts, and Collisions | Jun 29, 2025 | Computational Efficiency, GPU | Code Available | 2 |
| CRISP-SAM2: SAM2 with Cross-Modal Interaction and Semantic Prompting for Multi-Organ Segmentation | Jun 29, 2025 | Organ Segmentation | Code Available | 1 |
| RoboScape: Physics-informed Embodied World Model | Jun 29, 2025 | 3D geometry, Depth Estimation | Code Available | 0 |
| MOTOR: Multimodal Optimal Transport via Grounded Retrieval in Medical Visual Question Answering | Jun 28, 2025 | | Code Available | 0 |
| Degradation-Modeled Multipath Diffusion for Tunable Metalens Photography | Jun 28, 2025 | | Unverified | 0 |
| STR-Match: Matching SpatioTemporal Relevance Score for Training-Free Video Editing | Jun 28, 2025 | | Unverified | 0 |
| MARBLE: A Hard Benchmark for Multimodal Spatial Reasoning and Planning | Jun 28, 2025 | | Unverified | 0 |
| Confident Splatting: Confidence-Based Compression of 3D Gaussian Splatting via Learnable Beta Distributions | Jun 28, 2025 | | Code Available | 0 |