| MadCLIP: Few-shot Medical Anomaly Detection with CLIP | Jun 30, 2025 | | Code Available | 0 |
| Consistent Time-of-Flight Depth Denoising via Graph-Informed Geometric Attention | Jun 30, 2025 | | Code Available | 0 |
| GazeTarget360: Towards Gaze Target Estimation in 360-Degree for Robot Perception | Jun 30, 2025 | | Code Available | 0 |
| OcRFDet: Object-Centric Radiance Fields for Multi-View 3D Object Detection in Autonomous Driving | Jun 30, 2025 | | Code Available | 0 |
| Event-based Tiny Object Detection: A Benchmark Dataset and Baseline | Jun 30, 2025 | | Code Available | 0 |
| MReg: A Novel Regression Model with MoE-based Video Feature Mining for Mitral Regurgitation Diagnosis | Jun 30, 2025 | | Code Available | 0 |
| AutoEvoEval: An Automated Framework for Evolving Close-Ended LLM Evaluation Data | Jun 30, 2025 | | Code Available | 0 |
| Interpretable Zero-Shot Learning with Locally-Aligned Vision-Language Model | Jun 30, 2025 | | Code Available | 0 |
| How to Design and Train Your Implicit Neural Representation for Video Compression | Jun 30, 2025 | | Code Available | 0 |
| State and Memory is All You Need for Robust and Reliable AI Agents | Jun 30, 2025 | All, Benchmarking | Unverified | 0 |
| LineRetriever: Planning-Aware Observation Reduction for Web Agents | Jun 30, 2025 | Retrieval, Semantic Similarity | Unverified | 0 |
| Supercm: Revisiting Clustering for Semi-Supervised Learning | Jun 30, 2025 | Clustering | Unverified | 0 |
| Discovering the underlying analytic structure within Standard Model constants using artificial intelligence | Jun 30, 2025 | Symbolic Regression | Code Available | 0 |
| A Data-Ensemble-Based Approach for Sample-Efficient LQ Control of Linear Time-Varying Systems | Jun 30, 2025 | Q-Learning | Unverified | 0 |
| MMReason: An Open-Ended Multi-Modal Multi-Step Reasoning Benchmark for MLLMs Toward AGI | Jun 30, 2025 | Memorization | Code Available | 0 |
| Robustness of Misinformation Classification Systems to Adversarial Examples Through BeamAttack | Jun 30, 2025 | Adversarial Attack, Misinformation | Code Available | 0 |
| A Survey on Vision-Language-Action Models for Autonomous Driving | Jun 30, 2025 | Autonomous Driving, Autonomous Vehicles | Code Available | 4 |
| Advancing Learnable Multi-Agent Pathfinding Solvers with Active Fine-Tuning | Jun 30, 2025 | Imitation Learning, Trajectory Planning | Code Available | 2 |
| GroundingDINO-US-SAM: Text-Prompted Multi-Organ Segmentation in Ultrasound with LoRA-Tuned Vision-Language Models | Jun 30, 2025 | Organ Segmentation, Segmentation | Unverified | 0 |
| Foundation Models for Zero-Shot Segmentation of Scientific Images without AI-Ready Data | Jun 30, 2025 | Visual Reasoning, Zero Shot Segmentation | Unverified | 0 |
| MedSAM-CA: A CNN-Augmented ViT with Attention-Enhanced Multi-Scale Fusion for Medical Image Segmentation | Jun 30, 2025 | Decoder, Image Segmentation | Unverified | 0 |
| STACK: Adversarial Attacks on LLM Safeguard Pipelines | Jun 30, 2025 | Red Teaming | Unverified | 0 |
| Flash-VStream: Efficient Real-Time Understanding for Long Video Streams | Jun 30, 2025 | cross-modal alignment, EgoSchema | Code Available | 3 |
| Consensus-based optimization for closed-box adversarial attacks and a connection to evolution strategies | Jun 30, 2025 | | Code Available | 0 |
| Mamba-FETrack V2: Revisiting State Space Model for Frame-Event based Visual Object Tracking | Jun 30, 2025 | Mamba, Object Tracking | Code Available | 1 |
| Dataset Distillation via Vision-Language Category Prototype | Jun 30, 2025 | Dataset Distillation, Descriptive | Code Available | 1 |
| Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers | Jun 30, 2025 | Multimodal Reasoning | Code Available | 5 |
| Diffusion Model-based Data Augmentation Method for Fetal Head Ultrasound Segmentation | Jun 30, 2025 | Data Augmentation, Segmentation | Unverified | 0 |
| Visual and Memory Dual Adapter for Multi-Modal Object Tracking | Jun 30, 2025 | Object Tracking, Prompt Learning | Code Available | 0 |
| HiNeuS: High-fidelity Neural Surface Mitigating Low-texture and Reflective Ambiguity | Jun 30, 2025 | Inverse Rendering, Neural Rendering | Unverified | 0 |
| DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World | Jun 30, 2025 | Caption Generation, Object | Code Available | 2 |
| Refine Any Object in Any Scene | Jun 30, 2025 | Novel View Synthesis, Object | Code Available | 1 |
| Epona: Autoregressive Diffusion World Model for Autonomous Driving | Jun 30, 2025 | Autonomous Driving, model | Code Available | 3 |
| MTADiffusion: Mask Text Alignment Diffusion Model for Object Inpainting | Jun 30, 2025 | Image Inpainting | Unverified | 0 |
| μ^2Tokenizer: Differentiable Multi-Scale Multi-Modal Tokenizer for Radiology Report Generation | Jun 30, 2025 | Computed Tomography (CT) | Unverified | 0 |
| Large Language Models Don't Make Sense of Word Problems. A Scoping Review from a Mathematics Education Perspective | Jun 30, 2025 | Mathematical Reasoning | Unverified | 0 |
| Flow-Through Tensors: A Unified Computational Graph Architecture for Multi-Layer Transportation Network Optimization | Jun 30, 2025 | Tensor Decomposition | Unverified | 0 |
| Do Thinking Tokens Help or Trap? Towards More Efficient Large Reasoning Model | Jun 30, 2025 | Math | Unverified | 0 |
| FADRM: Fast and Accurate Data Residual Matching for Dataset Distillation | Jun 30, 2025 | Computational Efficiency, Dataset Distillation | Code Available | 1 |
| The Trilemma of Truth in Large Language Models | Jun 30, 2025 | Attribute, Conformal Prediction | Code Available | 0 |
| Constructing Non-Markovian Decision Process via History Aggregator | Jun 30, 2025 | Decision Making, Reinforcement Learning (RL) | Code Available | 0 |
| Thought-Augmented Planning for LLM-Powered Interactive Recommender Agent | Jun 30, 2025 | Interactive Recommendation, Large Language Model | Code Available | 0 |
| Auto-TA: Towards Scalable Automated Thematic Analysis (TA) via Multi-Agent Large Language Models with Reinforcement Learning | Jun 30, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning | Jun 30, 2025 | Math, Multi-agent Reinforcement Learning | Code Available | 2 |
| MDPG: Multi-domain Diffusion Prior Guidance for MRI Reconstruction | Jun 30, 2025 | Mamba, MRI Reconstruction | Code Available | 0 |
| Self-Supervised Multiview Xray Matching | Jun 30, 2025 | Fracture detection | Code Available | 0 |
| Seeding neural network quantum states with tensor network states | Jun 30, 2025 | | Code Available | 0 |
| Real-World En Call Center Transcripts Dataset with PII Redaction | Jun 30, 2025 | PII Redaction | Code Available | 0 |
| Computational Detection of Intertextual Parallels in Biblical Hebrew: A Benchmark Study Using Transformer-Based Language Models | Jun 30, 2025 | Word Embeddings | Unverified | 0 |
| Ella: Embodied Social Agents with Lifelong Memory | Jun 30, 2025 | Lifelong learning | Unverified | 0 |