| Text2Cypher Across Languages: Evaluating Foundational Models Beyond English | Jun 26, 2025 | Attribute, Text2Sparql | Unverified | 0 |
| Maximal Matching Matters: Preventing Representation Collapse for Robust Cross-Modal Retrieval | Jun 26, 2025 | Cross-Modal Retrieval, Image-text Retrieval | Unverified | 0 |
| OmniEval: A Benchmark for Evaluating Omni-modal Models with Visual, Auditory, and Textual Inputs | Jun 26, 2025 | Diversity, Multiple-choice | Unverified | 0 |
| DFVEdit: Conditional Delta Flow Vector for Zero-shot Video Editing | Jun 26, 2025 | Video Editing, Video Generation | Unverified | 0 |
| From Cradle to Cane: A Two-Pass Framework for High-Fidelity Lifespan Face Aging | Jun 26, 2025 | Denoising | Unverified | 0 |
| EVA: Mixture-of-Experts Semantic Variant Alignment for Compositional Zero-Shot Learning | Jun 26, 2025 | Compositional Zero-Shot Learning, Mixture-of-Experts | Unverified | 0 |
| TSDASeg: A Two-Stage Model with Direct Alignment for Interactive Point Cloud Segmentation | Jun 26, 2025 | cross-modal alignment, Interactive Segmentation | Unverified | 0 |
| VisionGuard: Synergistic Framework for Helmet Violation Detection | Jun 26, 2025 | Classification Consistency | Unverified | 0 |
| Bridging Video Quality Scoring and Justification via Large Multimodal Models | Jun 26, 2025 | Video Quality Assessment, Visual Question Answering (VQA) | Unverified | 0 |
| Multimodal Prompt Alignment for Facial Expression Recognition | Jun 26, 2025 | Facial Expression Recognition, Facial Expression Recognition (FER) | Unverified | 0 |
| Instella-T2I: Pushing the Limits of 1D Discrete Latent Space Image Generation | Jun 26, 2025 | GPU, Image Generation | Unverified | 0 |
| CL-Splats: Continual Learning of Gaussian Splatting with Local Optimization | Jun 26, 2025 | 3D Scene Reconstruction, Change Detection | Unverified | 0 |
| Geometry and Perception Guided Gaussians for Multiview-consistent 3D Generation from a Single Image | Jun 26, 2025 | 3D Generation, 3D Reconstruction | Unverified | 0 |
| GroundFlow: A Plug-in Module for Temporal Reasoning on 3D Point Cloud Sequential Grounding | Jun 26, 2025 | 3D visual grounding, Large Language Model | Unverified | 0 |
| BitMark for Infinity: Watermarking Bitwise Autoregressive Image Generative Models | Jun 26, 2025 | Image Generation | Unverified | 0 |
| DuET: Dual Incremental Object Detection via Exemplar-Free Task Arithmetic | Jun 26, 2025 | Autonomous Driving, Avg | Unverified | 0 |
| HieraSurg: Hierarchy-Aware Diffusion Model for Surgical Video Generation | Jun 26, 2025 | Panoptic Segmentation, Segmentation | Unverified | 0 |
| PanSt3R: Multi-view Consistent Panoptic Segmentation | Jun 26, 2025 | 2D Panoptic Segmentation, 3D geometry | Unverified | 0 |
| CoPa-SG: Dense Scene Graphs with Parametric and Proto-Relations | Jun 26, 2025 | Graph Generation, Relation | Unverified | 0 |
| CA-I2P: Channel-Adaptive Registration Network with Global Optimal Selection | Jun 26, 2025 | global-optimization, Image to Point Cloud Registration | Unverified | 0 |
| FastRef: Fast Prototype Refinement for Few-Shot Industrial Anomaly Detection | Jun 26, 2025 | Anomaly Detection, Computational Efficiency | Unverified | 0 |
| Controllable 3D Placement of Objects with Scene-Aware Diffusion Models | Jun 26, 2025 | Object | Unverified | 0 |
| HalluSegBench: Counterfactual Visual Reasoning for Segmentation Hallucination Evaluation | Jun 26, 2025 | counterfactual, Counterfactual Reasoning | Unverified | 0 |
| MADrive: Memory-Augmented Driving Scene Modeling | Jun 26, 2025 | Autonomous Driving | Unverified | 0 |
| SiM3D: Single-instance Multiview Multimodal and Multisetup 3D Anomaly Detection Benchmark | Jun 26, 2025 | 3D Anomaly Detection, 3D Anomaly Detection and Segmentation | Unverified | 0 |
| Personalized Federated Learning via Dual-Prompt Optimization and Cross Fusion | Jun 26, 2025 | Federated Learning, Personalized Federated Learning | Unverified | 0 |
| Holistic Surgical Phase Recognition with Hierarchical Input Dependent State Space Models | Jun 26, 2025 | State Space Models, Surgical phase recognition | Unverified | 0 |
| Inverse Scene Text Removal | Jun 26, 2025 | Binary Classification | Code Available | 0 |
| Homogenization of Multi-agent Learning Dynamics in Finite-state Markov Games | Jun 26, 2025 | Reinforcement Learning (RL) | Code Available | 0 |
| Robust Deep Learning for Myocardial Scar Segmentation in Cardiac MRI with Noisy Labels | Jun 26, 2025 | Data Augmentation | Code Available | 0 |
| Task-Aware KV Compression For Cost-Effective Long Video Understanding | Jun 26, 2025 | Video Understanding | Code Available | 0 |
| G^2D: Boosting Multimodal Learning with Gradient-Guided Distillation | Jun 26, 2025 | Knowledge Distillation, Model Optimization | Code Available | 0 |
| Unveiling Causal Reasoning in Large Language Models: Reality or Mirage? | Jun 26, 2025 | counterfactual, General Knowledge | Code Available | 0 |
| Mitigating Hallucination of Large Vision-Language Models via Dynamic Logits Calibration | Jun 26, 2025 | Hallucination, Text Generation | Code Available | 0 |
| Benchmarking Deep Learning and Vision Foundation Models for Atypical vs. Normal Mitosis Classification with Cross-Dataset Evaluation | Jun 26, 2025 | Benchmarking, Transfer Learning | Code Available | 0 |
| Adversarial Training: Enhancing Out-of-Distribution Generalization for Learning Wireless Resource Allocation | Jun 26, 2025 | Out-of-Distribution Generalization | Unverified | 0 |
| Segment Anything in Pathology Images with Natural Language | Jun 26, 2025 | Diagnostic, Feature Importance | Unverified | 0 |
| WAFT: Warping-Alone Field Transforms for Optical Flow | Jun 26, 2025 | Optical Flow Estimation, Zero-shot Generalization | Code Available | 2 |
| Rethink Sparse Signals for Pose-guided Text-to-image Generation | Jun 26, 2025 | Image Generation, Pose-Guided Image Generation | Code Available | 0 |
| LASFNet: A Lightweight Attention-Guided Self-Modulation Feature Fusion Network for Multimodal Object Detection | Jun 26, 2025 | object-detection, Object Detection | Code Available | 0 |
| SharpZO: Hybrid Sharpness-Aware Vision Language Model Prompt Tuning via Forward-Only Passes | Jun 26, 2025 | Language Modeling, Language Modelling | Code Available | 0 |
| Learning to See in the Extremely Dark | Jun 26, 2025 | Denoising, Exposure Correction | Code Available | 2 |
| Evidence-based diagnostic reasoning with multi-agent copilot for human pathology | Jun 26, 2025 | Diagnostic, whole slide images | Unverified | 0 |
| Pushing Trade-Off Boundaries: Compact yet Effective Remote Sensing Change Detection | Jun 26, 2025 | Change Detection, Decoder | Code Available | 0 |
| Continual Self-Supervised Learning with Masked Autoencoders in Remote Sensing | Jun 26, 2025 | Continual Learning, Continual Self-Supervised Learning | Unverified | 0 |
| SAMURAI: Shape-Aware Multimodal Retrieval for 3D Object Identification | Jun 26, 2025 | 3D Object Retrieval, Object | Unverified | 0 |
| ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models | Jun 26, 2025 | Spatial Reasoning, Video Generation | Unverified | 0 |
| Temporal Rate Reduction Clustering for Human Motion Segmentation | Jun 26, 2025 | Clustering, Motion Segmentation | Unverified | 0 |
| Co-Design of Sensing, Communications, and Control for Low-Altitude Wireless Networks | Jun 26, 2025 | Scheduling | Unverified | 0 |
| Beyond Reactive Safety: Risk-Aware LLM Alignment via Long-Horizon Simulation | Jun 26, 2025 | Language Modeling, Language Modelling | Code Available | 0 |