| Adding simple structure at inference improves Vision-Language Compositionality | Jun 11, 2025 | Attribute, Image-text Retrieval | Code Available | 0 |
| PersonaLens: A Benchmark for Personalization Evaluation in Conversational AI Assistants | Jun 11, 2025 | | Code Available | 0 |
| How much is too much? Measuring divergence from Benford's Law with the Equivalent Contamination Proportion (ECP) | Jun 11, 2025 | | Code Available | 0 |
| CINeMA: Conditional Implicit Neural Multi-Modal Atlas for a Spatio-Temporal Representation of the Perinatal Brain | Jun 11, 2025 | Data Augmentation, Image Registration | Code Available | 0 |
| Intent Factored Generation: Unleashing the Diversity in Your Language Model | Jun 11, 2025 | Articles, Diversity | Code Available | 0 |
| On the Similarities of Embeddings in Contrastive Learning | Jun 11, 2025 | Contrastive Learning | Code Available | 1 |
| Guided Graph Compression for Quantum Graph Neural Networks | Jun 11, 2025 | Jet Tagging | Code Available | 0 |
| Patient-Specific Deep Reinforcement Learning for Automatic Replanning in Head-and-Neck Cancer Proton Therapy | Jun 11, 2025 | Deep Reinforcement Learning | Unverified | 0 |
| AI5GTest: AI-Driven Specification-Aware Automated Testing and Validation of 5G O-RAN Components | Jun 11, 2025 | Overall - Test | Unverified | 0 |
| Towards Responsible AI: Advances in Safety, Fairness, and Accountability of Autonomous Systems | Jun 11, 2025 | Autonomous Vehicles, Decision Making | Unverified | 0 |
| Regularizing Learnable Feature Extraction for Automatic Speech Recognition | Jun 11, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Retrieval of Surface Solar Radiation through Implicit Albedo Recovery from Temporal Context | Jun 11, 2025 | Retrieval | Code Available | 0 |
| Hearing Hands: Generating Sounds from Physical Interactions in 3D Scenes | Jun 11, 2025 | | Code Available | 0 |
| A Navigation Framework Utilizing Vision-Language Models | Jun 11, 2025 | Navigate, Prompt Engineering | Code Available | 0 |
| Large Language Models for Toxic Language Detection in Low-Resource Balkan Languages | Jun 11, 2025 | | Code Available | 0 |
| Ming-Omni: A Unified Multimodal Model for Perception and Generation | Jun 11, 2025 | Image Generation, text-to-speech | Code Available | 4 |
| MEDUSA: A Multimodal Deep Fusion Multi-Stage Training Framework for Speech Emotion Recognition in Naturalistic Conditions | Jun 11, 2025 | Emotion Recognition, Speech Emotion Recognition | Code Available | 0 |
| California Crop Yield Benchmark: Combining Satellite Image, Climate, Evapotranspiration, and Soil Data Layers for County-Level Yield Forecasting of Over 70 Crops | Jun 11, 2025 | | Code Available | 1 |
| Outside Knowledge Conversational Video (OKCV) Dataset -- Dialoguing over Videos | Jun 11, 2025 | Question Answering, Visual Question Answering | Code Available | 0 |
| A Deep Generative Model for the Simulation of Discrete Karst Networks | Jun 11, 2025 | Denoising | Unverified | 0 |
| A Weighted Loss Approach to Robust Federated Learning under Data Heterogeneity | Jun 11, 2025 | Federated Learning | Unverified | 0 |
| RoCA: Robust Cross-Domain End-to-End Autonomous Driving | Jun 11, 2025 | Autonomous Driving, Domain Adaptation | Unverified | 0 |
| HadaNorm: Diffusion Transformer Quantization through Mean-Centered Transformations | Jun 11, 2025 | Image Generation, Quantization | Unverified | 0 |
| UmbraTTS: Adapting Text-to-Speech to Environmental Contexts with Flow Matching | Jun 11, 2025 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Conditional diffusion models for guided anomaly detection in brain images using fluid-driven anomaly randomization | Jun 11, 2025 | Anomaly Detection, Image Reconstruction | Unverified | 0 |
| ODG: Occupancy Prediction Using Dual Gaussians | Jun 11, 2025 | 3D geometry, Autonomous Driving | Unverified | 0 |
| Effective Red-Teaming of Policy-Adherent Agents | Jun 11, 2025 | Red Teaming | Unverified | 0 |
| Adversarial Surrogate Risk Bounds for Binary Classification | Jun 11, 2025 | Binary Classification, Classification | Unverified | 0 |
| Accurate and efficient zero-shot 6D pose estimation with frozen foundation models | Jun 11, 2025 | 6D Pose Estimation, Instance Segmentation | Unverified | 0 |
| Hierarchical Image Matching for UAV Absolute Visual Localization via Semantic and Structural Constraints | Jun 11, 2025 | Image Retrieval, Visual Localization | Unverified | 0 |
| Class Similarity-Based Multimodal Classification under Heterogeneous Category Sets | Jun 11, 2025 | Transfer Learning | Unverified | 0 |
| PlayerOne: Egocentric World Simulator | Jun 11, 2025 | Video Generation | Unverified | 0 |
| Enhancing Human-Robot Collaboration: A Sim2Real Domain Adaptation Algorithm for Point Cloud Segmentation in Industrial Environments | Jun 11, 2025 | Domain Adaptation, Point Cloud Segmentation | Unverified | 0 |
| "What are my options?": Explaining RL Agents with Diverse Near-Optimal Alternatives (Extended) | Jun 11, 2025 | Diversity, Q-Learning | Unverified | 0 |
| The Sample Complexity of Online Strategic Decision Making with Information Asymmetry and Knowledge Transportability | Jun 11, 2025 | Decision Making, Navigate | Unverified | 0 |
| Automatic Treatment Planning using Reinforcement Learning for High-dose-rate Prostate Brachytherapy | Jun 11, 2025 | Anatomy, Reinforcement Learning (RL) | Unverified | 0 |
| CHIP: A multi-sensor dataset for 6D pose estimation of chairs in industrial settings | Jun 11, 2025 | 6D Pose Estimation, Pose Estimation | Unverified | 0 |
| When Is Diversity Rewarded in Cooperative Multi-Agent Learning? | Jun 11, 2025 | Diversity, Multi-agent Reinforcement Learning | Unverified | 0 |
| Towards Multi-modal Graph Large Language Model | Jun 11, 2025 | Graph Learning, In-Context Learning | Unverified | 0 |
| Measuring Communication Quality of Interest Rate Announcements | Jun 11, 2025 | | Code Available | 0 |
| GLGENN: A Novel Parameter-Light Equivariant Neural Networks Architecture Based on Clifford Geometric Algebras | Jun 11, 2025 | Benchmarking | Code Available | 1 |
| A Manually Annotated Image-Caption Dataset for Detecting Children in the Wild | Jun 11, 2025 | Age Estimation, Benchmarking | Code Available | 0 |
| Structural-Spectral Graph Convolution with Evidential Edge Learning for Hyperspectral Image Clustering | Jun 11, 2025 | Clustering, Contrastive Learning | Code Available | 0 |
| A Hierarchical Probabilistic Framework for Incremental Knowledge Tracing in Classroom Settings | Jun 11, 2025 | Knowledge Tracing | Code Available | 0 |
| V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning | Jun 11, 2025 | Action Anticipation, Large Language Model | Code Available | 7 |
| Inverting Black-Box Face Recognition Systems via Zero-Order Optimization in Eigenface Space | Jun 11, 2025 | Face Recognition | Code Available | 0 |
| EquiCaps: Predictor-Free Pose-Aware Pre-Trained Capsule Networks | Jun 11, 2025 | Pose Estimation | Code Available | 0 |
| SAFE: Multitask Failure Detection for Vision-Language-Action Models | Jun 11, 2025 | Conformal Prediction, Vision-Language-Action | Unverified | 0 |
| MAGMaR Shared Task System Description: Video Retrieval with OmniEmbed | Jun 11, 2025 | Retrieval, Video Retrieval | Unverified | 0 |
| Causal Climate Emulation with Bayesian Filtering | Jun 11, 2025 | Representation Learning | Unverified | 0 |