| Follow-Your-Creation: Empowering 4D Creation through Video Inpainting | Jun 5, 2025 | Video Generation, Video Inpainting | —Unverified | 0 |
| Fool the Stoplight: Realistic Adversarial Patch Attacks on Traffic Light Detectors | Jun 5, 2025 | Autonomous Vehicles | Code Available | 0 |
| Interpretable Few-Shot Image Classification via Prototypical Concept-Guided Mixture of LoRA Experts | Jun 5, 2025 | Explainable Models, Few-Shot Image Classification | —Unverified | 0 |
| Geological Field Restoration through the Lens of Image Inpainting | Jun 5, 2025 | Image Inpainting, Missing Values | —Unverified | 0 |
| AudioLens: A Closer Look at Auditory Attribute Perception of Large Audio-Language Models | Jun 5, 2025 | Attribute | —Unverified | 0 |
| ActivePusher: Active Learning and Planning with Residual Physics for Nonprehensile Manipulation | Jun 5, 2025 | Active Learning | —Unverified | 0 |
| Learning Joint Interventional Effects from Single-Variable Interventions in Additive Models | Jun 5, 2025 | Additive models | —Unverified | 0 |
| Improving AI-generated music with user-guided training | Jun 5, 2025 | Image Generation, Music Generation | —Unverified | 0 |
| Gen-n-Val: Agentic Image Data Generation and Validation | Jun 5, 2025 | Image Harmonization, Instance Segmentation | —Unverified | 0 |
| From Standalone LLMs to Integrated Intelligence: A Survey of Compound AI Systems | Jun 5, 2025 | Benchmarking, RAG | —Unverified | 0 |
| Enhancing Frequency for Single Image Super-Resolution with Learnable Separable Kernels | Jun 5, 2025 | Image Super-Resolution, Super-Resolution | —Unverified | 0 |
| Rectified Point Flow: Generic Point Cloud Pose Estimation | Jun 5, 2025 | Point Cloud Registration, Pose Estimation | —Unverified | 0 |
| Synthetic Dataset Generation for Autonomous Mobile Robots Using 3D Gaussian Splatting for Vision Training | Jun 5, 2025 | Dataset Generation, object-detection | —Unverified | 0 |
| Hierarchical Language Models for Semantic Navigation and Manipulation in an Aerial-Ground Robotic System | Jun 5, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| LLM-based phoneme-to-grapheme for phoneme-based speech recognition | Jun 5, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| The NTNU System at the S&I Challenge 2025 SLA Open Track | Jun 5, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| Better Semi-supervised Learning for Multi-domain ASR Through Incremental Retraining and Data Filtering | Jun 5, 2025 | named-entity-recognition, Named Entity Recognition | —Unverified | 0 |
| Grapheme-Coherent Phonemic and Prosodic Annotation of Speech by Implicit and Explicit Grapheme Conditioning | Jun 5, 2025 | text-to-speech, Text to Speech | —Unverified | 0 |
| Energentic Intelligence: From Self-Sustaining Systems to Enduring Artificial Life | Jun 5, 2025 | Artificial Life | —Unverified | 0 |
| Nonlinear Causal Discovery for Grouped Data | Jun 5, 2025 | Causal Discovery, Graph Learning | —Unverified | 0 |
| Robust Moment Identification for Nonlinear PDEs via a Neural ODE Approach | Jun 5, 2025 | Irregular Time Series | —Unverified | 0 |
| Regret-Optimal Q-Learning with Low Cost for Single-Agent and Federated Reinforcement Learning | Jun 5, 2025 | Q-Learning, Reinforcement Learning (RL) | —Unverified | 0 |
| Subjective Perspectives within Learned Representations Predict High-Impact Innovation | Jun 5, 2025 | Diversity, Large Language Model | —Unverified | 0 |
| On the Mechanism of Reasoning Pattern Selection in Reinforcement Learning for Language Models | Jun 5, 2025 | Instruction Following, Reinforcement Learning (RL) | —Unverified | 0 |
| UnHiPPO: Uncertainty-aware Initialization for State Space Models | Jun 5, 2025 | State Space Models | —Unverified | 0 |
| Transformers Meet In-Context Learning: A Universal Approximation Theory | Jun 5, 2025 | In-Context Learning | —Unverified | 0 |
| EECD-Net: Energy-Efficient Crack Detection with Spiking Neural Networks and Gated Attention | Jun 5, 2025 | Super-Resolution | —Unverified | 0 |
| Perceptual Decoupling for Scalable Multi-modal Reasoning via Reward-Optimized Captioning | Jun 5, 2025 | Math, Visual Grounding | —Unverified | 0 |
| LGM-Pose: A Lightweight Global Modeling Network for Real-time Human Pose Estimation | Jun 5, 2025 | Multi-Person Pose Estimation, Pose Estimation | —Unverified | 0 |
| Hierarchical-Task-Aware Multi-modal Mixture of Incremental LoRA Experts for Embodied Continual Learning | Jun 5, 2025 | Continual Learning | —Unverified | 0 |
| SmartAvatar: Text- and Image-Guided Human Avatar Generation with VLM AI Agents | Jun 5, 2025 | Attribute | —Unverified | 0 |
| Perfecting Depth: Uncertainty-Aware Enhancement of Metric Depth | Jun 5, 2025 | Autonomous Driving | —Unverified | 0 |
| Line of Sight: On Linear Representations in VLLMs | Jun 5, 2025 | Diversity | —Unverified | 0 |
| Deep Learning Reforms Image Matching: A Survey and Outlook | Jun 5, 2025 | 3D Reconstruction, Deep Learning | —Unverified | 0 |
| Learning dissection trajectories from expert surgical videos via imitation learning with equivariant diffusion | Jun 5, 2025 | Imitation Learning, Representation Learning | —Unverified | 0 |
| Bridging Annotation Gaps: Transferring Labels to Align Object Detection Datasets | Jun 5, 2025 | object-detection, Object Detection | —Unverified | 0 |
| SRD: Reinforcement-Learned Semantic Perturbation for Backdoor Defense in VLMs | Jun 5, 2025 | backdoor defense, Image Captioning | —Unverified | 0 |
| Physics Informed Capsule Enhanced Variational AutoEncoder for Underwater Image Enhancement | Jun 5, 2025 | Clustering, Image Enhancement | —Unverified | 0 |
| Object-X: Learning to Reconstruct Multi-Modal 3D Object Representations | Jun 5, 2025 | 3D Object Reconstruction, Novel View Synthesis | —Unverified | 0 |
| DualX-VSR: Dual Axial SpatialTemporal Transformer for Real-World Video Super-Resolution without Motion Compensation | Jun 5, 2025 | Motion Compensation, Optical Flow Estimation | —Unverified | 0 |
| From Objects to Anywhere: A Holistic Benchmark for Multi-level Visual Grounding in 3D Scenes | Jun 5, 2025 | 3D visual grounding, Object | —Unverified | 0 |
| Light and 3D: a methodological exploration of digitisation techniques adapted to a selection of objects from the Musée d'Archéologie Nationale | Jun 5, 2025 | Diversity, Object | —Unverified | 0 |
| CzechLynx: A Dataset for Individual Identification and Pose Estimation of the Eurasian Lynx | Jun 5, 2025 | 2D Pose Estimation, Benchmarking | —Unverified | 0 |
| Amortized variational transdimensional inference | Jun 5, 2025 | Bayesian Inference, Bayesian Optimization | Code Available | 0 |
| Towards Network Data Analytics in 5G Systems and Beyond | Jun 5, 2025 | Articles | Code Available | 0 |
| OpenMaskDINO3D: Reasoning 3D Segmentation via Large Language Model | Jun 5, 2025 | Instance Segmentation, Language Modeling | Code Available | 1 |
| FPSAttention: Training-Aware FP8 and Sparsity Co-Design for Fast Video Diffusion | Jun 5, 2025 | Denoising, Quantization | —Unverified | 0 |
| Dissecting Logical Reasoning in LLMs: A Fine-Grained Evaluation and Supervision Study | Jun 5, 2025 | Logical Reasoning | Code Available | 0 |
| Intelligent Channel Allocation for IEEE 802.11be Multi-Link Operation: When MAB Meets LLM | Jun 5, 2025 | Combinatorial Optimization, Large Language Model | Code Available | 0 |
| Agentic AI for Intent-Based Industrial Automation | Jun 5, 2025 | | Code Available | 0 |