| Diffusion Models-Aided Uplink Channel Estimation for RIS-Assisted Systems | Jun 9, 2025 | Denoising | —Unverified | 0 |
| Rao-Blackwellised Reparameterisation Gradients | Jun 9, 2025 | Variational Inference | —Unverified | 0 |
| Synthetic Visual Genome | Jun 9, 2025 | Referring Expression, Referring Expression Comprehension | —Unverified | 0 |
| Event-Priori-Based Vision-Language Model for Efficient Visual Understanding | Jun 9, 2025 | Event-based vision, Language Modeling | —Unverified | 0 |
| SceneRAG: Scene-level Retrieval-Augmented Generation for Video Understanding | Jun 9, 2025 | RAG, Retrieval | —Unverified | 0 |
| Bayesian Learning for Domain-Invariant Speaker Verification and Anti-Spoofing | Jun 9, 2025 | Speaker Verification, Variational Inference | —Unverified | 0 |
| Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion | Jun 9, 2025 | GPU, Video Generation | —Unverified | 0 |
| 4DGT: Learning a 4D Gaussian Transformer Using Real-World Monocular Videos | Jun 9, 2025 | Inductive Bias | —Unverified | 0 |
| Audio-Sync Video Generation with Multi-Stream Temporal Control | Jun 9, 2025 | Audio-Visual Synchronization, Video Alignment | —Unverified | 0 |
| UA-Pose: Uncertainty-Aware 6D Object Pose Estimation and Online Object Completion with Partial References | Jun 9, 2025 | 6D Pose Estimation using RGB, Image to 3D | —Unverified | 0 |
| Squeeze3D: Your 3D Generation Model is Secretly an Extreme Neural Compressor | Jun 9, 2025 | 3D Generation | —Unverified | 0 |
| LUCIFER: Language Understanding and Context-Infused Framework for Exploration and Behavior Refinement | Jun 9, 2025 | Decision Making, Reinforcement Learning (RL) | —Unverified | 0 |
| EgoM2P: Egocentric Multimodal Multitask Pretraining | Jun 9, 2025 | Depth Estimation, Gaze Prediction | —Unverified | 0 |
| A distributed motion planning approach to cooperative underwater acoustic source tracking and pursuit | Jun 9, 2025 | Motion Planning | —Unverified | 0 |
| Hidden in plain sight: VLMs overlook their visual representations | Jun 9, 2025 | Depth Estimation | —Unverified | 0 |
| BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models | Jun 9, 2025 | Robot Manipulation, Vision-Language-Action | —Unverified | 0 |
| Creating a Historical Migration Dataset from Finnish Church Records, 1800-1920 | Jun 9, 2025 | Handwriting Recognition, Table Detection | —Unverified | 0 |
| TokenBreak: Bypassing Text Classification Models Through Token Manipulation | Jun 9, 2025 | Classification, text-classification | —Unverified | 0 |
| Quantum Graph Transformer for NLP Sentiment Classification | Jun 9, 2025 | Classification, Language Modeling | —Unverified | 0 |
| A Comparative Study of U-Net Architectures for Change Detection in Satellite Images | Jun 9, 2025 | Change Detection | —Unverified | 0 |
| Hyperpruning: Efficient Search through Pruned Variants of Recurrent Neural Networks Leveraging Lyapunov Spectrum | Jun 9, 2025 | Hyperparameter Optimization, Network Pruning | —Unverified | 0 |
| Language-Vision Planner and Executor for Text-to-Visual Reasoning | Jun 9, 2025 | In-Context Learning, MME | —Unverified | 0 |
| Language Embedding Meets Dynamic Graph: A New Exploration for Neural Architecture Representation Learning | Jun 9, 2025 | Attribute, Graph Representation Learning | —Unverified | 0 |
| Language Models over Canonical Byte-Pair Encodings | Jun 9, 2025 | valid | —Unverified | 0 |
| FedCGD: Collective Gradient Divergence Optimized Scheduling for Wireless Federated Learning | Jun 9, 2025 | Federated Learning, Scheduling | —Unverified | 0 |
| Image Reconstruction as a Tool for Feature Analysis | Jun 9, 2025 | Contrastive Learning, Image Reconstruction | —Unverified | 0 |
| Lightweight Sequential Transformers for Blood Glucose Level Prediction in Type-1 Diabetes | Jun 9, 2025 | Computational Efficiency, Management | —Unverified | 0 |
| Decentralizing Multi-Agent Reinforcement Learning with Temporal Causal Information | Jun 9, 2025 | Multi-agent Reinforcement Learning, reinforcement-learning | —Unverified | 0 |
| Through the Valley: Path to Effective Long CoT Training for Small Language Models | Jun 9, 2025 | 8k, Reinforcement Learning (RL) | —Unverified | 0 |
| LLM-driven Indoor Scene Layout Generation via Scaled Human-aligned Data Synthesis and Multi-Stage Preference Optimization | Jun 9, 2025 | Layout Generation, Robot Navigation | —Unverified | 0 |
| DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO | Jun 9, 2025 | Data Augmentation, Large Language Model | —Unverified | 0 |
| SpatialLM: Training Large Language Models for Structured Indoor Modeling | Jun 9, 2025 | 3D Object Detection, Language Modeling | —Unverified | 0 |
| Language-Grounded Hierarchical Planning and Execution with Multi-Robot 3D Scene Graphs | Jun 9, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| MapBERT: Bitwise Masked Modeling for Real-Time Semantic Mapping Generation | Jun 9, 2025 | Computational Efficiency, Object | —Unverified | 0 |
| SurgBench: A Unified Large-Scale Benchmark for Surgical Video Analysis | Jun 9, 2025 | Action Classification, Benchmarking | —Unverified | 0 |
| VIVAT: Virtuous Improving VAE Training through Artifact Mitigation | Jun 9, 2025 | Image Generation, Image Reconstruction | —Unverified | 0 |
| Can Hessian-Based Insights Support Fault Diagnosis in Attention-based Models? | Jun 9, 2025 | Fault Diagnosis | —Unverified | 0 |
| Low-Complexity Super-Resolution Signature Estimation of XL-MIMO FMCW Radar | Jun 9, 2025 | Compressive Sensing, Super-Resolution | —Unverified | 0 |
| Phase-Only Positioning: Overcoming Integer Ambiguity Challenge through Deep Learning | Jun 9, 2025 | Deep Learning | —Unverified | 0 |
| Scaling Human Activity Recognition: A Comparative Evaluation of Synthetic Data Generation and Augmentation Techniques | Jun 9, 2025 | Activity Recognition, Data Augmentation | —Unverified | 0 |
| MIRA: Medical Time Series Foundation Model for Real-World Health Data | Jun 9, 2025 | Ethics, Missing Values | —Unverified | 0 |
| Super Encoding Network: Recursive Association of Multi-Modal Encoders for Video Understanding | Jun 9, 2025 | Contrastive Learning, Video Editing | —Unverified | 0 |
| Uncertainty-o: One Model-agnostic Framework for Unveiling Uncertainty in Large Multimodal Models | Jun 9, 2025 | Hallucination | —Unverified | 0 |
| A Unified Anti-Jamming Design in Complex Environments Based on Cross-Modal Fusion and Intelligent Decision-Making | Jun 9, 2025 | Decision Making | —Unverified | 0 |
| Genesis: Multimodal Driving Scene Generation with Spatio-Temporal and Cross-Modal Consistency | Jun 9, 2025 | NeRF, Scene Generation | —Unverified | 0 |
| Text-guided multi-stage cross-perception network for medical image segmentation | Jun 9, 2025 | Image Segmentation, Medical Image Segmentation | —Unverified | 0 |
| Towards Large Language Models with Self-Consistent Natural Language Explanations | Jun 9, 2025 | Feature Importance | —Unverified | 0 |
| Correlated Errors in Large Language Models | Jun 9, 2025 | Diversity | —Unverified | 0 |
| Clustered Federated Learning via Embedding Distributions | Jun 9, 2025 | Clustering, Domain Adaptation | Code Available | 0 |
| Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions | Jun 9, 2025 | Large Language Model, Reinforcement Learning (RL) | Code Available | 2 |