| Uni-LoRA: One Vector is All You Need | Jun 1, 2025 | All, Mathematical Reasoning | —Unverified | 0 |
| HouseTS: A Large-Scale, Multimodal Spatiotemporal U.S. Housing Dataset | Jun 1, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| 3D Skeleton-Based Action Recognition: A Review | Jun 1, 2025 | Action Recognition, Data Augmentation | —Unverified | 0 |
| Dynamic Chunking and Selection for Reading Comprehension of Ultra-Long Context in Large Language Models | Jun 1, 2025 | Chunking, Multi-hop Question Answering | Code Available | 0 |
| AceVFI: A Comprehensive Survey of Advances in Video Frame Interpolation | Jun 1, 2025 | Mamba, Motion Compensation | Code Available | 2 |
| SkyReels-Audio: Omni Audio-Conditioned Talking Portraits in Video Diffusion Transformers | Jun 1, 2025 | Denoising | Code Available | 9 |
| A Graph-Retrieval-Augmented Generation Framework Enhances Decision-Making in the Circular Economy | Jun 1, 2025 | Decision Making, Multi-hop Question Answering | —Unverified | 0 |
| Localized Forest Fire Risk Prediction: A Department-Aware Approach for Operational Decision Support | Jun 1, 2025 | Binary Classification | —Unverified | 0 |
| GigaAM: Efficient Self-Supervised Learner for Speech Recognition | Jun 1, 2025 | Automatic Speech Recognition, Language Modeling | Code Available | 4 |
| HASRD: Hierarchical Acoustic and Semantic Representation Disentanglement | Jun 1, 2025 | Disentanglement, Self-Supervised Learning | —Unverified | 0 |
| Bridging Subjective and Objective QoE: Operator-Level Aggregation Using LLM-Based Comment Analysis and Network MOS Comparison | Jun 1, 2025 | Large Language Model, Time Series Analysis | —Unverified | 0 |
| Language-Guided Multi-Agent Learning in Simulations: A Unified Framework and Evaluation | Jun 1, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| CountingFruit: Real-Time 3D Fruit Counting with Language-Guided Semantic Gaussian Splatting | Jun 1, 2025 | 3D Reconstruction, Neural Rendering | —Unverified | 0 |
| Camera Trajectory Generation: A Comprehensive Survey of Methods, Metrics, and Future Directions | Jun 1, 2025 | Visual Storytelling | —Unverified | 0 |
| EEG2TEXT-CN: An Exploratory Study of Open-Vocabulary Chinese Text-EEG Alignment via Large Language Model and Contrastive Learning on ChineseEEG | Jun 1, 2025 | Contrastive Learning, Decoder | —Unverified | 0 |
| Test Automation for Interactive Scenarios via Promptable Traffic Simulation | Jun 1, 2025 | Bayesian Optimization | —Unverified | 0 |
| OG-VLA: 3D-Aware Vision Language Action Model via Orthographic Image Generation | Jun 1, 2025 | Image Generation, Large Language Model | —Unverified | 0 |
| DriveMind: A Dual-VLM based Reinforcement Learning Framework for Autonomous Driving | Jun 1, 2025 | Autonomous Driving, Decoder | —Unverified | 0 |
| NTPP: Generative Speech Language Modeling for Dual-Channel Spoken Dialogue via Next-Token-Pair Prediction | Jun 1, 2025 | Decoder, Language Modeling | —Unverified | 0 |
| Source Tracing of Synthetic Speech Systems Through Paralinguistic Pre-Trained Representations | Jun 1, 2025 | Emotion Recognition, Rhythm | —Unverified | 0 |
| Towards Fusion of Neural Audio Codec-based Representations with Spectral for Heart Murmur Classification via Bandit-based Cross-Attention Mechanism | Jun 1, 2025 | Rhythm | —Unverified | 0 |
| PARROT: Synergizing Mamba and Attention-based SSL Pre-Trained Models via Parallel Branch Hadamard Optimal Transport for Speech Emotion Recognition | Jun 1, 2025 | Emotion Recognition, Mamba | —Unverified | 0 |
| From Words to Waves: Analyzing Concept Formation in Speech and Text-Based Foundation Models | Jun 1, 2025 | World Knowledge | —Unverified | 0 |
| Learning More with Less: Self-Supervised Approaches for Low-Resource Speech Emotion Recognition | Jun 1, 2025 | Contrastive Learning, Emotion Recognition | —Unverified | 0 |
| PseudoVC: Improving One-shot Voice Conversion with Pseudo Paired Data | Jun 1, 2025 | Voice Conversion | —Unverified | 0 |
| Leveraging Large Language Models for Sarcastic Speech Annotation in Sarcasm Detection | Jun 1, 2025 | Sarcasm Detection | —Unverified | 0 |
| General-purpose audio representation learning for real-world sound scenes | Jun 1, 2025 | Mamba, Representation Learning | —Unverified | 0 |
| CoVoMix2: Advancing Zero-Shot Dialogue Generation with Fully Non-Autoregressive Flow Matching | Jun 1, 2025 | Dialogue Generation, Disentanglement | —Unverified | 0 |
| CLAP-ART: Automated Audio Captioning with Semantic-rich Audio Representation Tokenizer | Jun 1, 2025 | Audio captioning, Language Modeling | —Unverified | 0 |
| Legal Compliance Evaluation of Smart Contracts Generated By Large Language Models | Jun 1, 2025 | Code Generation | —Unverified | 0 |
| Action Dependency Graphs for Globally Optimal Coordinated Reinforcement Learning | Jun 1, 2025 | Multi-agent Reinforcement Learning, reinforcement-learning | —Unverified | 0 |
| FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion | Jun 1, 2025 | Audio captioning, Caption Generation | Code Available | 2 |
| How Programming Concepts and Neurons Are Shared in Code Language Models | Jun 1, 2025 | Translation | Code Available | 0 |
| In-the-wild Audio Spatialization with Flexible Text-guided Localization | Jun 1, 2025 | Spatial Reasoning | Code Available | 0 |
| HADA: Human-AI Agent Decision Alignment Architecture | Jun 1, 2025 | AI Agent, Ethics | —Unverified | 0 |
| MCP-Zero: Active Tool Discovery for Autonomous LLM Agents | Jun 1, 2025 | Retrieval, Semantic Similarity | —Unverified | 0 |
| Leveraging AM and FM Rhythm Spectrograms for Dementia Classification and Assessment | Jun 1, 2025 | Classification, regression | Code Available | 0 |
| Behavioral Augmentation of UML Class Diagrams: An Empirical Study of Large Language Models for Method Generation | Jun 1, 2025 | Model Selection, Prompt Engineering | Code Available | 0 |
| Speech Unlearning | Jun 1, 2025 | Adversarial Robustness, Keyword Spotting | —Unverified | 0 |
| Counterfactual Activation Editing for Post-hoc Prosody and Mispronunciation Correction in TTS Models | Jun 1, 2025 | counterfactual, Speech Synthesis | —Unverified | 0 |
| Choices and their Provenance: Explaining Stable Solutions of Abstract Argumentation Frameworks | Jun 1, 2025 | Abstract Argumentation | —Unverified | 0 |
| Multiverse Through Deepfakes: The MultiFakeVerse Dataset of Person-Centric Visual and Conceptual Manipulations | Jun 1, 2025 | DeepFake Detection, Face Swapping | Code Available | 0 |
| What do self-supervised speech models know about Dutch? Analyzing advantages of language-specific pre-training | Jun 1, 2025 | Automatic Speech Recognition, speech-recognition | Code Available | 0 |
| Enhancing Speech Instruction Understanding and Disambiguation in Robotics via Speech Prosody | Jun 1, 2025 | In-Context Learning, speech-recognition | —Unverified | 0 |
| A Two-Stage Hierarchical Deep Filtering Framework for Real-Time Speech Enhancement | Jun 1, 2025 | Speech Enhancement | —Unverified | 0 |
| Towards Predicting Any Human Trajectory In Context | Jun 1, 2025 | In-Context Learning, Pedestrian Trajectory Prediction | —Unverified | 0 |
| Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching | Jun 1, 2025 | Rhythm, Style Transfer | —Unverified | 0 |
| HMPC-assisted Adversarial Inverse Reinforcement Learning for Smart Home Energy Management | Jun 1, 2025 | energy management, Management | —Unverified | 0 |
| Beyond Attention: Learning Spatio-Temporal Dynamics with Emergent Interpretable Topologies | Jun 1, 2025 | Computational Efficiency, Graph Attention | —Unverified | 0 |
| Crowdsourcing MUSHRA Tests in the Age of Generative Speech Technologies: A Comparative Analysis of Subjective and Objective Testing Methods | Jun 1, 2025 | | Code Available | 1 |