| Beyond Single-User Dialogue: Assessing Multi-User Dialogue State Tracking Capabilities of Large Language Models | Jun 12, 2025 | Dialogue State Tracking | Unverified | 0 |
| Towards Robust Multimodal Emotion Recognition under Missing Modalities and Distribution Shifts | Jun 12, 2025 | Causal Inference, counterfactual | Code Available | 1 |
| Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs | Jun 12, 2025 | Speech-to-Speech Translation, text-to-speech | Unverified | 0 |
| VINCIE: Unlocking In-context Image Editing from Video | Jun 12, 2025 | Prediction, Segmentation | Unverified | 0 |
| Beyond Gold Standards: Epistemic Ensemble of LLM Judges for Formal Mathematical Reasoning | Jun 12, 2025 | Mathematical Reasoning | Unverified | 0 |
| Different Questions, Different Models: Fine-Grained Evaluation of Uncertainty and Calibration in Clinical QA with LLMs | Jun 12, 2025 | Multiple-choice, Question Answering | Unverified | 0 |
| Can We Infer Confidential Properties of Training Data from LLMs? | Jun 12, 2025 | image-classification, Image Classification | Unverified | 0 |
| PhysioWave: A Multi-Scale Wavelet-Transformer for Physiological Signal Representation | Jun 12, 2025 | EEG | Unverified | 0 |
| Do Language Models Have Bayesian Brains? Distinguishing Stochastic and Deterministic Decision Patterns within Large Language Models | Jun 12, 2025 | Decision Making | Unverified | 0 |
| Flick: Few Labels Text Classification using K-Aware Intermediate Learning in Multi-Task Low-Resource Languages | Jun 12, 2025 | Domain Adaptation, Pseudo Label | Unverified | 0 |
| PAG: Multi-Turn Reinforced LLM Self-Correction with Policy as Generative Verifier | Jun 12, 2025 | Reinforcement Learning (RL) | Unverified | 0 |
| Fast on the Easy, Deep on the Hard: Efficient Reasoning via Powered Length Penalty | Jun 12, 2025 | GSM8K | Unverified | 0 |
| PREMISE: Scalable and Strategic Prompt Optimization for Efficient Mathematical Reasoning in Large Models | Jun 12, 2025 | GSM8K, Mathematical Reasoning | Unverified | 0 |
| One Tokenizer To Rule Them All: Emergent Language Plasticity via Multilingual Tokenizers | Jun 12, 2025 | All | Unverified | 0 |
| Provably Learning from Language Feedback | Jun 12, 2025 | Large Language Model | Unverified | 0 |
| PAL: Probing Audio Encoders via LLMs -- A Study of Information Transfer from Audio Encoders to LLMs | Jun 12, 2025 | | Unverified | 0 |
| TeleMath: A Benchmark for Large Language Models in Telecom Mathematical Problem Solving | Jun 12, 2025 | Logical Reasoning, Mathematical Problem-Solving | Unverified | 0 |
| Neural at ArchEHR-QA 2025: Agentic Prompt Optimization for Evidence-Grounded Clinical Question Answering | Jun 12, 2025 | Answer Generation, Question Answering | Unverified | 0 |
| Breaking Bad Molecules: Are MLLMs Ready for Structure-Level Molecular Detoxification? | Jun 12, 2025 | Property Prediction, valid | Unverified | 0 |
| Build the web for agents, not agents for the web | Jun 12, 2025 | Navigate | Unverified | 0 |
| MMMG: A Massive, Multidisciplinary, Multi-Tier Generation Benchmark for Text-to-Image Reasoning | Jun 12, 2025 | Image Generation, Multimodal Reasoning | Unverified | 0 |
| Demystifying Spectral Feature Learning for Instrumental Variable Regression | Jun 12, 2025 | regression | Unverified | 0 |
| Meta-learning Representations for Learning from Multiple Annotators | Jun 12, 2025 | Meta-Learning | Unverified | 0 |
| The Gittins Index: A Design Principle for Decision-Making Under Uncertainty | Jun 12, 2025 | Bayesian Optimization, Decision Making | Unverified | 0 |
| Rethinking Losses for Diffusion Bridge Samplers | Jun 12, 2025 | Hyperparameter Optimization | Unverified | 0 |
| Robustly Improving LLM Fairness in Realistic Settings via Interpretability | Jun 12, 2025 | Attribute, Fairness | Code Available | 0 |
| Robust Unsupervised Adaptation of a Speech Recogniser Using Entropy Minimisation and Speaker Codes | Jun 12, 2025 | Pseudo Label | Unverified | 0 |
| Collaborative Min-Max Regret in Grouped Multi-Armed Bandits | Jun 12, 2025 | Multi-Armed Bandits | Unverified | 0 |
| Decomposing MLP Activations into Interpretable Features via Semi-Nonnegative Matrix Factorization | Jun 12, 2025 | Dictionary Learning | Code Available | 1 |
| CIIR@LiveRAG 2025: Optimizing Multi-Agent Retrieval Augmented Generation through Self-Training | Jun 12, 2025 | RAG, Response Generation | Code Available | 0 |
| Discovering Hierarchical Latent Capabilities of Language Models via Causal Representation Learning | Jun 12, 2025 | Instruction Following, Mathematical Reasoning | Code Available | 0 |
| SDialog: A Python Toolkit for Synthetic Dialogue Generation and Analysis | Jun 12, 2025 | Benchmarking, Dialogue Generation | Code Available | 2 |
| "Check My Work?": Measuring Sycophancy in a Simulated Educational Context | Jun 12, 2025 | | Code Available | 0 |
| Code Execution as Grounded Supervision for LLM Reasoning | Jun 12, 2025 | Dataset Generation | Code Available | 0 |
| NeuralNexus at BEA 2025 Shared Task: Retrieval-Augmented Prompting for Mistake Identification in AI Tutors | Jun 12, 2025 | Language Modeling, Language Modelling | Code Available | 0 |
| Beyond True or False: Retrieval-Augmented Hierarchical Analysis of Nuanced Claims | Jun 12, 2025 | Retrieval, Retrieval-augmented Generation | Code Available | 0 |
| AutoMind: Adaptive Knowledgeable Agent for Automated Data Science | Jun 12, 2025 | Code Generation, Large Language Model | Code Available | 2 |
| Size-adaptive Hypothesis Testing for Fairness | Jun 12, 2025 | Fairness | Code Available | 0 |
| Detecting Sockpuppetry on Wikipedia Using Meta-Learning | Jun 12, 2025 | Meta-Learning | Code Available | 0 |
| Discrete Audio Tokens: More Than a Survey! | Jun 12, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers | Jun 12, 2025 | Hallucination, Optical Character Recognition (OCR) | Unverified | 0 |
| Enhancing Medical Dialogue Generation through Knowledge Refinement and Dynamic Prompt Adjustment | Jun 12, 2025 | Dialogue Generation, Triplet | Code Available | 0 |
| An Analysis of Datasets, Metrics and Models in Keyphrase Generation | Jun 12, 2025 | Keyphrase Generation | Code Available | 0 |
| Box-Constrained Softmax Function and Its Application for Post-Hoc Calibration | Jun 12, 2025 | Decision Making | Code Available | 0 |
| VQC-MLPNet: An Unconventional Hybrid Quantum-Classical Architecture for Scalable and Robust Quantum Machine Learning | Jun 12, 2025 | Quantum Machine Learning | Unverified | 0 |
| Dynamic Epistemic Friction in Dialogue | Jun 12, 2025 | Friction | Unverified | 0 |
| ReCUT: Balancing Reasoning Length and Accuracy in LLMs via Stepwise Trails and Preference Optimization | Jun 12, 2025 | Math | Code Available | 0 |
| Table-Text Alignment: Explaining Claim Verification Against Tables in Scientific Papers | Jun 12, 2025 | Claim Verification | Code Available | 0 |
| AC/DC: LLM-based Audio Comprehension via Dialogue Continuation | Jun 12, 2025 | AudioCaps, Audio captioning | Unverified | 0 |
| ClusterUCB: Efficient Gradient-Based Data Selection for Targeted Fine-Tuning of LLMs | Jun 12, 2025 | Clustering, Influence Approximation | Unverified | 0 |