| What Has Been Lost with Synthetic Evaluation? | May 28, 2025 | Negation, Reading Comprehension | Unverified | 0 |
| Automatic Transmission for LLM Tiers: Optimizing Cost and Accuracy in Large Language Models | May 27, 2025 | valid | Code Available | 0 |
| STACI: Spatio-Temporal Aleatoric Conformal Inference | May 27, 2025 | Gaussian Processes, GPU | Unverified | 0 |
| PrivATE: Differentially Private Confidence Intervals for Average Treatment Effects | May 27, 2025 | Privacy Preserving, Uncertainty Quantification | Unverified | 0 |
| Collision- and Reachability-Aware Multi-Robot Control with Grounded LLM Planners | May 26, 2025 | MuJoCo, valid | Unverified | 0 |
| On the Robustness of RSMA to Adversarial BD-RIS-Induced Interference | May 26, 2025 | valid | Unverified | 0 |
| Regret Analysis of Average-Reward Unichain MDPs via an Actor-Critic Approach | May 26, 2025 | TAR, valid | Unverified | 0 |
| HomeBench: Evaluating LLMs in Smart Homes with Valid and Invalid Instructions Across Single and Multiple Devices | May 26, 2025 | In-Context Learning, Retrieval-augmented Generation | Code Available | 0 |
| We Need to Measure Data Diversity in NLP -- Better and Broader | May 26, 2025 | Diversity, valid | Unverified | 0 |
| PAMD: Plausibility-Aware Motion Diffusion Model for Long Dance Generation | May 26, 2025 | valid | Unverified | 0 |
| Optimal Conformal Prediction under Epistemic Uncertainty | May 25, 2025 | Conformal Prediction, Prediction | Code Available | 0 |
| NTIRE 2025 Challenge on Video Quality Enhancement for Video Conferencing: Datasets, Methods and Results | May 25, 2025 | valid, Video Quality Assessment | Code Available | 0 |
| Efficient Long CoT Reasoning in Small Language Models | May 24, 2025 | Mathematical Reasoning, valid | Unverified | 0 |
| MedScore: Factuality Evaluation of Free-Form Medical Answers | May 24, 2025 | Form, Hallucination | Code Available | 0 |
| Graph Style Transfer for Counterfactual Explainability | May 23, 2025 | counterfactual, Counterfactual Explanation | Code Available | 0 |
| Flexible MOF Generation with Torsion-Aware Flow Matching | May 23, 2025 | valid | Unverified | 0 |
| Anytime-valid, Bayes-assisted, Prediction-Powered Inference | May 23, 2025 | Prediction, valid | Unverified | 0 |
| Efficient Adaptive Experimentation with Non-Compliance | May 23, 2025 | valid | Code Available | 0 |
| Applications of Modular Co-Design for De Novo 3D Molecule Generation | May 23, 2025 | 3D Molecule Generation, Denoising | Unverified | 0 |
| Effects of auditory distance cues and reverberation on spatial perception and listening strategies | May 23, 2025 | valid | Code Available | 0 |
| Statistical Inference for Online Algorithms | May 22, 2025 | valid | Code Available | 0 |
| MuseRAG: Idea Originality Scoring At Scale | May 22, 2025 | RAG, Retrieval-augmented Generation | Code Available | 0 |
| A collaborative constrained graph diffusion model for the generation of realistic synthetic molecules | May 22, 2025 | valid | Code Available | 0 |
| Statistical Test for Saliency Maps of Graph Neural Networks via Selective Inference | May 22, 2025 | valid | Unverified | 0 |
| Improving LLM First-Token Predictions in Multiple-Choice Question Answering via Prefilling Attack | May 21, 2025 | Multiple-choice, Multiple Choice Question Answering (MCQA) | Unverified | 0 |