| What Has Been Lost with Synthetic Evaluation? | May 28, 2025 | Negation, Reading Comprehension | Unverified | 0 |
| Automatic Transmission for LLM Tiers: Optimizing Cost and Accuracy in Large Language Models | May 27, 2025 | valid | Code Available | 0 |
| STACI: Spatio-Temporal Aleatoric Conformal Inference | May 27, 2025 | Gaussian Processes, GPU | Unverified | 0 |
| PrivATE: Differentially Private Confidence Intervals for Average Treatment Effects | May 27, 2025 | Privacy Preserving, Uncertainty Quantification | Unverified | 0 |
| Collision- and Reachability-Aware Multi-Robot Control with Grounded LLM Planners | May 26, 2025 | MuJoCo, valid | Unverified | 0 |
| On the Robustness of RSMA to Adversarial BD-RIS-Induced Interference | May 26, 2025 | valid | Unverified | 0 |
| Regret Analysis of Average-Reward Unichain MDPs via an Actor-Critic Approach | May 26, 2025 | TAR, valid | Unverified | 0 |
| HomeBench: Evaluating LLMs in Smart Homes with Valid and Invalid Instructions Across Single and Multiple Devices | May 26, 2025 | In-Context Learning, Retrieval-augmented Generation | Code Available | 0 |
| We Need to Measure Data Diversity in NLP -- Better and Broader | May 26, 2025 | Diversity, valid | Unverified | 0 |
| PAMD: Plausibility-Aware Motion Diffusion Model for Long Dance Generation | May 26, 2025 | valid | Unverified | 0 |
| Optimal Conformal Prediction under Epistemic Uncertainty | May 25, 2025 | Conformal Prediction, Prediction | Code Available | 0 |
| NTIRE 2025 Challenge on Video Quality Enhancement for Video Conferencing: Datasets, Methods and Results | May 25, 2025 | valid, Video Quality Assessment | Code Available | 0 |
| Efficient Long CoT Reasoning in Small Language Models | May 24, 2025 | Mathematical Reasoning, valid | Unverified | 0 |
| MedScore: Factuality Evaluation of Free-Form Medical Answers | May 24, 2025 | Form, Hallucination | Code Available | 0 |
| Graph Style Transfer for Counterfactual Explainability | May 23, 2025 | counterfactual, Counterfactual Explanation | Code Available | 0 |
| Flexible MOF Generation with Torsion-Aware Flow Matching | May 23, 2025 | valid | Unverified | 0 |
| Anytime-valid, Bayes-assisted, Prediction-Powered Inference | May 23, 2025 | Prediction, valid | Unverified | 0 |
| Efficient Adaptive Experimentation with Non-Compliance | May 23, 2025 | valid | Code Available | 0 |
| Applications of Modular Co-Design for De Novo 3D Molecule Generation | May 23, 2025 | 3D Molecule Generation, Denoising | Unverified | 0 |
| Effects of auditory distance cues and reverberation on spatial perception and listening strategies | May 23, 2025 | valid | Code Available | 0 |
| Statistical Inference for Online Algorithms | May 22, 2025 | valid | Code Available | 0 |
| MuseRAG: Idea Originality Scoring At Scale | May 22, 2025 | RAG, Retrieval-augmented Generation | Code Available | 0 |
| A collaborative constrained graph diffusion model for the generation of realistic synthetic molecules | May 22, 2025 | valid | Code Available | 0 |
| Statistical Test for Saliency Maps of Graph Neural Networks via Selective Inference | May 22, 2025 | valid | Unverified | 0 |
| Improving LLM First-Token Predictions in Multiple-Choice Question Answering via Prefilling Attack | May 21, 2025 | Multiple-choice, Multiple Choice Question Answering (MCQA) | Unverified | 0 |
| Are Vision-Language Models Safe in the Wild? A Meme-Based Benchmark Study | May 21, 2025 | valid | Unverified | 0 |
| Loss-Guided Auxiliary Agents for Overcoming Mode Collapse in GFlowNets | May 21, 2025 | Diversity, valid | Unverified | 0 |
| ModelingAgent: Bridging LLMs and Mathematical Modeling for Real-World Challenges | May 21, 2025 | Math, valid | Code Available | 1 |
| Projection-Based Correction for Enhancing Deep Inverse Networks | May 21, 2025 | valid | Unverified | 0 |
| Temporal Alignment of Time Sensitive Facts with Activation Engineering | May 20, 2025 | valid | Unverified | 0 |
| Valid Post-Contextual Bandit Inference | May 20, 2025 | Translation, valid | Unverified | 0 |
| Learning to Insert for Constructive Neural Vehicle Routing Solver | May 20, 2025 | Model Optimization, Position | Unverified | 0 |
| A Comprehensive Benchmarking Platform for Deep Generative Models in Molecular Design | May 19, 2025 | Benchmarking, Drug Discovery | Unverified | 0 |
| NTIRE 2025 Challenge on Efficient Burst HDR and Restoration: Datasets, Methods, and Results | May 17, 2025 | valid | Unverified | 0 |
| Coherent Language Reconstruction from Brain Recordings with Flexible Multi-Modal Input Stimuli | May 15, 2025 | valid | Unverified | 0 |
| Better Understanding Triple Differences Estimators | May 15, 2025 | valid | Unverified | 0 |
| A spherical amplitude-phase formulation for 3-D adaptive line-of-sight (ALOS) guidance with USGES stability guarantees | May 13, 2025 | valid | Unverified | 0 |
| Feature Fitted Online Conformal Prediction for Deep Time Series Forecasting Model | May 13, 2025 | Conformal Prediction, Prediction | Code Available | 0 |
| Sharp Gaussian approximations for Decentralized Federated Learning | May 12, 2025 | Federated Learning, valid | Unverified | 0 |
| Measuring General Intelligence with Generated Games | May 12, 2025 | In-Context Learning, Large Language Model | Code Available | 1 |
| Transfer Learning Across Fixed-Income Product Classes | May 12, 2025 | Transfer Learning, valid | Unverified | 0 |
| Generalization Bounds and Stopping Rules for Learning with Self-Selected Data | May 12, 2025 | Active Learning, Generalization Bounds | Unverified | 0 |
| Chronocept: Instilling a Sense of Time in Machines | May 12, 2025 | Fact Checking, RAG | Code Available | 1 |
| LLM-Augmented Chemical Synthesis and Design Decision Programs | May 11, 2025 | Decision Making, Multi-step retrosynthesis | Unverified | 0 |
| Tell Me Who Your Students Are: GPT Can Generate Valid Multiple-Choice Questions When Students' (Mis)Understanding Is Hinted | May 9, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Evolutionary thoughts: integration of large language models and evolutionary algorithms | May 9, 2025 | Evolutionary Algorithms, Hallucination | Code Available | 0 |
| Reinforcement Learning for Game-Theoretic Resource Allocation on Graphs | May 8, 2025 | reinforcement-learning, Reinforcement Learning | Unverified | 0 |
| Fair Uncertainty Quantification for Depression Prediction | May 8, 2025 | Conformal Prediction, Fairness | Unverified | 0 |
| PlaceIt3D: Language-Guided Object Placement in Real 3D Scenes | May 8, 2025 | valid | Unverified | 0 |
| LLM Code Customization with Visual Results: A Benchmark on TikZ | May 7, 2025 | Code Generation, valid | Unverified | 0 |