| Statistically Valid Information Bottleneck via Multiple Hypothesis Testing | Sep 11, 2024 | valid | Unverified | 0 |
| Exploring syntactic information in sentence embeddings through multilingual subject-verb agreement | Sep 10, 2024 | Multiple-choice, Sentence | Unverified | 0 |
| Improving Conditional Level Generation using Automated Validation in Match-3 Games | Sep 10, 2024 | valid | Unverified | 0 |
| Alignist: CAD-Informed Orientation Distribution Estimation by Fusing Shape and Correspondences | Sep 10, 2024 | Contrastive Learning, valid | Code Available | 0 |
| NSP: A Neuro-Symbolic Natural Language Navigational Planner | Sep 10, 2024 | valid | Unverified | 0 |
| The Surprising Robustness of Partial Least Squares | Sep 9, 2024 | Dimensionality Reduction, valid | Unverified | 0 |
| Inference for Large Scale Regression Models with Dependent Errors | Sep 8, 2024 | Gaussian Processes, parameter estimation | Unverified | 0 |
| Selective Self-Rehearsal: A Fine-Tuning Approach to Improve Generalization in Large Language Models | Sep 7, 2024 | MMLU, TruthfulQA | Unverified | 0 |
| Leveraging Machine Learning for Official Statistics: A Statistical Manifesto | Sep 6, 2024 | Survey, valid | Unverified | 0 |
| FuzzCoder: Byte-level Fuzzing Test via Large Language Model | Sep 3, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| Federated Prediction-Powered Inference from Decentralized Data | Sep 3, 2024 | Federated Learning, Prediction | Code Available | 0 |
| An essay on the history of DSGE models | Sep 1, 2024 | valid | Unverified | 0 |
| Stochastic Monotonicity and Random Utility Models: The Good and The Ugly | Sep 1, 2024 | valid | Unverified | 0 |
| "Is This It?": Towards Ecologically Valid Benchmarks for Situated Collaboration | Aug 30, 2024 | Embodied Question Answering, Question Answering | Unverified | 0 |
| The creative psychometric item generator: a framework for item generation and validation using large language models | Aug 30, 2024 | valid | Unverified | 0 |
| Self-supervised learning for crystal property prediction via denoising | Aug 30, 2024 | Denoising, Prediction | Unverified | 0 |
| Continual learning with the neural tangent ensemble | Aug 30, 2024 | Continual Learning, valid | Unverified | 0 |
| Can Unconfident LLM Annotations Be Used for Confident Conclusions? | Aug 27, 2024 | valid | Code Available | 1 |
| Double/Debiased CoCoLASSO of Treatment Effects with Mismeasured High-Dimensional Control Variables | Aug 26, 2024 | Econometrics, valid | Unverified | 0 |
| EVINCE: Optimizing Multi-LLM Dialogues Using Conditional Statistics and Information Theory | Aug 26, 2024 | Decision Making, Diversity | Unverified | 0 |
| Investigating the effect of Mental Models in User Interaction with an Adaptive Dialog Agent | Aug 26, 2024 | valid | Unverified | 0 |
| RoCP-GNN: Robust Conformal Prediction for Graph Neural Networks in Node-Classification | Aug 25, 2024 | Conformal Prediction, Link Prediction | Unverified | 0 |
| Learning Valid Dual Bounds in Constraint Programming: Boosted Lagrangian Decomposition with Self-Supervised Learning | Aug 22, 2024 | Self-Supervised Learning, valid | Unverified | 0 |
| Can You Trust Your Metric? Automatic Concatenation-Based Tests for Metric Validity | Aug 22, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| AIM 2024 Challenge on Compressed Video Quality Assessment: Methods and Results | Aug 21, 2024 | Image Manipulation, valid | Code Available | 1 |