| Statistically Valid Information Bottleneck via Multiple Hypothesis Testing | Sep 11, 2024 | valid | Unverified | 0 |
| Improving Conditional Level Generation using Automated Validation in Match-3 Games | Sep 10, 2024 | valid | Unverified | 0 |
| NSP: A Neuro-Symbolic Natural Language Navigational Planner | Sep 10, 2024 | valid | Unverified | 0 |
| Alignist: CAD-Informed Orientation Distribution Estimation by Fusing Shape and Correspondences | Sep 10, 2024 | Contrastive Learning, valid | Code Available | 0 |
| Exploring syntactic information in sentence embeddings through multilingual subject-verb agreement | Sep 10, 2024 | Multiple-choice, Sentence | Unverified | 0 |
| The Surprising Robustness of Partial Least Squares | Sep 9, 2024 | Dimensionality Reduction, valid | Unverified | 0 |
| Inference for Large Scale Regression Models with Dependent Errors | Sep 8, 2024 | Gaussian Processes, parameter estimation | Unverified | 0 |
| Selective Self-Rehearsal: A Fine-Tuning Approach to Improve Generalization in Large Language Models | Sep 7, 2024 | MMLU, TruthfulQA | Unverified | 0 |
| Leveraging Machine Learning for Official Statistics: A Statistical Manifesto | Sep 6, 2024 | Survey, valid | Unverified | 0 |
| FuzzCoder: Byte-level Fuzzing Test via Large Language Model | Sep 3, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| Federated Prediction-Powered Inference from Decentralized Data | Sep 3, 2024 | Federated Learning, Prediction | Code Available | 0 |
| An essay on the history of DSGE models | Sep 1, 2024 | valid | Unverified | 0 |
| Stochastic Monotonicity and Random Utility Models: The Good and The Ugly | Sep 1, 2024 | valid | Unverified | 0 |
| "Is This It?": Towards Ecologically Valid Benchmarks for Situated Collaboration | Aug 30, 2024 | Embodied Question Answering, Question Answering | Unverified | 0 |
| The creative psychometric item generator: a framework for item generation and validation using large language models | Aug 30, 2024 | valid | Unverified | 0 |
| Continual learning with the neural tangent ensemble | Aug 30, 2024 | Continual Learning, valid | Unverified | 0 |
| Self-supervised learning for crystal property prediction via denoising | Aug 30, 2024 | Denoising, Prediction | Unverified | 0 |
| Can Unconfident LLM Annotations Be Used for Confident Conclusions? | Aug 27, 2024 | valid | Code Available | 1 |
| Double/Debiased CoCoLASSO of Treatment Effects with Mismeasured High-Dimensional Control Variables | Aug 26, 2024 | Econometrics, valid | Unverified | 0 |
| EVINCE: Optimizing Multi-LLM Dialogues Using Conditional Statistics and Information Theory | Aug 26, 2024 | Decision Making, Diversity | Unverified | 0 |
| Investigating the effect of Mental Models in User Interaction with an Adaptive Dialog Agent | Aug 26, 2024 | valid | Unverified | 0 |
| RoCP-GNN: Robust Conformal Prediction for Graph Neural Networks in Node-Classification | Aug 25, 2024 | Conformal Prediction, Link Prediction | Unverified | 0 |
| Can You Trust Your Metric? Automatic Concatenation-Based Tests for Metric Validity | Aug 22, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Learning Valid Dual Bounds in Constraint Programming: Boosted Lagrangian Decomposition with Self-Supervised Learning | Aug 22, 2024 | Self-Supervised Learning, valid | Unverified | 0 |
| AIM 2024 Challenge on Compressed Video Quality Assessment: Methods and Results | Aug 21, 2024 | Image Manipulation, valid | Code Available | 1 |
| Learning Deep Dissipative Dynamics | Aug 21, 2024 | LEMMA, Time Series | Code Available | 0 |
| A Markovian Model for Learning-to-Optimize | Aug 21, 2024 | Generalization Bounds, model | Unverified | 0 |
| Optical ISAC: Fundamental Performance Limits and Transceiver Design | Aug 21, 2024 | Integrated sensing and communication, ISAC | Unverified | 0 |
| Safety-Critical Stabilization of Force-Controlled Nonholonomic Mobile Robots | Aug 20, 2024 | Autonomous Vehicles, valid | Unverified | 0 |
| Inference with Many Weak Instruments and Heterogeneity | Aug 20, 2024 | valid | Unverified | 0 |
| Conformalized Interval Arithmetic with Symmetric Calibration | Aug 20, 2024 | Conformal Prediction, Decision Making | Code Available | 0 |
| On Learning Action Costs from Input Plans | Aug 20, 2024 | valid | Unverified | 0 |
| BLADE: Benchmarking Language Model Agents for Data-Driven Science | Aug 19, 2024 | Benchmarking, Decision Making | Code Available | 1 |
| Physics-Aware Combinatorial Assembly Sequence Planning using Data-free Action Masking | Aug 19, 2024 | Deep Reinforcement Learning, Object | Code Available | 0 |
| Uncertainty Quantification of Surrogate Models using Conformal Prediction | Aug 19, 2024 | Conformal Prediction, Prediction | Code Available | 1 |
| Importance Weighting Can Help Large Language Models Self-Improve | Aug 19, 2024 | Language Modelling, valid | Code Available | 0 |
| Data-driven Conditional Instrumental Variables for Debiasing Recommender Systems | Aug 19, 2024 | Recommendation Systems, valid | Unverified | 0 |
| Generating Automatically Print/Scan Textures for Morphing Attack Detection Applications | Aug 18, 2024 | Diversity, valid | Code Available | 0 |
| Anytime-Valid Inference for Double/Debiased Machine Learning of Causal Parameters | Aug 18, 2024 | valid | Unverified | 0 |
| GraphSPNs: Sum-Product Networks Benefit From Canonical Orderings | Aug 18, 2024 | Molecular Graph Generation, valid | Code Available | 0 |
| Externally Valid Selection of Experimental Sites via the k-Median Problem | Aug 17, 2024 | valid | Unverified | 0 |
| A Confidence Interval for the ℓ₂ Expected Calibration Error | Aug 16, 2024 | valid | Code Available | 0 |
| A Mechanistic Interpretation of Syllogistic Reasoning in Auto-Regressive Language Models | Aug 16, 2024 | Logical Reasoning, valid | Unverified | 0 |
| An Unsupervised Learning Framework Combined with Heuristics for the Maximum Minimal Cut Problem | Aug 16, 2024 | Combinatorial Optimization, valid | Code Available | 0 |
| Evaluating the Validity of Word-level Adversarial Attacks with Large Language Models | Aug 15, 2024 | Adversarial Attack, Language Modeling | Code Available | 0 |
| QirK: Question Answering via Intermediate Representation on Knowledge Graphs | Aug 14, 2024 | Knowledge Graphs, Question Answering | Unverified | 0 |
| Defining and Measuring Disentanglement for non-Independent Factors of Variation | Aug 13, 2024 | Disentanglement, Representation Learning | Unverified | 0 |
| Design Proteins Using Large Language Models: Enhancements and Comparative Analyses | Aug 12, 2024 | valid | Code Available | 0 |
| Approximating Discrimination Within Models When Faced With Several Non-Binary Sensitive Attributes | Aug 12, 2024 | Attribute, Fairness | Code Available | 0 |
| People over trust AI-generated medical responses and view them to be as valid as doctors, despite low accuracy | Aug 11, 2024 | Large Language Model, valid | Unverified | 0 |