| Step-by-step Instructions and a Simple Tabular Output Format Improve the Dependency Parsing Accuracy of LLMs | Jun 11, 2025 | Dependency Parsing, Hallucination | Code Available | 0 |
| Policy-Based Trajectory Clustering in Offline Reinforcement Learning | Jun 10, 2025 | Clustering, D4RL | Unverified | 0 |
| Employing self-supervised learning models for cross-linguistic child speech maturity classification | Jun 10, 2025 | Self-Supervised Learning, valid | Code Available | 0 |
| Asymptotic Normality of Infinite Centered Random Forests - Application to Imbalanced Classification | Jun 10, 2025 | imbalanced classification, valid | Unverified | 0 |
| Language Models over Canonical Byte-Pair Encodings | Jun 9, 2025 | valid | Unverified | 0 |
| PhysiInter: Integrating Physical Mapping for High-Fidelity Human Interaction Generation | Jun 9, 2025 | Motion Generation, valid | Unverified | 0 |
| Ensuring Reliability of Curated EHR-Derived Data: The Validation of Accuracy for LLM/ML-Extracted Information and Data (VALID) Framework | Jun 9, 2025 | Benchmarking, Fairness | Unverified | 0 |
| AutoSDT: Scaling Data-Driven Discovery Tasks Toward Open Co-Scientists | Jun 9, 2025 | scientific discovery, valid | Unverified | 0 |
| Can LLMs Generate Reliable Test Case Generators? A Study on Competition-Level Programming Problems | Jun 7, 2025 | Code Generation, valid | Unverified | 0 |
| Inference on the value of a linear program | Jun 7, 2025 | valid | Unverified | 0 |
| On Efficient Estimation of Distributional Treatment Effects under Covariate-Adaptive Randomization | Jun 6, 2025 | regression, valid | Code Available | 0 |
| Speech Neurophysiology in Realistic Contexts: Big Hype or Big Leap? | Jun 5, 2025 | valid | Unverified | 0 |
| Does It Make Sense to Speak of Introspection in Large Language Models? | Jun 5, 2025 | valid | Unverified | 0 |
| DRE: An Effective Dual-Refined Method for Integrating Small and Large Language Models in Open-Domain Dialogue Evaluation | Jun 4, 2025 | Dialogue Evaluation, valid | Unverified | 0 |
| DrSR: LLM based Scientific Equation Discovery with Dual Reasoning from Data and Experience | Jun 4, 2025 | Efficient Exploration, Equation Discovery | Unverified | 0 |
| SQLens: An End-to-End Framework for Error Detection and Correction in Text-to-SQL | Jun 4, 2025 | Text to SQL, Text-To-SQL | Unverified | 0 |
| Pi-SQL: Enhancing Text-to-SQL with Fine-Grained Guidance from Pivot Programming Languages | Jun 1, 2025 | Text to SQL, Text-To-SQL | Unverified | 0 |
| Behavioral Augmentation of UML Class Diagrams: An Empirical Study of Large Language Models for Method Generation | Jun 1, 2025 | Model Selection, Prompt Engineering | Code Available | 0 |
| Quantization-based Bounds on the Wasserstein Metric | Jun 1, 2025 | Computational Efficiency, Domain Adaptation | Unverified | 0 |
| Clinical Annotations for Automatic Stuttering Severity Assessment | May 31, 2025 | valid | Code Available | 0 |
| CSVQA: A Chinese Multimodal Benchmark for Evaluating STEM Reasoning Capabilities of VLMs | May 30, 2025 | Diagnostic, Image Comprehension | Unverified | 0 |
| Conformal Object Detection by Sequential Risk Control | May 29, 2025 | Conformal Prediction, Object | Unverified | 0 |
| Generalizability vs. Counterfactual Explainability Trade-Off | May 29, 2025 | counterfactual, valid | Unverified | 0 |
| Maximum Likelihood Learning of Latent Dynamics Without Reconstruction | May 29, 2025 | Scheduling, valid | Unverified | 0 |
| Stable Thompson Sampling: Valid Inference via Variance Inflation | May 29, 2025 | Decision Making, Thompson Sampling | Unverified | 0 |