SOTAVerified

valid

Papers

Showing 201–250 of 3589 papers

Title | Status | Hype
Order Doesn't Matter, But Reasoning Does: Training LLMs with Order-Centric Augmentation | | 0
Universality of conformal prediction under the assumption of randomness | | 0
Overcoming Dependent Censoring in the Evaluation of Survival Models | Code | 0
Talking to the brain: Using Large Language Models as Proxies to Model Brain Semantic Representation | | 0
Shh, don't say that! Domain Certification in LLMs | | 0
Uncertainty Quantification for LLM-Based Survey Simulations | | 0
Beyond In-Distribution Success: Scaling Curves of CoT Granularity for Language Model Generalization | Code | 0
Data-Driven Input-Output Control Barrier Functions | | 0
Quantifying Logical Consistency in Transformers via Query-Key Alignment | | 0
REGen: A Reliable Evaluation Framework for Generative Event Argument Extraction | | 0
Your Assumed DAG is Wrong and Here's How To Deal With It | Code | 0
Auto-Bench: An Automated Benchmark for Scientific Discovery in LLMs | | 0
Pricing Valid Cuts for Price-Match Equilibria | | 0
EquivaMap: Leveraging LLMs for Automatic Equivalence Checking of Optimization Formulations | Code | 0
Towards a Perspectivist Turn in Argument Quality Assessment | Code | 0
Explainable Distributed Constraint Optimization Problems | | 0
Conformal Prediction under Levy-Prokhorov Distribution Shifts: Robustness to Local and Global Perturbations | Code | 0
Generalization error bound for denoising score matching under relaxed manifold assumption | | 0
What are Models Thinking about? Understanding Large Language Model Hallucinations "Psychology" through Model Inner State Analysis | | 0
Likelihood-Ratio Regularized Quantile Regression: Adapting Conformal Prediction to High-Dimensional Covariate Shifts | | 0
GiFT: Gibbs Fine-Tuning for Code Generation | Code | 0
Deep Incomplete Multi-view Learning via Cyclic Permutation of VAEs | | 0
The Relationship between No-Regret Learning and Online Conformal Prediction | | 0
A new and flexible class of sharp asymptotic time-uniform confidence sequences | | 0
Self-Normalized Inference in (Quantile, Expected Shortfall) Regressions for Time Series | | 0
Multi-Objective Planning with Contextual Lexicographic Reward Preferences | | 0
Trust Me, I Know the Way: Predictive Uncertainty in the Presence of Shortcut Learning | | 0
Generalizability through Explainability: Countering Overfitting with Counterfactual Examples | | 0
CRANE: Reasoning with constrained LLM generation | | 0
High-Throughput SAT Sampling | Code | 0
Inference in dynamic models for panel data using the moving block bootstrap | | 0
On Training-Conditional Conformal Prediction and Binomial Proportion Confidence Intervals | | 0
Beyond Confidence: Adaptive Abstention in Dual-Threshold Conformal Prediction for Autonomous System Perception | Code | 0
Experiments in the Linear Convex Order | | 0
Krum Federated Chain (KFC): Using blockchain to defend against adversarial attacks in Federated Learning | Code | 0
Dual Conic Proxy for Semidefinite Relaxation of AC Optimal Power Flow | | 0
Tokenization Standards for Linguistic Integrity: Turkish as a Benchmark | | 0
On the Impact of the Utility in Semivalue-based Data Valuation | | 0
Smooth Sailing: Lipschitz-Driven Uncertainty Quantification for Spatial Association | Code | 0
Forbidden Science: Dual-Use AI Challenge Benchmark and Scientific Refusal Tests | | 0
Generative-enhanced optimization for knapsack problems: an industry-relevant study | | 0
t-Testing the Waters: Empirically Validating Assumptions for Reliable A/B-Testing | | 0
Automating a Complete Software Test Process Using LLMs: An Automotive Case Study | | 0
Combining Clusters for the Approximate Randomization Test | | 0
First-ish Order Methods: Hessian-aware Scalings of Gradient Descent | | 0
Efficient Randomized Experiments Using Foundation Models | Code | 0
Change Point Detection in the Frequency Domain with Statistical Reliability | | 0
SymmCD: Symmetry-Preserving Crystal Generation with Diffusion Models | Code | 1
FAB-PPI: Frequentist, Assisted by Bayes, Prediction-Powered Inference | | 0
Variance-Adjusted Cosine Distance as Similarity Metric | | 0

No leaderboard results yet.