SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 4461–4470 of 5548 papers

Title	Date	Tasks	Status	Hype
Reinforcement Learning to Disentangle Multiqubit Quantum States from Partial Observations	Jun 12, 2024	BenchmarkingDeep Reinforcement Learning	CodeCode Available	0
DyKnow: Dynamically Verifying Time-Sensitive Factual Knowledge in LLMs	Apr 10, 2024	Benchmarkingknowledge editing	CodeCode Available	0
AdvancedHMC.jl: A robust, modular and efficient implementation of advanced HMC algorithms	Oct 16, 2019	Bayesian InferenceBenchmarking	CodeCode Available	0
An Auditing Test To Detect Behavioral Shift in Language Models	Oct 25, 2024	BenchmarkingChange Detection	CodeCode Available	0
Leak Proof CMap; a framework for training and evaluation of cell line agnostic L1000 similarity methods	Apr 29, 2024	BenchmarkingDrug Discovery	CodeCode Available	0
Learnability and Complexity of Quantum Samples	Oct 22, 2020	Benchmarking	CodeCode Available	0
Learned Bayesian Cramér-Rao Bound for Unknown Measurement Models Using Score Neural Networks	Feb 2, 2025	Benchmarking	CodeCode Available	0
Learned Sorted Table Search and Static Indexes in Small Model Space	Jul 19, 2021	BenchmarkingOpen-Ended Question Answering	CodeCode Available	0
Learn How to Query from Unlabeled Data Streams in Federated Learning	Dec 11, 2024	BenchmarkingDecision Making	CodeCode Available	0
Reinvestigating the R2 Indicator: Achieving Pareto Compliance by Integration	Jul 1, 2024	Benchmarking	CodeCode Available	0

Show:10 25 50

← PrevPage 447 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified