SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 2461–2470 of 5548 papers

Title	Date	Tasks	Status	Hype
A Large-scale Evaluation of Pretraining Paradigms for the Detection of Defects in Electroluminescence Solar Cell Images	Feb 27, 2024	BenchmarkingDefect Detection	—Unverified	0
The KANDY Benchmark: Incremental Neuro-Symbolic Learning and Reasoning with Kandinsky Patterns	Feb 27, 2024	BenchmarkingBinary Classification	CodeCode Available	0
Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data	Feb 27, 2024	Benchmarking	CodeCode Available	1
Partial Rankings of Optimizers	Feb 26, 2024	Benchmarking	CodeCode Available	0
Benchmarking LLMs on the Semantic Overlap Summarization Task	Feb 26, 2024	BenchmarkingDocument Summarization	—Unverified	0
Towards Explainability and Fairness in Swiss Judgement Prediction: Benchmarking on a Multilingual Dataset	Feb 26, 2024	BenchmarkingCross-Lingual Transfer	—Unverified	0
Performance Comparison of Surrogate-Assisted Evolutionary Algorithms on Computational Fluid Dynamics Problems	Feb 26, 2024	BenchmarkingEvolutionary Algorithms	—Unverified	0
HypoTermQA: Hypothetical Terms Dataset for Benchmarking Hallucination Tendency of LLMs	Feb 25, 2024	BenchmarkingChatbot	CodeCode Available	0
PST-Bench: Tracing and Benchmarking the Source of Publications	Feb 25, 2024	Benchmarking	CodeCode Available	1
Decoding Intelligence: A Framework for Certifying Knowledge Comprehension in LLMs	Feb 24, 2024	BenchmarkingKnowledge Graphs	—Unverified	0

Show:10 25 50

← PrevPage 247 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified