Benchmarking

Papers

Showing 3321–3330 of 5548 papers

Title	Date	Tasks	Status	Hype
Towards Explainability and Fairness in Swiss Judgement Prediction: Benchmarking on a Multilingual Dataset	Feb 26, 2024	Benchmarking, Cross-Lingual Transfer	Unverified	0
Benchmarking LLMs on the Semantic Overlap Summarization Task	Feb 26, 2024	Benchmarking, Document Summarization	Unverified	0
Partial Rankings of Optimizers	Feb 26, 2024	Benchmarking	Code Available	0
HypoTermQA: Hypothetical Terms Dataset for Benchmarking Hallucination Tendency of LLMs	Feb 25, 2024	Benchmarking, Chatbot	Code Available	0
E(3)-equivariant models cannot learn chirality: Field-based molecular generation	Feb 24, 2024	Benchmarking, Graph Neural Network	Unverified	0
Decoding Intelligence: A Framework for Certifying Knowledge Comprehension in LLMs	Feb 24, 2024	Benchmarking, Knowledge Graphs	Unverified	0
Benchmarking Observational Studies with Experimental Data under Right-Censoring	Feb 23, 2024	Benchmarking	Unverified	0
Benchmarking the Robustness of Panoptic Segmentation for Automated Driving	Feb 23, 2024	Benchmarking, Decision Making	Unverified	0
GenCeption: Evaluate Multimodal LLMs with Unlabeled Unimodal Data	Feb 22, 2024	Benchmarking	Code Available	0
PQA: Zero-shot Protein Question Answering for Free-form Scientific Enquiry with Large Language Models	Feb 21, 2024	Benchmarking, Form	Code Available	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified