SOTAVerified

Benchmarking

Papers

Showing 841–850 of 5548 papers

Title	Date	Tasks	Status	Hype
Should we be going MAD? A Look at Multi-Agent Debate Strategies for LLMs	Nov 29, 2023	Benchmarking	Code Available	1
UHGEval: Benchmarking the Hallucination of Chinese Large Language Models via Unconstrained Generation	Nov 26, 2023	Benchmarking, Hallucination	Code Available	1
Benchmarking Robustness of Text-Image Composed Retrieval	Nov 24, 2023	Attribute, Benchmarking	Code Available	1
IMGTB: A Framework for Machine-Generated Text Detection Benchmarking	Nov 21, 2023	Benchmarking, Text Detection	Code Available	1
BEND: Benchmarking DNA Language Models on biologically meaningful tasks	Nov 21, 2023	Benchmarking, Language Modeling	Code Available	1
Towards a more inductive world for drug repurposing approaches	Nov 21, 2023	Benchmarking, Prediction	Code Available	1
LogLead -- Fast and Integrated Log Loader, Enhancer, and Anomaly Detector	Nov 20, 2023	Anomaly Detection, Benchmarking	Code Available	1
Benchmarking Pathology Feature Extractors for Whole Slide Image Classification	Nov 20, 2023	Benchmarking, image-classification	Code Available	1
TextEE: Benchmark, Reevaluation, Reflections, and Future Challenges in Event Extraction	Nov 16, 2023	Benchmarking, Event Extraction	Code Available	1
Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization	Nov 15, 2023	Benchmarking, Instruction Following	Code Available	1


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified