SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 2511–2520 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Dialogue Quality and Emotion Annotations for Customer Support Conversations	Nov 23, 2023	BenchmarkingDiversity	CodeCode Available	0	5
From raw affiliations to organization identifiers	May 12, 2025	BenchmarkingMetadata quality	CodeCode Available	0	5
From Knowledge to Reasoning: Evaluating LLMs for Ionic Liquids Research in Chemical and Biological Engineering	May 11, 2025	BenchmarkingGeneral Knowledge	CodeCode Available	0	5
From MNIST to ImageNet and Back: Benchmarking Continual Curriculum Learning	Mar 16, 2023	BenchmarkingContinual Learning	CodeCode Available	0	5
From Bytes to Borsch: Fine-Tuning Gemma and Mistral for the Ukrainian Language Representation	Apr 14, 2024	BenchmarkingDiversity	CodeCode Available	0	5
From Modern CNNs to Vision Transformers: Assessing the Performance, Robustness, and Classification Strategies of Deep Learning Models in Histopathology	Apr 11, 2022	BenchmarkingCancer Classification	CodeCode Available	0	5
From Past to Present: A Survey of Malicious URL Detection Techniques, Datasets and Code Repositories	Apr 23, 2025	Benchmarking	CodeCode Available	0	5
Benchmarking pre-trained text embedding models in aligning built asset information	Nov 18, 2024	Asset ManagementBenchmarking	CodeCode Available	0	5
Benchmarking Intersectional Biases in NLP	Jul 1, 2022	BenchmarkingBIG-bench Machine Learning	CodeCode Available	0	5
DFEE: Interactive DataFlow Execution and Evaluation Kit	Dec 4, 2022	BenchmarkingScheduling	CodeCode Available	0	5

Show:10 25 50

← PrevPage 252 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified