Benchmarking

Papers

Showing 2981–2990 of 5548 papers

Title	Date	Tasks	Status	Hype
Anchor Points: Benchmarking Models with Much Fewer Examples	Sep 14, 2023	Benchmarking, Language Modeling	Code Available	0
M3Dsynth: A dataset of medical 3D images with AI-generated local manipulations	Sep 14, 2023	Benchmarking, Computed Tomography (CT)	Code Available	0
Leveraging Contextual Information for Effective Entity Salience Detection	Sep 14, 2023	Articles, Benchmarking	Unverified	0
Benchmarking machine learning models for quantum state classification	Sep 14, 2023	Benchmarking, Classification	Unverified	0
VerilogEval: Evaluating Large Language Models for Verilog Code Generation	Sep 14, 2023	Benchmarking, Code Generation	Code Available	2
So you think you can track?	Sep 13, 2023	Benchmarking, Object	Unverified	0
Benchmarking Procedural Language Understanding for Low-Resource Languages: A Case Study on Turkish	Sep 13, 2023	Benchmarking, Translation	Code Available	0
An Image Dataset for Benchmarking Recommender Systems with Raw Pixels	Sep 13, 2023	Benchmarking, Recommendation Systems	Code Available	1
AmodalSynthDrive: A Synthetic Amodal Perception Dataset for Autonomous Driving	Sep 12, 2023	Autonomous Driving, Benchmarking	Unverified	0
Unveiling the potential of large language models in generating semantic and cross-language clones	Sep 12, 2023	Benchmarking, Code Generation	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified