SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 2851–2860 of 5548 papers

Title	Date	Tasks	Status	Hype
No Dataset Needed for Downstream Knowledge Benchmarking: Response Dispersion Inversely Correlates with Accuracy on Domain-specific QA	Aug 24, 2024	BenchmarkingChatbot	—Unverified	0
Data Augmentation for Continual RL via Adversarial Gradient Episodic Memory	Aug 24, 2024	BenchmarkingData Augmentation	—Unverified	0
Open Llama2 Model for the Lithuanian Language	Aug 23, 2024	Benchmarkingmodel	—Unverified	0
Top Score on the Wrong Exam: On Benchmarking in Machine Learning for Vulnerability Detection	Aug 23, 2024	BenchmarkingBinary Classification	—Unverified	0
S3Simulator: A benchmarking Side Scan Sonar Simulator dataset for Underwater Image Analysis	Aug 23, 2024	Benchmarking	CodeCode Available	0
Dynamic PDB: A New Dataset and a SE(3) Model Extension by Integrating Dynamic Behaviors and Physical Properties in Protein Structures	Aug 22, 2024	BenchmarkingTrajectory Prediction	—Unverified	0
Benchmarking Counterfactual Interpretability in Deep Learning Models for Time Series Classification	Aug 22, 2024	Benchmarkingcounterfactual	—Unverified	0
WCEbleedGen: A wireless capsule endoscopy dataset and its benchmarking for automatic bleeding classification, detection, and segmentation	Aug 22, 2024	BenchmarkingClassification	CodeCode Available	0
MultiMed: Massively Multimodal and Multitask Medical Understanding	Aug 22, 2024	BenchmarkingMedical Question Answering	—Unverified	0
Extraction of Research Objectives, Machine Learning Model Names, and Dataset Names from Academic Papers and Analysis of Their Interrelationships Using LLM and Network Analysis	Aug 22, 2024	Benchmarking	—Unverified	0

Show:10 25 50

← PrevPage 286 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified