SOTAVerified

Benchmarking

Papers

Showing 1911-1920 of 5548 papers

Title | Status | Hype
Hydra: Marker-Free RGB-D Hand-Eye Calibration |  | 0
On the Potential of Large Language Models to Solve Semantics-Aware Process Mining Tasks |  | 0
SecRepoBench: Benchmarking LLMs for Secure Code Generation in Real-World Repositories |  | 0
TF1-EN-3M: Three Million Synthetic Moral Fables for Training Small, Open Language Models | Code | 0
Evaluating Generative Models for Tabular Data: Novel Metrics and Benchmarking |  | 0
Bridging the Generalisation Gap: Synthetic Data Generation for Multi-Site Clinical Model Validation | Code | 0
The Leaderboard Illusion |  | 0
LMME3DHF: Benchmarking and Evaluating Multimodal 3D Human Face Generation with LMMs |  | 0
Can LLMs Be Trusted for Evaluating RAG Systems? A Survey of Methods and Datasets |  | 0
BLADE: Benchmark suite for LLM-driven Automated Design and Evolution of iterative optimisation heuristics |  | 0
Page 192 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified