SOTAVerified

Benchmarking

Papers

Showing 1271–1280 of 5548 papers

Title | Status | Hype
Agentic-HLS: An agentic reasoning based high-level synthesis system using large language models (AI for EDA workshop 2024) | Code | 0
TextClass Benchmark: A Continuous Elo Rating of LLMs in Social Sciences | Code | 0
Circumventing shortcuts in audio-visual deepfake detection datasets with unsupervised learning | Code | 1
One-Shot Real-to-Sim via End-to-End Differentiable Simulation and Rendering | - | 0
Truth or Mirage? Towards End-to-End Factuality Evaluation with LLM-Oasis | Code | 1
Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark | - | 0
OpenQDC: Open Quantum Data Commons | Code | 2
λ: A Benchmark for Data-Efficiency in Long-Horizon Indoor Mobile Manipulation Robotics | - | 0
GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks | Code | 2
Consolidating and Developing Benchmarking Datasets for the Nepali Natural Language Understanding Tasks | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified