SOTAVerified

Benchmarking

Papers

Showing 611-620 of 5548 papers

Title | Status | Hype
----- | ------ | ----
CIPCaD-Bench: Continuous Industrial Process datasets for benchmarking Causal Discovery methods | Code | 1
CodeIF: Benchmarking the Instruction-Following Capabilities of Large Language Models for Code Generation | Code | 1
CharacterBench: Benchmarking Character Customization of Large Language Models | Code | 1
Amharic LLaMA and LLaVA: Multimodal LLMs for Low Resource Languages | Code | 1
Towards Motion Forecasting with Real-World Perception Inputs: Are End-to-End Approaches Competitive? | Code | 1
Chaos as an interpretable benchmark for forecasting and data-driven modelling | Code | 1
On the Detectability of ChatGPT Content: Benchmarking, Methodology, and Evaluation through the Lens of Academic Writing | Code | 1
CAVIAR: Co-simulation of 6G Communications, 3D Scenarios and AI for Digital Twins | Code | 1
ALTO: A Large-Scale Dataset for UAV Visual Place Recognition and Localization | Code | 1
CBench: Towards Better Evaluation of Question Answering Over Knowledge Graphs | Code | 1
Page 62 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
- | ----- | ------ | ------- | -------- | ------
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified