SOTAVerified

Benchmarking

Papers

Showing 4376–4400 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models | | 0 |
| Zero-Shot Visual Reasoning by Vision-Language Models: Benchmarking and Analysis | | 0 |
| λ: A Benchmark for Data-Efficiency in Long-Horizon Indoor Mobile Manipulation Robotics | | 0 |
| LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs | | 0 |
| LAG-MMLU: Benchmarking Frontier LLM Understanding in Latvian and Giriama | | 0 |
| LAMBDA: Covering the Solution Set of Black-Box Inequality by Search Space Quantization | | 0 |
| Landscape-Aware Automated Algorithm Configuration using Multi-output Mixed Regression and Classification | | 0 |
| LanEvil: Benchmarking the Robustness of Lane Detection to Environmental Illusions | | 0 |
| Language Complexity Measurement as a Noisy Zero-Shot Proxy for Evaluating LLM Performance | | 0 |
| Language-Driven 6-DoF Grasp Detection Using Negative Prompt Guidance | | 0 |
| Language Models for Automated Classification of Brain MRI Reports and Growth Chart Generation | | 0 |
| Can LLMs Capture Human Preferences? | | 0 |
| Large Language Model for Multi-Domain Translation: Benchmarking and Domain CoT Fine-tuning | | 0 |
| Understanding Large Language Models in Your Pockets: Performance Study on COTS Mobile Devices | | 0 |
| Large Language Models are Null-Shot Learners | | 0 |
| Large Language Models are Few-Shot Clinical Information Extractors | | 0 |
| Large Language Models as Automated Aligners for benchmarking Vision-Language Models | | 0 |
| Large Language Models Have Intrinsic Meta-Cognition, but Need a Good Lens | | 0 |
| Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level | | 0 |
| Large Malaysian Language Model Based on Mistral for Enhanced Local Language Understanding | | 0 |
| Large Physics Models: Towards a collaborative approach with Large Language Models and Foundation Models | | 0 |
| Large-scale Benchmarking of Metaphor-based Optimization Heuristics | | 0 |
| Large-Scale Quantum Separability Through a Reproducible Machine Learning Lens | | 0 |
| Latency-aware Road Anomaly Segmentation in Videos: A Photorealistic Dataset and New Metrics | | 0 |
| Latent Variable Models for Visual Question Answering | | 0 |
Page 176 of 222

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |