SOTAVerified

Benchmarking

Papers

Showing 2701–2725 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Factuality or Fiction? Benchmarking Modern LLMs on Ambiguous QA with Citations | | 0 |
| Galvatron: An Automatic Distributed System for Efficient Foundation Model Training | | 0 |
| Benchmarking the Performance of Pre-trained LLMs across Urdu NLP Tasks | | 0 |
| Hybrid data driven/thermal simulation model for comfort assessment | | 0 |
| GANmut: Generating and Modifying Facial Expressions | | 0 |
| GaSLight: Gaussian Splats for Spatially-Varying Lighting in HDR | | 0 |
| FactLens: Benchmarking Fine-Grained Fact Verification | | 0 |
| GateLens: A Reasoning-Enhanced LLM Agent for Automotive Software Release Analytics | | 0 |
| FACT: Learning Governing Abstractions Behind Integer Sequences | | 0 |
| Benchmarking Pretrained Attention-based Models for Real-Time Recognition in Robot-Assisted Esophagectomy | | 0 |
| Face Morphing Attack Generation & Detection: A Comprehensive Survey | | 0 |
| Face Detection on Surveillance Images | | 0 |
| A Survey of Small Language Models | | 0 |
| Hybrid Long Document Summarization using C2F-FAR and ChatGPT: A Practical Study | | 0 |
| Hydrological time series forecasting using simple combinations: Big data testing and investigations on one-year ahead river flow predictability | | 0 |
| Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning | | 0 |
| ExtremeAIGC: Benchmarking LMM Vulnerability to AI-Generated Extremist Content | | 0 |
| A Unified Taylor Framework for Revisiting Attribution Methods | | 0 |
| Benchmarking the Benchmark -- Analysis of Synthetic NIDS Datasets | | 0 |
| GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing | | 0 |
| GeneAgent: Self-verification Language Agent for Gene Set Knowledge Discovery using Domain Databases | | 0 |
| Extraction of Research Objectives, Machine Learning Model Names, and Dataset Names from Academic Papers and Analysis of Their Interrelationships Using LLM and Network Analysis | | 0 |
| A Survey of Predictive Maintenance Methods: An Analysis of Prognostics via Classification and Regression | | 0 |
| Extraction of clinical information from the non-invasive fetal electrocardiogram | | 0 |
| Extensible Logging and Empirical Attainment Function for IOHexperimenter | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |