SOTAVerified

Benchmarking

Papers

Showing 2351-2375 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Investigating Energy Efficiency and Performance Trade-offs in LLM Inference Across Tasks and DVFS Settings | | 0 |
| Benchmarking Multimodal Models for Fine-Grained Image Analysis: A Comparative Study Across Diverse Visual Features | | 0 |
| Benchmarking Graph Representations and Graph Neural Networks for Multivariate Time Series Classification | Code | 0 |
| Understanding and Benchmarking Artificial Intelligence: OpenAI's o3 Is Not AGI | | 0 |
| The Paradox of Success in Evolutionary and Bioinspired Optimization: Revisiting Critical Issues, Key Studies, and Methodological Pathways | | 0 |
| Benchmarking Abstractive Summarisation: A Dataset of Human-authored Summaries of Norwegian News Articles | | 0 |
| Stronger Than You Think: Benchmarking Weak Supervision on Realistic Tasks | Code | 0 |
| Lessons From Red Teaming 100 Generative AI Products | | 0 |
| Benchmarking YOLOv8 for Optimal Crack Detection in Civil Infrastructure | | 0 |
| Evidential Deep Learning for Uncertainty Quantification and Out-of-Distribution Detection in Jet Identification using Deep Neural Networks | Code | 0 |
| Benchmarking Rotary Position Embeddings for Automatic Speech Recognition | | 0 |
| Large Physics Models: Towards a collaborative approach with Large Language Models and Foundation Models | | 0 |
| LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation | | 0 |
| Commonsense Video Question Answering through Video-Grounded Entailment Tree Reasoning | | 0 |
| CallNavi, A Challenge and Empirical Study on LLM Function Calling and Routing | | 0 |
| AgoraSpeech: A multi-annotated comprehensive dataset of political discourse through the lens of humans and AI | | 0 |
| Open-Source Manually Annotated Vocal Tract Database for Automatic Segmentation from 3D MRI Using Deep Learning: Benchmarking 2D and 3D Convolutional and Transformer Networks | | 0 |
| Advancing Retrieval-Augmented Generation for Persian: Development of Language Models, Comprehensive Benchmarks, and Best Practices for Optimization | | 0 |
| An Analysis of Model Robustness across Concurrent Distribution Shifts | | 0 |
| IOLBENCH: Benchmarking LLMs on Linguistic Reasoning | Code | 0 |
| Practical Design and Benchmarking of Generative AI Applications for Surgical Billing and Coding | | 0 |
| Machine Learning for Identifying Grain Boundaries in Scanning Electron Microscopy (SEM) Images of Nanoparticle Superlattices | | 0 |
| The FACTS Grounding Leaderboard: Benchmarking LLMs' Ability to Ground Responses to Long-Form Input | | 0 |
| MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models | | 0 |
| Tougher Text, Smarter Models: Raising the Bar for Adversarial Defence Benchmarks | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |