
Benchmarking

Papers

Showing 2151–2200 of 5548 papers

Title | Status | Hype
Fairness Index Measures to Evaluate Bias in Biometric Recognition | | 0
FalseReject: A Resource for Improving Contextual Safety and Mitigating Over-Refusals in LLMs via Structured Reasoning | | 0
A Benchmarking on Cloud based Speech-To-Text Services for French Speech and Background Noise Effect | | 0
Benchmarking Active Learning Strategies for Materials Optimization and Discovery | | 0
Analysis and Benchmarking of Extending Blind Face Image Restoration to Videos | | 0
TaskEval: Assessing Difficulty of Code Generation Tasks for Large Language Models | | 0
BrokenVideos: A Benchmark Dataset for Fine-Grained Artifact Localization in AI-Generated Videos | | 0
Bringing Quantum Algorithms to Automated Machine Learning: A Systematic Review of AutoML Frameworks Regarding Extensibility for QML Algorithms | | 0
Benchmarking Active Learning for NILM | | 0
Bridging vision language model (VLM) evaluation gaps with a framework for scalable and cost-effective benchmark generation | | 0
Analysing Features Learned Using Unsupervised Models on Program Embeddings | | 0
Fairness-Aware Graph Neural Networks: A Survey | | 0
Fantastic Questions and Where to Find Them: FairytaleQA--An Authentic Dataset for Narrative Comprehension | | 0
FastEnsemble: Benchmarking and Accelerating Ensemble-based Uncertainty Estimation for Image-to-Image Translation | | 0
Toward Bridging the Simulated-to-Real Gap: Benchmarking Super-Resolution on Real Data | | 0
Analysing Errors of Open Information Extraction Systems | | 0
Bridging the Gap Between Theory and Practice: Benchmarking Transfer Evolutionary Optimization | | 0
Bridging the Bosphorus: Advancing Turkish Large Language Models through Strategies for Low-Resource Language Adaptation and Benchmarking | | 0
Benchmarking Abstractive Summarisation: A Dataset of Human-authored Summaries of Norwegian News Articles | | 0
A Normative Framework for Benchmarking Consumer Fairness in Large Language Model Recommender System | | 0
Benchmarking a Benchmark: How Reliable is MS-COCO? | | 0
A Benchmarking Environment for Reinforcement Learning Based Task Oriented Dialogue Management | | 0
Breakpoint: Scalable evaluation of system-level reasoning in LLM code agents | | 0
A new pathway to generative artificial intelligence by minimizing the maximum entropy | | 0
Factuality or Fiction? Benchmarking Modern LLMs on Ambiguous QA with Citations | | 0
FAIRification of MLC data | | 0
BraTS-Path Challenge: Assessing Heterogeneous Histopathologic Brain Tumor Sub-regions | | 0
Adaptive Gradient Methods with Local Guarantees | | 0
Object Pose Estimation in Robotics Revisited | | 0
BOX3D: Lightweight Camera-LiDAR Fusion for 3D Object Detection and Localization | | 0
Scale MLPerf-0.6 models on Google TPU-v3 Pods | | 0
FACT: Learning Governing Abstractions Behind Integer Sequences | | 0
Boundary Detection Benchmarking: Beyond F-Measures | | 0
BoTTA: Benchmarking on-device Test Time Adaptation | | 0
Benchmarking 3D multi-coil NC-PDNet MRI reconstruction | | 0
Boost Vision Transformer with GPU-Friendly Sparsity and Quantization | | 0
Benchmarking 3D Human Pose Estimation Models Under Occlusions | | 0
An AI based talent acquisition and benchmarking for job | | 0
FactLens: Benchmarking Fine-Grained Fact Verification | | 0
BOOM: Benchmarking Out-Of-distribution Molecular Property Predictions of Machine Learning Models | | 0
Benchmarking 2D Egocentric Hand Pose Datasets | | 0
BongLLaMA: LLaMA for Bangla Language | | 0
An Advanced Ensemble Deep Learning Framework for Stock Price Prediction Using VAE, Transformer, and LSTM Model | | 0
Adaptive Experimentation at Scale: A Computational Framework for Flexible Batches | | 0
Benchmark for Antibody Binding Affinity Maturation and Design | | 0
ExtremeAIGC: Benchmarking LMM Vulnerability to AI-Generated Extremist Content | | 0
BOLD: Boolean Logic Deep Learning | | 0
An Accelerated Correlation Filter Tracker | | 0
Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning | | 0
Face Detection on Surveillance Images | | 0
Page 44 of 111

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified