SOTAVerified

Benchmarking

Papers

Showing 1726-1750 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| So-Fake: Benchmarking and Explaining Social Media Image Forgery Detection | | 0 |
| Towards Emotionally Consistent Text-Based Speech Editing: Introducing EmoCorrector and The ECD-TSE Dataset | Code | 0 |
| Benchmarking Poisoning Attacks against Retrieval-Augmented Generation | | 0 |
| From Generation to Detection: A Multimodal Multi-Task Dataset for Benchmarking Health Misinformation | | 0 |
| LogicCat: A Chain-of-Thought Text-to-SQL Benchmark for Multi-Domain Reasoning Challenges | Code | 0 |
| SAMA: Towards Multi-Turn Referential Grounded Video Chat with Large Language Models | | 0 |
| A Position Paper on the Automatic Generation of Machine Learning Leaderboards | Code | 0 |
| SEvoBench : A C++ Framework For Evolutionary Single-Objective Optimization Benchmarking | | 0 |
| Wildfire spread forecasting with Deep Learning | Code | 0 |
| PerMedCQA: Benchmarking Large Language Models on Medical Consumer Question Answering in Persian Language | | 0 |
| Chart-to-Experience: Benchmarking Multimodal LLMs for Predicting Experiential Impact of Charts | | 0 |
| SemSegBench & DetecBench: Benchmarking Reliability and Generalization Beyond Classification | Code | 0 |
| Benchmark for Antibody Binding Affinity Maturation and Design | | 0 |
| MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation | | 0 |
| U2-BENCH: Benchmarking Large Vision-Language Models on Ultrasound Understanding | | 0 |
| 3D Face Reconstruction Error Decomposed: A Modular Benchmark for Fair and Fast Method Evaluation | Code | 0 |
| Is Single-View Mesh Reconstruction Ready for Robotics? | | 0 |
| JALMBench: Benchmarking Jailbreak Vulnerabilities in Audio Language Models | Code | 0 |
| PawPrint: Whose Footprints Are These? Identifying Animal Individuals by Their Footprints | | 0 |
| Benchmarking Expressive Japanese Character Text-to-Speech with VITS and Style-BERT-VITS2 | | 0 |
| Experimental robustness benchmark of quantum neural network on a superconducting quantum processor | | 0 |
| Mechanistic Understanding and Mitigation of Language Confusion in English-Centric Large Language Models | | 0 |
| Edge-First Language Model Inference: Models, Metrics, and Tradeoffs | | 0 |
| KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models | | 0 |
| Benchmarking and Pushing the Multi-Bias Elimination Boundary of LLMs via Causal Effect Estimation-guided Debiasing | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |