SOTAVerified

Benchmarking

Papers

Showing 4301–4350 of 5548 papers

Title | Status | Hype
VATr++: Choose Your Words Wisely for Handwritten Text Generation | | 0
Vec2Face: Unveil Human Faces from their Blackbox Features in Face Recognition | | 0
VELOCITI: Benchmarking Video-Language Compositional Reasoning with Strict Entailment | | 0
VeriContaminated: Assessing LLM-Driven Verilog Coding for Data Contamination | | 0
VeriFact: Enhancing Long-Form Factuality Evaluation with Refined Fact Extraction and Reference Facts | | 0
Verifiable Format Control for Large Language Model Generations | | 0
VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity | | 0
VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models | | 0
VFHQ: A High-Quality Dataset and Benchmark for Video Face Super-Resolution | | 0
ViC-Bench: Benchmarking Visual-Interleaved Chain-of-Thought Capability in MLLMs with Free-Style Intermediate State Representations | | 0
Benchmarking Badminton Action Recognition with a New Fine-Grained Dataset | | 0
VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos | | 0
VidLBEval: Benchmarking and Mitigating Language Bias in Video-Involved LVLMs | | 0
Views Are My Own, but Also Yours: Benchmarking Theory of Mind Using Common Ground | | 0
Village-Net Clustering: A Rapid approach to Non-linear Unsupervised Clustering of High-Dimensional Data | | 0
VIPPrint: A Large Scale Dataset of Printed and Scanned Images for Synthetic Face Images Detection and Source Linking | | 0
Virus-MNIST: Machine Learning Baseline Calculations for Image Classification | | 0
VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning | | 0
VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning | | 0
VisImages: A Fine-Grained Expert-Annotated Visualization Dataset | | 0
WebCode2M: A Real-World Dataset for Code Generation from Webpage Designs | | 0
Vision-Based Deep Reinforcement Learning of UAV Autonomous Navigation Using Privileged Information | | 0
Vision-Based Power Line Cables and Pylons Detection for Low Flying Aircraft | | 0
VisionKG: Unleashing the Power of Visual Datasets via Knowledge Graph | | 0
Vision Learners Meet Web Image-Text Pairs | | 0
Vision Transformer for Efficient Chest X-ray and Gastrointestinal Image Classification | | 0
Visual Attention on the Sun: What Do Existing Models Actually Predict? | | 0
Visual Fidelity Index for Generative Semantic Communications with Critical Information Embedding | | 0
Visual Object Tracking on Multi-modal RGB-D Videos: A Review | | 0
Visual Place Recognition for Large-Scale UAV Applications | | 0
VITAL: A New Dataset for Benchmarking Pluralistic Alignment in Healthcare | | 0
VoiceWukong: Benchmarking Deepfake Voice Detection | | 0
V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning | | 0
v-SVR Polynomial Kernel for Predicting the Defect Density in New Software Projects | | 0
Vulnerability of Face Morphing Attacks: A Case Study on Lookalike and Identical Twins | | 0
From Attack to Protection: Leveraging Watermarking Attack Network for Advanced Add-on Watermarking | | 0
Ward: Provable RAG Dataset Inference via LLM Watermarks | | 0
Watchog: A Light-weight Contrastive Learning based Framework for Column Annotation | | 0
WebVision Challenge: Visual Learning and Understanding With Web Data | | 0
WelQrate: Defining the Gold Standard in Small Molecule Drug Discovery Benchmarking | | 0
WER We Stand: Benchmarking Urdu ASR Models | | 0
What can 5.17 billion regression fits tell us about artificial models of the human visual system? | | 0
What cleaves? Is proteasomal cleavage prediction reaching a ceiling? | | 0
What Does Neuro Mean to Cardio? Investigating the Role of Clinical Specialty Data in Medical LLMs | | 0
What Emotions Make One or Five Stars? Understanding Ratings of Online Product Reviews by Sentiment Analysis and XAI | | 0
What if we had no Wikipedia? Domain-independent Term Extraction from a Large News Corpus | | 0
Alexpaca: Learning Factual Clarification Question Generation Without Examples | | 0
What Motivates You? Benchmarking Automatic Detection of Basic Needs from Short Posts | | 0
Towards Self-adaptive Mutation in Evolutionary Multi-Objective Algorithms | | 0
What Will it Take to Fix Benchmarking in Natural Language Understanding? | | 0
Page 87 of 111

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified