SOTAVerified

Benchmarking

Papers

Showing 2251–2275 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Constellation Dataset: Benchmarking High-Altitude Object Detection for an Urban Intersection | Code | 1 |
| SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension | Code | 3 |
| Benchmarking Mobile Device Control Agents across Diverse Configurations | | 0 |
| ApisTox: a new benchmark dataset for the classification of small molecules toxicity on honey bees | Code | 0 |
| SynthEval: A Framework for Detailed Utility and Privacy Evaluation of Tabular Synthetic Data | Code | 1 |
| ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction | Code | 1 |
| DPO: A Differential and Pointwise Control Approach to Reinforcement Learning | | 0 |
| Empirical Analysis of the Dynamic Binary Value Problem with IOHprofiler | | 0 |
| Importance of Disjoint Sampling in Conventional and Transformer Models for Hyperspectral Image Classification | Code | 0 |
| Open Datasets for Satellite Radio Resource Control | | 0 |
| Benchmarking Advanced Text Anonymisation Methods: A Comparative Study on Novel and Traditional Approaches | | 0 |
| Experimental Validation of Ultrasound Beamforming with End-to-End Deep Learning for Single Plane Wave Imaging | Code | 1 |
| The Adversarial AI-Art: Understanding, Generation, Detection, and Benchmarking | | 0 |
| A User-Centric Multi-Intent Benchmark for Evaluating Large Language Models | Code | 1 |
| EnzChemRED, a rich enzyme chemistry relation extraction dataset | | 0 |
| TeamTrack: A Dataset for Multi-Sport Multi-Object Tracking in Full-pitch Videos | | 0 |
| TAVGBench: Benchmarking Text to Audible-Video Generation | Code | 1 |
| In-situ process monitoring and adaptive quality enhancement in laser additive manufacturing: a critical review | | 0 |
| Authentic Emotion Mapping: Benchmarking Facial Expressions in Real News | Code | 0 |
| Bridging the Gap Between Theory and Practice: Benchmarking Transfer Evolutionary Optimization | | 0 |
| DeepFake-O-Meter v2.0: An Open Platform for DeepFake Detection | Code | 3 |
| Integrated Sensing and Communication enabled Multiple Base Stations Cooperative UAV Detection | | 0 |
| STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases | Code | 3 |
| Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning | | 0 |
| REXEL: An End-to-end Model for Document-Level Relation Extraction and Entity Linking | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |