SOTAVerified

Benchmarking

Papers

Showing 2201–2225 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Benchmarking Online Object Trackers for Underwater Robot Position Locking Applications | | 0 |
| VidLBEval: Benchmarking and Mitigating Language Bias in Video-Involved LVLMs | | 0 |
| VisFactor: Benchmarking Fundamental Visual Cognition in Multimodal Large Language Models | Code | 0 |
| Bridging vision language model (VLM) evaluation gaps with a framework for scalable and cost-effective benchmark generation | | 0 |
| Methods and Trends in Detecting Generated Images: A Comprehensive Review | | 0 |
| MHQA: A Diverse, Knowledge Intensive Mental Health Question Answering Challenge for Language Models | | 0 |
| Benchmarking machine learning for bowel sound pattern classification from tabular features to pretrained models | Code | 0 |
| Para-Lane: Multi-Lane Dataset Registering Parallel Scans for Benchmarking Novel View Synthesis | | 0 |
| Probabilistic Robustness in Deep Learning: A Concise yet Comprehensive Guide | | 0 |
| Synthetic Porous Microstructures: Automatic Design, Simulation, and Permeability Analysis | Code | 0 |
| Line Goes Up? Inherent Limitations of Benchmarks for Evaluating Large Language Models | | 0 |
| Sentence Smith: Formally Controllable Text Transformation and its Application to Evaluation of Text Embedding Models | | 0 |
| Beyond Self-Talk: A Communication-Centric Survey of LLM-Based Multi-Agent Systems | | 0 |
| PredictaBoard: Benchmarking LLM Score Predictability | Code | 0 |
| Reinforcement Learning with Graph Attention for Routing and Wavelength Assignment with Lightpath Reuse | | 0 |
| Benchmarking Multimodal RAG through a Chart-based Document Question-Answering Generation Framework | Code | 0 |
| Position: Graph Learning Will Lose Relevance Due To Poor Benchmarks | | 0 |
| Statistical Scenario Modelling and Lookalike Distributions for Multi-Variate AI Risk | | 0 |
| GIMMICK -- Globally Inclusive Multimodal Multitask Cultural Knowledge Benchmarking | | 0 |
| A Baseline Method for Removing Invisible Image Watermarks using Deep Image Prior | | 0 |
| Benchmarking Self-Supervised Learning Methods for Accelerated MRI Reconstruction | Code | 0 |
| VITAL: A New Dataset for Benchmarking Pluralistic Alignment in Healthcare | | 0 |
| Position: There are no Champions in Long-Term Time Series Forecasting | | 0 |
| Benchmarking of Different YOLO Models for CAPTCHAs Detection and Classification | | 0 |
| EquiBench: Benchmarking Large Language Models' Understanding of Program Semantics via Equivalence Checking | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |