SOTAVerified

Benchmarking

Papers

Showing 1351–1375 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Perspective on recent developments and challenges in regulatory and systems genomics | | 0 |
| Learn to Solve Vehicle Routing Problems ASAP: A Neural Optimization Approach for Time-Constrained Vehicle Routing Problems with Finite Vehicle Fleet | | 0 |
| Enhancing Reverse Engineering: Investigating and Benchmarking Large Language Models for Vulnerability Analysis in Decompiled Binaries | | 0 |
| HourVideo: 1-Hour Video-Language Understanding | Code | 2 |
| Benchmarking Large Language Models with Integer Sequence Generation Tasks | | 0 |
| Generating Synthetic Electronic Health Record (EHR) Data: A Review with Benchmarking | | 0 |
| Beemo: Benchmark of Expert-edited Machine-generated Outputs | Code | 0 |
| Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level | | 0 |
| TDDBench: A Benchmark for Training data detection | | 0 |
| Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset | Code | 1 |
| SPINEX_ Symbolic Regression: Similarity-based Symbolic Regression with Explainable Neighbors Exploration | | 0 |
| On the Loss of Context-awareness in General Instruction Fine-tuning | Code | 0 |
| Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent | Code | 3 |
| Interaction2Code: Benchmarking MLLM-based Interactive Webpage Code Generation from Interactive Prototyping | Code | 2 |
| Benchmarking Vision, Language, & Action Models on Robotic Learning Tasks | Code | 1 |
| Imagining and building wise machines: The centrality of AI metacognition | | 0 |
| Benchmarking XAI Explanations with Human-Aligned Evaluations | | 0 |
| LayerDAG: A Layerwise Autoregressive Diffusion Model for Directed Acyclic Graph Generation | Code | 1 |
| TableGPT2: A Large Multimodal Model with Tabular Data Integration | Code | 4 |
| SinaTools: Open Source Toolkit for Arabic Natural Language Processing | | 0 |
| ROAD-Waymo: Action Awareness at Scale for Autonomous Driving | Code | 1 |
| Varco Arena: A Tournament Approach to Reference-Free Benchmarking Large Language Models | | 0 |
| FEET: A Framework for Evaluating Embedding Techniques | Code | 0 |
| Artificial Intelligence for Microbiology and Microbiome Research | | 0 |
| A Review of Reinforcement Learning in Financial Applications | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |