SOTAVerified

Benchmarking

Papers

Showing 1001–1025 of 5548 papers

Title | Status | Hype
EdgeMark: An Automation and Benchmarking System for Embedded Artificial Intelligence Tools |  | 0
SE Arena: An Interactive Platform for Evaluating Foundation Models in Software Engineering |  | 0
MM-IQ: Benchmarking Human-Like Abstraction and Reasoning in Multimodal Models | Code | 1
Learned Bayesian Cramér-Rao Bound for Unknown Measurement Models Using Score Neural Networks | Code | 0
True Online TD-Replan(lambda) Achieving Planning through Replaying |  | 0
Evolving Hard Maximum Cut Instances for Quantum Approximate Optimization Algorithms |  | 0
Fine-tuning LLaMA 2 interference: a comparative study of language implementations for optimal efficiency |  | 0
Unraveling the Capabilities of Language Models in News Summarization | Code | 0
MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding |  | 0
The iToBoS dataset: skin region images extracted from 3D total body photographs for lesion detection | Code | 0
Solving Urban Network Security Games: Learning Platform, Benchmark, and Challenge for AI Research |  | 0
SafeRAG: Benchmarking Security in Retrieval-Augmented Generation of Large Language Model | Code | 2
HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns | Code | 1
Molecular-driven Foundation Model for Oncologic Pathology | Code | 4
Benchmarking Quantum Convolutional Neural Networks for Signal Classification in Simulated Gamma-Ray Burst Detection |  | 0
Making Sense of Data in the Wild: Data Analysis Automation at Scale |  | 0
A Benchmarking Environment for Worker Flexibility in Flexible Job Shop Scheduling Problems |  | 0
PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding |  | 0
Benchmarking Quantum Reinforcement Learning | Code | 0
Skeleton-Guided-Translation: A Benchmarking Framework for Code Repository Translation with Fine-Grained Quality Evaluation |  | 0
IndicMMLU-Pro: Benchmarking Indic Large Language Models on Multi-Task Language Understanding |  | 0
Transfer of Knowledge through Reverse Annealing: A Preliminary Analysis of the Benefits and What to Share |  | 0
GiantHunter: Accurate detection of giant virus in metagenomic data using reinforcement-learning and Monte Carlo tree search | Code | 0
Self-supervised Benchmark Lottery on ImageNet: Do Marginal Improvements Translate to Improvements on Similar Datasets? |  | 0
CISOL: An Open and Extensible Dataset for Table Structure Recognition in the Construction Industry |  | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified