SOTAVerified

Benchmarking

Papers

Showing 451–475 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Codabench: Flexible, Easy-to-Use and Reproducible Benchmarking Platform | Code | 1 |
| AD-LLM: Benchmarking Large Language Models for Anomaly Detection | Code | 1 |
| ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional Dependencies | Code | 1 |
| AdsorbML: A Leap in Efficiency for Adsorption Energy Calculations using Generalizable Machine Learning Potentials | Code | 1 |
| Towards Reliable Detection of LLM-Generated Texts: A Comprehensive Evaluation Framework with CUDRT | Code | 1 |
| An Exploration of Embodied Visual Exploration | Code | 1 |
| Are We There Yet? Evaluating State-of-the-Art Neural Network based Geoparsers Using EUPEG as a Benchmarking Platform | Code | 1 |
| AsEP: Benchmarking Deep Learning Methods for Antibody-specific Epitope Prediction | Code | 1 |
| Clinical Prompt Learning with Frozen Language Models | Code | 1 |
| CLoG: Benchmarking Continual Learning of Image Generation Models | Code | 1 |
| On the Detectability of ChatGPT Content: Benchmarking, Methodology, and Evaluation through the Lens of Academic Writing | Code | 1 |
| CharacterBench: Benchmarking Character Customization of Large Language Models | Code | 1 |
| CCTV-Gun: Benchmarking Handgun Detection in CCTV Images | Code | 1 |
| AnomalyHop: An SSL-based Image Anomaly Localization Method | Code | 1 |
| Towards Motion Forecasting with Real-World Perception Inputs: Are End-to-End Approaches Competitive? | Code | 1 |
| CAVIAR: Co-simulation of 6G Communications, 3D Scenarios and AI for Digital Twins | Code | 1 |
| CausalTime: Realistically Generated Time-series for Benchmarking of Causal Discovery | Code | 1 |
| CBench: Towards Better Evaluation of Question Answering Over Knowledge Graphs | Code | 1 |
| CodeS: Natural Language to Code Repository via Multi-Layer Sketch | Code | 1 |
| CodeUpdateArena: Benchmarking Knowledge Editing on API Updates | Code | 1 |
| Chaos as an interpretable benchmark for forecasting and data-driven modelling | Code | 1 |
| Collective Knowledge: organizing research projects as a database of reusable components and portable workflows with common APIs | Code | 1 |
| CheX-GPT: Harnessing Large Language Models for Enhanced Chest X-ray Report Labeling | Code | 1 |
| Combinatorial Optimization with Policy Adaptation using Latent Space Search | Code | 1 |
| Category-wise Fine-Tuning: Resisting Incorrect Pseudo-Labels in Multi-Label Image Classification with Partial Labels | Code | 1 |
Page 19 of 222

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |