SOTAVerified

Benchmarking

Papers

Showing 3876-3900 of 5548 papers

Title | Status | Hype
Comparing Foundation Models using Data Kernels | | 0
Towards Segment Anything Model (SAM) for Medical Image Segmentation: A Survey | Code | 0
A Comprehensive Study on Dataset Distillation: Performance, Privacy, Robustness and Fairness | | 0
Semantic Segmentation using Vision Transformers: A survey | | 0
Can LLMs Capture Human Preferences? | | 0
Analyzing Hong Kong's Legal Judgments from a Computational Linguistics point-of-view | | 0
A Simulation-Augmented Benchmarking Framework for Automatic RSO Streak Detection in Single-Frame Space Images | | 0
Benchmarking Automated Machine Learning Methods for Price Forecasting Applications | | 0
ChatGPT vs State-of-the-Art Models: A Benchmarking Study in Keyphrase Generation Task | | 0
On Pitfalls of RemOve-And-Retrain: Data Processing Inequality Perspective | Code | 0
Scalable, Distributed AI Frameworks: Leveraging Cloud Computing for Enhanced Deep Learning Performance and Efficiency | | 0
CIMLA: Interpretable AI for inference of differential causal networks | | 0
Unsupervised Synthetic Image Refinement via Contrastive Learning and Consistent Semantic-Structural Constraints | | 0
Benchmarking ChatGPT-4 on ACR Radiation Oncology In-Training (TXIT) Exam and Red Journal Gray Zone Cases: Potentials and Challenges for AI-Assisted Medical Education and Decision Making in Radiation Oncology | Code | 0
A Framework for Benchmarking Real-Time Embedded Object Detection | | 0
Vision Transformer for Efficient Chest X-ray and Gastrointestinal Image Classification | | 0
Learning a quantum computer's capability | | 0
Towards a Benchmark for Scientific Understanding in Humans and Machines | | 0
Depth Functions for Partial Orders with a Descriptive Analysis of Machine Learning Algorithms | Code | 0
The eBible Corpus: Data and Model Benchmarks for Bible Translation for Low-Resource Languages | Code | 0
UDTIRI: An Online Open-Source Intelligent Road Inspection Benchmark Suite | | 0
Computational and Exploratory Landscape Analysis of the GKLS Generator | | 0
OOD-CV-v2: An extended Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images | | 0
Towards Computational Performance Engineering for Unsupervised Concept Drift Detection -- Complexities, Benchmarking, Performance Analysis | Code | 0
Dialogue Games for Benchmarking Language Understanding: Motivation, Taxonomy, Strategy | | 0
Page 156 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified