SOTAVerified

Benchmarking

Papers

Showing 2551–2600 of 5548 papers

Title | Status | Hype
Generative Models for Fast Simulation of Cherenkov Detectors at the Electron-Ion Collider | Code | 0
Benchmarking Intersectional Biases in NLP | Code | 0
DFEE: Interactive DataFlow Execution and Evaluation Kit | Code | 0
A Manually Annotated Image-Caption Dataset for Detecting Children in the Wild | Code | 0
Graph-theoretical approach to robust 3D normal extraction of LiDAR data | Code | 0
Benchmarking Commercial Intent Detection Services with Practice-Driven Evaluations | Code | 0
GenderBench: Evaluation Suite for Gender Biases in LLMs | Code | 0
GECOBench: A Gender-Controlled Text Dataset and Benchmark for Quantifying Biases in Explanations | Code | 0
GenCeption: Evaluate Multimodal LLMs with Unlabeled Unimodal Data | Code | 0
Generalization and Regularization in DQN | Code | 0
Arabic Speech Recognition by End-to-End, Modular Systems and Human | Code | 0
Detecting Stereotypes and Anti-stereotypes the Correct Way Using Social Psychological Underpinnings | Code | 0
Recognizing Object Affordances to Support Scene Reasoning for Manipulation Tasks | Code | 0
Detecting critical treatment effect bias in small subgroups | Code | 0
From Variability to Stability: Advancing RecSys Benchmarking Practices | Code | 0
Benchmarking Image Perturbations for Testing Automated Driving Assistance Systems | Code | 0
From raw affiliations to organization identifiers | Code | 0
Affine Non-negative Collaborative Representation Based Pattern Classification | Code | 0
DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design | Code | 0
From Past to Present: A Survey of Malicious URL Detection Techniques, Datasets and Code Repositories | Code | 0
Design and implementation of intelligent packet filtering in IoT microcontroller-based devices | Code | 0
Accurate Peak Detection in Multimodal Optimization via Approximated Landscape Learning | Code | 0
From Bytes to Borsch: Fine-Tuning Gemma and Mistral for the Ukrainian Language Representation | Code | 0
A quantum-classical reinforcement learning model to play Atari games | Code | 0
From Modern CNNs to Vision Transformers: Assessing the Performance, Robustness, and Classification Strategies of Deep Learning Models in Histopathology | Code | 0
From Knowledge to Reasoning: Evaluating LLMs for Ionic Liquids Research in Chemical and Biological Engineering | Code | 0
Dermatological Diagnosis Explainability Benchmark for Convolutional Neural Networks | Code | 0
Benchmarking Human and Automated Prompting in the Segment Anything Model | Code | 0
Depth Functions for Partial Orders with a Descriptive Analysis of Machine Learning Algorithms | Code | 0
Benchmarking histopathology foundation models in a multi-center dataset for skin cancer subtyping | Code | 0
From MNIST to ImageNet and Back: Benchmarking Continual Curriculum Learning | Code | 0
Fully Automatic Segmentation of Gross Target Volume and Organs-at-Risk for Radiotherapy Planning of Nasopharyngeal Carcinoma | Code | 0
Benchmarking HillVallEA for the GECCO 2019 Competition on Multimodal Optimization | Code | 0
Delving into Instance-Dependent Label Noise in Graph Data: A Comprehensive Study and Benchmark | Code | 0
Benchmarking Hierarchical Script Knowledge | Code | 0
FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal Shifts in Visual Question Answering | Code | 0
Delta-Influence: Unlearning Poisons via Influence Functions | Code | 0
Forecasting time series with constraints | Code | 0
FHBench: Towards Efficient and Personalized Federated Learning for Multimodal Healthcare | Code | 0
Benchmarking Robustness in Object Detection: Autonomous Driving when Winter is Coming | Code | 0
Forecasting Future International Events: A Reliable Dataset for Text-Based Event Modeling | Code | 0
Aesthetic Image Captioning From Weakly-Labelled Photographs | Code | 0
Defense-friendly Images in Adversarial Attacks: Dataset and Metrics for Perturbation Difficulty | Code | 0
DefAn: Definitive Answer Dataset for LLMs Hallucination Evaluation | Code | 0
Forecasting Across Time Series Databases using Recurrent Neural Networks on Groups of Similar Series: A Clustering Approach | Code | 0
FORLORN: A Framework for Comparing Offline Methods and Reinforcement Learning for Optimization of RAN Parameters | Code | 0
Fluorescence Reference Target Quantitative Analysis Library | Code | 0
Finding the Perfect Fit: Applying Regression Models to ClimateBench v1.0 | Code | 0
Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem | Code | 0
Benchmarking Graph Representations and Graph Neural Networks for Multivariate Time Series Classification | Code | 0
Page 52 of 111

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified