SOTAVerified

Benchmarking

Papers

Showing 751-800 of 5548 papers

Title | Status | Hype
ClearPose: Large-scale Transparent Object Dataset and Benchmark | Code | 1
Benchmarking Data-driven Surrogate Simulators for Artificial Electromagnetic Materials | Code | 1
Large Scale MRI Collection and Segmentation of Cirrhotic Liver | Code | 1
BeHonest: Benchmarking Honesty in Large Language Models | Code | 1
A Japanese Dataset for Subjective and Objective Sentiment Polarity Classification in Micro Blog Domain | Code | 1
AdaPool: Exponential Adaptive Pooling for Information-Retaining Downsampling | Code | 1
EMGBench: Benchmarking Out-of-Distribution Generalization and Adaptation for Electromyography | Code | 1
Geometric Deep Learning for Structure-Based Drug Design: A Survey | Code | 1
A Comprehensive Study of the Robustness for LiDAR-based 3D Object Detectors against Adversarial Attacks | Code | 1
Circumventing shortcuts in audio-visual deepfake detection datasets with unsupervised learning | Code | 1
A multi-schematic classifier-independent oversampling approach for imbalanced datasets | Code | 1
End-to-end Knowledge Retrieval with Multi-modal Queries | Code | 1
Enhancing spatial and textual analysis with EUPEG: an extensible and unified platform for evaluating geoparsers | Code | 1
Bencher: Simple and Reproducible Benchmarking for Black-Box Optimization | Code | 1
Entering Real Social World! Benchmarking the Social Intelligence of Large Language Models from a First-person Perspective | Code | 1
BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models | Code | 1
Coarse-to-Fine Q-attention with Learned Path Ranking | Code | 1
A Systematic Benchmarking Analysis of Transfer Learning for Medical Image Analysis | Code | 1
Anabranch Network for Camouflaged Object Segmentation | Code | 1
Evaluating Attribution for Graph Neural Networks | Code | 1
CIBench: Evaluating Your LLMs with a Code Interpreter Plugin | Code | 1
Evaluating Multimodal Representations on Visual Semantic Textual Similarity | Code | 1
Evaluation of large language models for discovery of gene set function | Code | 1
CheXphoto: 10,000+ Photos and Transformations of Chest X-rays for Benchmarking Deep Learning Robustness | Code | 1
Benchmarking deep inverse models over time, and the neural-adjoint method | Code | 1
A Comprehensive Overview of Large Language Models | Code | 1
Examining the Effects of Degree Distribution and Homophily in Graph Learning Models | Code | 1
Leveraging Trust for Joint Multi-Objective and Multi-Fidelity Optimization | Code | 1
Analog or Digital In-memory Computing? Benchmarking through Quantitative Modeling | Code | 1
Benchmarking Deep Graph Generative Models for Optimizing New Drug Molecules for COVID-19 | Code | 1
CHILI: Chemically-Informed Large-scale Inorganic Nanomaterials Dataset for Advancing Graph Machine Learning | Code | 1
Exploring Large Language Models for Classical Philology | Code | 1
CIDEr: Consensus-based Image Description Evaluation | Code | 1
AirSim Drone Racing Lab | Code | 1
FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs | Code | 1
Benchmarking Actor-Critic Deep Reinforcement Learning Algorithms for Robotics Control with Action Constraints | Code | 1
Fantastic Questions and Where to Find Them: FairytaleQA -- An Authentic Dataset for Narrative Comprehension | Code | 1
Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms | Code | 1
A SWAT-based Reinforcement Learning Framework for Crop Management | Code | 1
featsel: A framework for benchmarking of feature selection algorithms and cost functions | Code | 1
Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations | Code | 1
Benchmarking Adversarial Patch Against Aerial Detection | Code | 1
Benchmarking Data Science Agents | Code | 1
FELM: Benchmarking Factuality Evaluation of Large Language Models | Code | 1
CheX-GPT: Harnessing Large Language Models for Enhanced Chest X-ray Report Labeling | Code | 1
Benchmarking Adversarial Robustness on Image Classification | Code | 1
CIPCaD-Bench: Continuous Industrial Process datasets for benchmarking Causal Discovery methods | Code | 1
FineSurE: Fine-grained Summarization Evaluation using LLMs | Code | 1
CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization | Code | 1
CommonPower: A Framework for Safe Data-Driven Smart Grid Control | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified