SOTAVerified

Benchmarking

Papers

Showing 5401-5450 of 5548 papers

Title | Status | Hype
Probing Acoustic Representations for Phonetic Properties | Code | 0
Probing Conceptual Understanding of Large Visual-Language Models | Code | 0
Probing Critical Learning Dynamics of PLMs for Hate Speech Detection | Code | 0
Using Color To Identify Insider Threats | Code | 0
An Exploration of Exploration: Measuring the ability of lexicase selection to find obscure pathways to optimality | Code | 0
Towards Computational Performance Engineering for Unsupervised Concept Drift Detection -- Complexities, Benchmarking, Performance Analysis | Code | 0
Transfer Learning between Motor Imagery Datasets using Deep Learning -- Validation of Framework and Comparison of Datasets | Code | 0
Synth4bench: a framework for generating synthetic genomics data for the evaluation of tumor-only somatic variant calling algorithms | Code | 0
Process Extraction from Text: Benchmarking the State of the Art and Paving the Way for Future Challenges | Code | 0
Transfer Learning for Prosthetics Using Imitation Learning | Code | 0
Benchmarking datasets for Anomaly-based Network Intrusion Detection: KDD CUP 99 alternatives | Code | 0
Synthetic Datasets for Machine Learning on Spatio-Temporal Graphs using PDEs | Code | 0
Synthetic location trajectory generation using categorical diffusion models | Code | 0
Synthetic Porous Microstructures: Automatic Design, Simulation, and Permeability Analysis | Code | 0
Synthetic Time Series Forecasting with Transformer Architectures: Extensive Simulation Benchmarks | Code | 0
An Experimental Study of the Transferability of Spectral Graph Networks | Code | 0
Comparison Performance of Spectrogram and Scalogram as Input of Acoustic Recognition Task | Code | 0
Can LLMs replace Neil deGrasse Tyson? Evaluating the Reliability of LLMs as Science Communicators | Code | 0
Benchmarking Data Heterogeneity Evaluation Approaches for Personalized Federated Learning | Code | 0
Comparing Machine Learning Algorithms by Union-Free Generic Depth | Code | 0
SYNTHEVAL: Hybrid Behavioral Testing of NLP Models with Synthetic CheckLists | Code | 0
Transformation-Interaction-Rational Representation for Symbolic Regression | Code | 0
Towards Enhancing Fault Tolerance in Neural Networks | Code | 0
Robust Model-Based Optimization for Challenging Fitness Landscapes | Code | 0
Air Learning: A Deep Reinforcement Learning Gym for Autonomous Aerial Robot Visual Navigation | Code | 0
Transformers for Green Semantic Communication: Less Energy, More Semantics | Code | 0
Benchmarking Data Efficiency in Δ-ML and Multifidelity Models for Quantum Chemistry | Code | 0
ViP: Video Platform for PyTorch | Code | 0
PsOCR: Benchmarking Large Multimodal Models for Optical Character Recognition in Low-resource Pashto Language | Code | 0
Comparative Study Between Distance Measures On Supervised Optimum-Path Forest Classification | Code | 0
Towards Efficient Synchronous Federated Training: A Survey on System Optimization Strategies | Code | 0
Which Model to Trust: Assessing the Influence of Models on the Performance of Reinforcement Learning Algorithms for Continuous Control Tasks | Code | 0
Benchmarking Automated Clinical Language Simplification: Dataset, Algorithm, and Evaluation | Code | 0
Comparative Analysis: Violence Recognition from Videos using Transfer Learning | Code | 0
Multi-Source Knowledge Pruning for Retrieval-Augmented Generation: A Benchmark and Empirical Study | Code | 0
Benchmarking community drug response prediction models: datasets, models, tools, and metrics for cross-dataset generalization analysis | Code | 0
Compact Trilinear Interaction for Visual Question Answering | Code | 0
Benchmarking Classic and Learned Navigation in Complex 3D Environments | Code | 0
An Experimental Evaluation of Imputation Models for Spatial-Temporal Traffic Data | Code | 0
Benchmarking Children's ASR with Supervised and Self-supervised Speech Foundation Models | Code | 0
VisFactor: Benchmarking Fundamental Visual Cognition in Multimodal Large Language Models | Code | 0
ColorGrid: A Multi-Agent Non-Stationary Environment for Goal Inference and Assistance | Code | 0
CODES: Benchmarking Coupled ODE Surrogates | Code | 0
CodeS: Towards Code Model Generalization Under Distribution Shift | Code | 0
Code Ownership in Open-Source AI Software Security | Code | 0
Benchmarking ChatGPT on Algorithmic Reasoning | Code | 0
COCO: Performance Assessment | Code | 0
Who Evaluates the Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2) | Code | 0
Benchmarking ChatGPT-4 on ACR Radiation Oncology In-Training (TXIT) Exam and Red Journal Gray Zone Cases: Potentials and Challenges for AI-Assisted Medical Education and Decision Making in Radiation Oncology | Code | 0
Traveling Across Languages: Benchmarking Cross-Lingual Consistency in Multimodal LLMs | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified