SOTAVerified

Benchmarking

Papers

Showing 5526-5548 of 5548 papers

Title | Status | Hype
ChatGPT for GTFS: Benchmarking LLMs on GTFS Understanding and Retrieval | Code | 0
RCP-Bench: Benchmarking Robustness for Collaborative Perception Under Diverse Corruptions | Code | 0
TuneVLSeg: Prompt Tuning Benchmark for Vision-Language Segmentation Models | Code | 0
RDF-star2Vec: RDF-star Graph Embeddings for Data Mining | Code | 0
4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding | Code | 0
An Empirical Evaluation of Cost-based Federated SPARQL Query Processing Engines | Code | 0
Characterizing SLAM Benchmarks and Methods for the Robust Perception Age | Code | 0
Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge | Code | 0
Characterizing Bias: Benchmarking Large Language Models in Simplified versus Traditional Chinese | Code | 0
Changepoint Detection in Noisy Data Using a Novel Residuals Permutation-Based Method (RESPERM): Benchmarking and Application to Single Trial ERPs | Code | 0
TuringQ: Benchmarking AI Comprehension in Theory of Computation | Code | 0
An empirical comparison between stochastic and deterministic centroid initialisation for K-Means variations | Code | 0
TweetNERD -- End to End Entity Linking Benchmark for Tweets | Code | 0
Real-time cryo-EM data pre-processing with Warp | Code | 0
Towards Learning Universal, Regional, and Local Hydrological Behaviors via Machine-Learning Applied to Large-Sample Datasets | Code | 0
TextClass Benchmark: A Continuous Elo Rating of LLMs in Social Sciences | Code | 0
Certifiable Black-Box Attacks with Randomized Adversarial Examples: Breaking Defenses with Provable Confidence | Code | 0
Benchmarking Abstract and Reasoning Abilities Through A Theoretical Perspective | Code | 0
An Efficient Two-stage Gradient Boosting Framework for Short-term Traffic State Estimation | Code | 0
ACCORD: Closing the Commonsense Measurability Gap | Code | 0
Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions | Code | 0
Benchmark Generation Framework with Customizable Distortions for Image Classifier Robustness | Code | 0
Benchmarking Instance-Centric Counterfactual Algorithms for XAI: From White Box to Black Box | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified