SOTAVerified

Benchmarking

Papers

Showing 4151–4175 of 5548 papers

Title | Status | Hype
The Unconstrained Ear Recognition Challenge | | 0
The Unconstrained Ear Recognition Challenge 2019 - ArXiv Version With Appendix | | 0
THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models | | 0
TIIF-Bench: How Does Your T2I Model Follow Your Instructions? | | 0
Time and Tokens: Benchmarking End-to-End Speech Dysfluency Detection | | 0
Time Awareness in Large Language Models: Benchmarking Fact Recall Across Time | | 0
Time Sensitive Knowledge Editing through Efficient Finetuning | | 0
TIME: Temporal-sensitive Multi-dimensional Instruction Tuning and Benchmarking for Video-LLMs | | 0
Time to Embrace Natural Language Processing (NLP)-based Digital Pathology: Benchmarking NLP- and Convolutional Neural Network-based Deep Learning Pipelines | | 0
Timing Excess Returns A cross-universe approach to alpha | | 0
TinyML Platforms Benchmarking | | 0
Title2Event: Benchmarking Open Event Extraction with a Large-scale Chinese Title Dataset | | 0
TituLLMs: A Family of Bangla LLMs with Comprehensive Benchmarking | | 0
tmVar 3.0: an improved variant concept recognition and normalization tool | | 0
Token Sequence Compression for Efficient Multimodal Computing | | 0
Top-k Regularization for Supervised Feature Selection | | 0
Top Score on the Wrong Exam: On Benchmarking in Machine Learning for Vulnerability Detection | | 0
Totally Corrective Boosting with Cardinality Penalization | | 0
TOTOPO: Classifying univariate and multivariate time series with Topological Data Analysis | | 0
Toward an ImageNet Library of Functions for Global Optimization Benchmarking | | 0
Toward end-to-end interpretable convolutional neural networks for waveform signals | | 0
Toward Robust Hyper-Detailed Image Captioning: A Multiagent Approach and Dual Evaluation Metrics for Factuality and Coverage | | 0
Towards a Benchmark for Scientific Understanding in Humans and Machines | | 0
Towards a Human-Centred Cognitive Model of Visuospatial Complexity in Everyday Driving | | 0
Towards a Multidimensional Evaluation Framework for Empathetic Conversational Systems | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified