SOTAVerified

Benchmarking

Papers

Showing 4151-4200 of 5548 papers

Title | Status | Hype
The Unconstrained Ear Recognition Challenge |  | 0
The Unconstrained Ear Recognition Challenge 2019 - ArXiv Version With Appendix |  | 0
THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models |  | 0
TIIF-Bench: How Does Your T2I Model Follow Your Instructions? |  | 0
Time and Tokens: Benchmarking End-to-End Speech Dysfluency Detection |  | 0
Time Awareness in Large Language Models: Benchmarking Fact Recall Across Time |  | 0
Time Sensitive Knowledge Editing through Efficient Finetuning |  | 0
TIME: Temporal-sensitive Multi-dimensional Instruction Tuning and Benchmarking for Video-LLMs |  | 0
Time to Embrace Natural Language Processing (NLP)-based Digital Pathology: Benchmarking NLP- and Convolutional Neural Network-based Deep Learning Pipelines |  | 0
Timing Excess Returns A cross-universe approach to alpha |  | 0
TinyML Platforms Benchmarking |  | 0
Title2Event: Benchmarking Open Event Extraction with a Large-scale Chinese Title Dataset |  | 0
TituLLMs: A Family of Bangla LLMs with Comprehensive Benchmarking |  | 0
tmVar 3.0: an improved variant concept recognition and normalization tool |  | 0
Token Sequence Compression for Efficient Multimodal Computing |  | 0
Top-k Regularization for Supervised Feature Selection |  | 0
Top Score on the Wrong Exam: On Benchmarking in Machine Learning for Vulnerability Detection |  | 0
Totally Corrective Boosting with Cardinality Penalization |  | 0
TOTOPO: Classifying univariate and multivariate time series with Topological Data Analysis |  | 0
Toward an ImageNet Library of Functions for Global Optimization Benchmarking |  | 0
Toward end-to-end interpretable convolutional neural networks for waveform signals |  | 0
Toward Robust Hyper-Detailed Image Captioning: A Multiagent Approach and Dual Evaluation Metrics for Factuality and Coverage |  | 0
Towards a Benchmark for Scientific Understanding in Humans and Machines |  | 0
Towards a Human-Centred Cognitive Model of Visuospatial Complexity in Everyday Driving |  | 0
Towards a Multidimensional Evaluation Framework for Empathetic Conversational Systems |  | 0
Towards an AI Accountability Policy |  | 0
Towards an Automated SOAP Note: Classifying Utterances from Medical Conversations |  | 0
Towards a Taxonomy of Graph Learning Datasets |  | 0
Towards a Theory-Guided Benchmarking Suite for Discrete Black-Box Optimization Heuristics: Profiling (1+λ) EA Variants on OneMax and LeadingOnes |  | 0
Towards a Unified Framework for Determining Conformational Ensembles of Disordered Proteins |  | 0
Towards Benchmarking and Assessing the Safety and Robustness of Autonomous Driving on Safety-critical Scenarios |  | 0
Towards Benchmarking and Evaluating Deepfake Detection |  | 0
Towards Benchmarking Explainable Artificial Intelligence Methods |  | 0
Towards Benchmarking Scene Background Initialization |  | 0
Towards Benchmarking the Utility of Explanations for Model Debugging |  | 0
Towards Class-agnostic Tracking Using Feature Decorrelation in Point Clouds |  | 0
Towards Effective Disambiguation for Machine Translation with Large Language Models |  | 0
Towards Efficient Multi-LLM Inference: Characterization and Analysis of LLM Routing and Hierarchical Techniques |  | 0
Towards Explainability and Fairness in Swiss Judgement Prediction: Benchmarking on a Multilingual Dataset |  | 0
Towards Explainable Network Intrusion Detection using Large Language Models |  | 0
Towards Fair Machine Learning Software: Understanding and Addressing Model Bias Through Counterfactual Thinking |  | 0
Towards Graph Foundation Models: A Study on the Generalization of Positional and Structural Encodings |  | 0
Towards Ideal Temporal Graph Neural Networks: Evaluations and Conclusions after 10,000 GPU Hours |  | 0
Towards Large Language Models that Benefit for All: Benchmarking Group Fairness in Reward Models |  | 0
Towards Large-Scale Small Object Detection: Survey and Benchmarks |  | 0
Towards Long-Term predictions of Turbulence using Neural Operators |  | 0
Towards More Robust NLP System Evaluation: Handling Missing Scores in Benchmarks |  | 0
Towards Personalized Federated Learning |  | 0
Towards Private Learning on Decentralized Graphs with Local Differential Privacy |  | 0
Towards Productionizing Subjective Search Systems |  | 0
Page 84 of 111

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified