Benchmarking

Papers

Showing 4151–4200 of 5548 papers

Title	Date	Tasks	Status
The Unconstrained Ear Recognition Challenge	Aug 23, 2017	Benchmarking, Person Recognition	Unverified
The Unconstrained Ear Recognition Challenge 2019 - ArXiv Version With Appendix	Mar 11, 2019	Benchmarking, Person Recognition	Unverified
THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models	Apr 17, 2025	Benchmarking, Math	Unverified
TIIF-Bench: How Does Your T2I Model Follow Your Instructions?	Jun 2, 2025	Benchmarking, Instruction Following	Unverified
Time and Tokens: Benchmarking End-to-End Speech Dysfluency Detection	Sep 20, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Time Awareness in Large Language Models: Benchmarking Fact Recall Across Time	Sep 20, 2024	Benchmarking, World Knowledge	Unverified
Time Sensitive Knowledge Editing through Efficient Finetuning	Jun 6, 2024	Benchmarking, knowledge editing	Unverified
TIME: Temporal-sensitive Multi-dimensional Instruction Tuning and Benchmarking for Video-LLMs	Mar 13, 2025	Benchmarking, Question Answering	Unverified
Time to Embrace Natural Language Processing (NLP)-based Digital Pathology: Benchmarking NLP- and Convolutional Neural Network-based Deep Learning Pipelines	Feb 21, 2023	Benchmarking, whole slide images	Unverified
Timing Excess Returns A cross-universe approach to alpha	Feb 11, 2020	Benchmarking, Time Series	Unverified
TinyML Platforms Benchmarking	Nov 30, 2021	Benchmarking	Unverified
Title2Event: Benchmarking Open Event Extraction with a Large-scale Chinese Title Dataset	Nov 2, 2022	Benchmarking, Event Extraction	Unverified
TituLLMs: A Family of Bangla LLMs with Comprehensive Benchmarking	Feb 16, 2025	Benchmarking	Unverified
tmVar 3.0: an improved variant concept recognition and normalization tool	Apr 7, 2022	Benchmarking	Unverified
Token Sequence Compression for Efficient Multimodal Computing	Apr 24, 2025	Benchmarking	Unverified
Top-k Regularization for Supervised Feature Selection	Jun 4, 2021	Benchmarking, feature selection	Unverified
Top Score on the Wrong Exam: On Benchmarking in Machine Learning for Vulnerability Detection	Aug 23, 2024	Benchmarking, Binary Classification	Unverified
Totally Corrective Boosting with Cardinality Penalization	Apr 7, 2015	Benchmarking, Combinatorial Optimization	Unverified
TOTOPO: Classifying univariate and multivariate time series with Topological Data Analysis	Oct 10, 2020	Benchmarking, Time Series	Unverified
Toward an ImageNet Library of Functions for Global Optimization Benchmarking	Jun 27, 2022	Benchmarking, global-optimization	Unverified
Toward end-to-end interpretable convolutional neural networks for waveform signals	May 3, 2024	Benchmarking, Emotion Recognition	Unverified
Toward Robust Hyper-Detailed Image Captioning: A Multiagent Approach and Dual Evaluation Metrics for Factuality and Coverage	Dec 20, 2024	Attribute, Benchmarking	Unverified
Towards a Benchmark for Scientific Understanding in Humans and Machines	Apr 20, 2023	Benchmarking, Information Retrieval	Unverified
Towards a Human-Centred Cognitive Model of Visuospatial Complexity in Everyday Driving	May 29, 2020	Benchmarking	Unverified
Towards a Multidimensional Evaluation Framework for Empathetic Conversational Systems	Jul 26, 2024	Benchmarking	Unverified
Towards an AI Accountability Policy	Jul 25, 2023	Benchmarking, Fairness	Unverified
Towards an Automated SOAP Note: Classifying Utterances from Medical Conversations	Jul 17, 2020	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Towards a Taxonomy of Graph Learning Datasets	Oct 27, 2021	Benchmarking, Graph Learning	Unverified
Towards a Theory-Guided Benchmarking Suite for Discrete Black-Box Optimization Heuristics: Profiling (1+λ) EA Variants on OneMax and LeadingOnes	Aug 17, 2018	Benchmarking, Evolutionary Algorithms	Unverified
Towards a Unified Framework for Determining Conformational Ensembles of Disordered Proteins	Apr 4, 2025	Benchmarking	Unverified
Towards Benchmarking and Assessing the Safety and Robustness of Autonomous Driving on Safety-critical Scenarios	Mar 31, 2025	Adversarial Attack, Autonomous Driving	Unverified
Towards Benchmarking and Evaluating Deepfake Detection	Mar 4, 2022	Benchmarking, DeepFake Detection	Unverified
Towards Benchmarking Explainable Artificial Intelligence Methods	Aug 25, 2022	Benchmarking, Explainable artificial intelligence	Unverified
Towards Benchmarking Scene Background Initialization	Jun 12, 2015	Benchmarking	Unverified
Towards Benchmarking the Utility of Explanations for Model Debugging	May 10, 2021	Benchmarking	Unverified
Towards Class-agnostic Tracking Using Feature Decorrelation in Point Clouds	Feb 28, 2022	Benchmarking, Object Tracking	Unverified
Towards Effective Disambiguation for Machine Translation with Large Language Models	Sep 20, 2023	Benchmarking, In-Context Learning	Unverified
Towards Efficient Multi-LLM Inference: Characterization and Analysis of LLM Routing and Hierarchical Techniques	Jun 6, 2025	Benchmarking, Model Selection	Unverified
Towards Explainability and Fairness in Swiss Judgement Prediction: Benchmarking on a Multilingual Dataset	Feb 26, 2024	Benchmarking, Cross-Lingual Transfer	Unverified
Towards Explainable Network Intrusion Detection using Large Language Models	Aug 8, 2024	Benchmarking, Intrusion Detection	Unverified
Towards Fair Machine Learning Software: Understanding and Addressing Model Bias Through Counterfactual Thinking	Feb 16, 2023	Benchmarking, counterfactual	Unverified
Towards Graph Foundation Models: A Study on the Generalization of Positional and Structural Encodings	Dec 10, 2024	Benchmarking, Graph Learning	Unverified
Towards Ideal Temporal Graph Neural Networks: Evaluations and Conclusions after 10,000 GPU Hours	Dec 28, 2024	Benchmarking, GPU	Unverified
Towards Large Language Models that Benefit for All: Benchmarking Group Fairness in Reward Models	Mar 10, 2025	All, Benchmarking	Unverified
Towards Large-Scale Small Object Detection: Survey and Benchmarks	Jul 28, 2022	Benchmarking, Object	Unverified
Towards Long-Term predictions of Turbulence using Neural Operators	Jul 25, 2023	Benchmarking	Unverified
Towards More Robust NLP System Evaluation: Handling Missing Scores in Benchmarks	May 17, 2023	Benchmarking	Unverified
Towards Personalized Federated Learning	Mar 1, 2021	Benchmarking, Federated Learning	Unverified
Towards Private Learning on Decentralized Graphs with Local Differential Privacy	Jan 23, 2022	Benchmarking, Graph Learning	Unverified
Towards Productionizing Subjective Search Systems	Mar 31, 2020	Benchmarking, Language Modelling	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified