Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 2351–2375 of 5548 papers

Title	Date	Tasks	Status
Investigating Energy Efficiency and Performance Trade-offs in LLM Inference Across Tasks and DVFS Settings	Jan 14, 2025	BenchmarkingQuestion Answering	—Unverified
Benchmarking Multimodal Models for Fine-Grained Image Analysis: A Comparative Study Across Diverse Visual Features	Jan 14, 2025	Benchmarking	—Unverified
Benchmarking Graph Representations and Graph Neural Networks for Multivariate Time Series Classification	Jan 14, 2025	BenchmarkingGraph Representation Learning	CodeCode Available
Understanding and Benchmarking Artificial Intelligence: OpenAI's o3 Is Not AGI	Jan 13, 2025	ARCBenchmarking	—Unverified
The Paradox of Success in Evolutionary and Bioinspired Optimization: Revisiting Critical Issues, Key Studies, and Methodological Pathways	Jan 13, 2025	BenchmarkingMetaheuristic Optimization	—Unverified
Benchmarking Abstractive Summarisation: A Dataset of Human-authored Summaries of Norwegian News Articles	Jan 13, 2025	ArticlesBenchmarking	—Unverified
Stronger Than You Think: Benchmarking Weak Supervision on Realistic Tasks	Jan 13, 2025	Benchmarking	CodeCode Available
Lessons From Red Teaming 100 Generative AI Products	Jan 13, 2025	BenchmarkingRed Teaming	—Unverified
Benchmarking YOLOv8 for Optimal Crack Detection in Civil Infrastructure	Jan 12, 2025	BenchmarkingHyperparameter Optimization	—Unverified
Evidential Deep Learning for Uncertainty Quantification and Out-of-Distribution Detection in Jet Identification using Deep Neural Networks	Jan 10, 2025	Anomaly DetectionBenchmarking	CodeCode Available
Benchmarking Rotary Position Embeddings for Automatic Speech Recognition	Jan 10, 2025	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified
Large Physics Models: Towards a collaborative approach with Large Language Models and Foundation Models	Jan 9, 2025	BenchmarkingPhilosophical Reflection	—Unverified
LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation	Jan 9, 2025	2k8k	—Unverified
Commonsense Video Question Answering through Video-Grounded Entailment Tree Reasoning	Jan 9, 2025	BenchmarkingQuestion Answering	—Unverified
CallNavi, A Challenge and Empirical Study on LLM Function Calling and Routing	Jan 9, 2025	BenchmarkingChatbot	—Unverified
AgoraSpeech: A multi-annotated comprehensive dataset of political discourse through the lens of humans and AI	Jan 9, 2025	Benchmarkingnamed-entity-recognition	—Unverified
Open-Source Manually Annotated Vocal Tract Database for Automatic Segmentation from 3D MRI Using Deep Learning: Benchmarking 2D and 3D Convolutional and Transformer Networks	Jan 8, 2025	BenchmarkingDeep Learning	—Unverified
Advancing Retrieval-Augmented Generation for Persian: Development of Language Models, Comprehensive Benchmarks, and Best Practices for Optimization	Jan 8, 2025	BenchmarkingGeneral Knowledge	—Unverified
An Analysis of Model Robustness across Concurrent Distribution Shifts	Jan 8, 2025	Benchmarking	—Unverified
IOLBENCH: Benchmarking LLMs on Linguistic Reasoning	Jan 8, 2025	Benchmarking	CodeCode Available
Practical Design and Benchmarking of Generative AI Applications for Surgical Billing and Coding	Jan 7, 2025	BenchmarkingCode Generation	—Unverified
Machine Learning for Identifying Grain Boundaries in Scanning Electron Microscopy (SEM) Images of Nanoparticle Superlattices	Jan 7, 2025	BenchmarkingClustering	—Unverified
The FACTS Grounding Leaderboard: Benchmarking LLMs' Ability to Ground Responses to Long-Form Input	Jan 6, 2025	BenchmarkingForm	—Unverified
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models	Jan 6, 2025	BenchmarkingFeature Compression	—Unverified
Tougher Text, Smarter Models: Raising the Bar for Adversarial Defence Benchmarks	Jan 5, 2025	Adversarial RobustnessBenchmarking	CodeCode Available

Show:10 25 50

← PrevPage 95 of 222Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified