Benchmarking

Papers


Showing 3651–3700 of 5548 papers

Title	Date	Tasks	Status
Alexpaca: Learning Factual Clarification Question Generation Without Examples	Oct 17, 2023	Benchmarking, Chatbot	Unverified
Benchmarking Foundation Speech and Language Models for Alzheimer's Disease and Related Dementia Detection from Spontaneous Speech	Jun 9, 2025	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Benchmarking Foundation Models with Language-Model-as-an-Examiner	Jun 7, 2023	Benchmarking, Language Modeling	Unverified
Benchmarking Foundation Models for Zero-Shot Biometric Tasks	May 30, 2025	Attribute, Benchmarking	Unverified
MobileAgentBench: An Efficient and User-Friendly Benchmark for Mobile LLM Agents	Jun 12, 2024	Benchmarking, Language Modeling	Unverified
MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases	Jun 12, 2024	Benchmarking, Model Compression	Unverified
Benchmarking foundation models as feature extractors for weakly-supervised computational pathology	Aug 28, 2024	Benchmarking, Diversity	Unverified
Model Agnostic Explainable Selective Regression via Uncertainty Estimation	Nov 15, 2023	Benchmarking, model	Unverified
Model-based trajectory stitching for improved behavioural cloning and its applications	Dec 8, 2022	Behavioural cloning, Benchmarking	Unverified
Model-Based Underwater 6D Pose Estimation from RGB	Feb 14, 2023	2D Object Detection, 6D Pose Estimation	Unverified
Benchmarking for Public Health Surveillance tasks on Social Media with a Domain-Specific Pretrained Language Model	Apr 9, 2022	Benchmarking, Language Modeling	Unverified
ModelHub.AI: Dissemination Platform for Deep Learning Models	Nov 26, 2019	Benchmarking, Deep Learning	Unverified
Model Lakes	Mar 4, 2024	Benchmarking, Management	Unverified
Modelling Neuronal Behaviour with Time Series Regression: Recurrent Neural Networks on C. Elegans Data	Jul 1, 2021	Benchmarking, regression	Unverified
Modelling neuronal behaviour with time series regression: Recurrent Neural Networks on synthetic C. elegans data	Sep 29, 2021	Benchmarking, regression	Unverified
Modelling Regional Solar Photovoltaic Capacity in Great Britain	Feb 26, 2025	Benchmarking	Unverified
Model-predictive control and reinforcement learning in multi-energy system case studies	Apr 20, 2021	Benchmarking, Model Predictive Control	Unverified
Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities	Feb 3, 2025	Benchmarking, Large Language Model	Unverified
Modern CNNs for IoT Based Farms	Jul 15, 2019	Benchmarking, Cloud Computing	Unverified
Modern, Efficient, and Differentiable Transport Equation Models using JAX: Applications to Population Balance Equations	Nov 1, 2024	Benchmarking, Computational Efficiency	Unverified
Modified CMA-ES Algorithm for Multi-Modal Optimization: Incorporating Niching Strategies and Dynamic Adaptation Mechanism	Jul 1, 2024	Benchmarking, Diversity	Unverified
ModuLM: Enabling Modular and Multimodal Molecular Relational Learning with Large Language Models	Jun 1, 2025	Benchmarking, Relational Reasoning	Unverified
MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems	Dec 10, 2024	Benchmarking, Mixture-of-Experts	Unverified
MoE-Gyro: Self-Supervised Over-Range Reconstruction and Denoising for MEMS Gyroscopes	May 27, 2025	Benchmarking, Denoising	Unverified
MO-IOHinspector: Anytime Benchmarking of Multi-Objective Algorithms using IOHprofiler	Dec 10, 2024	Benchmarking, Experimental Design	Unverified
Benchmarking for Metaheuristic Black-Box Optimization: Perspectives and Open Challenges	Jul 1, 2020	Benchmarking, Metaheuristic Optimization	Unverified
GuideBench: Benchmarking Domain-Oriented Guideline Following for LLM Agents	May 16, 2025	Benchmarking, Instruction Following	Unverified
Towards Personalized Federated Learning	Mar 1, 2021	Benchmarking, Federated Learning	Unverified
MolMiner: Towards Controllable, 3D-Aware, Fragment-Based Molecular Design	Nov 10, 2024	3D geometry, Benchmarking	Unverified
Towards Private Learning on Decentralized Graphs with Local Differential Privacy	Jan 23, 2022	Benchmarking, Graph Learning	Unverified
MOLTR: Multiple Object Localisation, Tracking, and Reconstruction from Monocular RGB Videos	Dec 9, 2020	Benchmarking, Object	Unverified
Benchmarking for Bayesian Reinforcement Learning	Sep 14, 2015	Benchmarking, reinforcement-learning	Unverified
Towards Productionizing Subjective Search Systems	Mar 31, 2020	Benchmarking, Language Modelling	Unverified
Momentum Contrastive Pre-training for Question Answering	Dec 12, 2022	Benchmarking, Contrastive Learning	Unverified
Benchmarking Floworks against OpenAI & Anthropic: A Novel Framework for Enhanced LLM Function Calling	Oct 23, 2024	Benchmarking	Unverified
Benchmarking fixed-length Fingerprint Representations across different Embedding Sizes and Sensor Types	Jul 17, 2023	Benchmarking	Unverified
MorisienMT: A Dataset for Mauritian Creole Machine Translation	Jun 6, 2022	Benchmarking, Machine Translation	Unverified
Morphing Attack Detection -- Database, Evaluation Platform and Benchmarking	Jun 11, 2020	Benchmarking, Face Recognition	Unverified
MORSE: Semantic-ally Drive-n MORpheme SEgment-er	Feb 7, 2017	Benchmarking	Unverified
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models	Jan 6, 2025	Benchmarking, Feature Compression	Unverified
Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level	Nov 15, 2024	Benchmarking, counterfactual	Unverified
A Dataset for Benchmarking Image-Based Localization	Jul 1, 2017	Benchmarking, Image-Based Localization	Unverified
Movie Description	May 12, 2016	Benchmarking	Unverified
MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning	Jun 4, 2023	Benchmarking, Contrastive Learning	Unverified
Moving Beyond Downstream Task Accuracy for Information Retrieval Benchmarking	Dec 2, 2022	Benchmarking, Information Retrieval	Unverified
MozzaVID: Mozzarella Volumetric Image Dataset	Dec 6, 2024	Benchmarking, Computed Tomography (CT)	Unverified
MPCLeague: Robust MPC Platform for Privacy-Preserving Machine Learning	Dec 26, 2021	Benchmarking, BIG-bench Machine Learning	Unverified
MRAnnotator: multi-Anatomy and many-Sequence MRI segmentation of 44 structures	Feb 1, 2024	Anatomy, Benchmarking	Unverified
MSAMSum: Towards Benchmarking Multi-lingual Dialogue Summarization	Nov 16, 2021	Benchmarking, dialogue summary	Unverified
Towards responsible AI for education: Hybrid human-AI to confront the Elephant in the room	Apr 22, 2025	Benchmarking, Fairness	Unverified



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	n/a	Unverified