Benchmarking

Papers

Showing 3551–3600 of 5548 papers

Title	Date	Tasks	Status
MEETING DELEGATE: Benchmarking LLMs on Attending Meetings on Our Behalf	Feb 5, 2025	Benchmarking, Scheduling	Unverified
Towards Explainability and Fairness in Swiss Judgement Prediction: Benchmarking on a Multilingual Dataset	Feb 26, 2024	Benchmarking, Cross-Lingual Transfer	Unverified
MegaCOIN: Enhancing Medium-Grained Color Perception for Vision-Language Models	Dec 5, 2024	Benchmarking, Domain Generalization	Unverified
Benchmarking Large Language Model Capabilities for Conditional Generation	Jun 29, 2023	Benchmarking, Few-Shot Learning	Unverified
Benchmarking Language Models for Cyberbullying Identification and Classification from Social-media Texts	Jun 1, 2022	Benchmarking, Binary Classification	Unverified
MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks	Nov 13, 2023	Benchmarking	Unverified
MELABenchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource Maltese NLP	Jun 4, 2025	Benchmarking, Language Modelling	Unverified
Benchmarking Lane-changing Decision-making for Deep Reinforcement Learning	Sep 22, 2021	Autonomous Driving, Benchmarking	Unverified
MeltpoolNet: Melt pool Characteristic Prediction in Metal Additive Manufacturing Using Machine Learning	Jan 26, 2022	Articles, Benchmarking	Unverified
Benchmarking Knowledge-Enhanced Commonsense Question Answering via Knowledge-to-Text Transformation	Jan 4, 2021	Benchmarking, Question Answering	Unverified
MERGE -- A Bimodal Audio-Lyrics Dataset for Static Music Emotion Recognition	Jul 8, 2024	Benchmarking, Deep Learning	Unverified
Towards Explainable Network Intrusion Detection using Large Language Models	Aug 8, 2024	Benchmarking, Intrusion Detection	Unverified
Benchmarking KAZE and MCM for Multiclass Classification	May 20, 2015	Benchmarking, Classification	Unverified
What cleaves? Is proteasomal cleavage prediction reaching a ceiling?	Oct 24, 2022	Benchmarking, Denoising	Unverified
Benchmarking Joint Lexical and Syntactic Analysis on Multiword-Rich Data	Apr 1, 2017	Benchmarking, Dependency Parsing	Unverified
Benchmarking Joint Face Spoofing and Forgery Detection with Visual and Physiological Cues	Aug 10, 2022	Benchmarking, DeepFake Detection	Unverified
Metaethical Perspectives on 'Benchmarking' AI Ethics	Apr 11, 2022	Benchmarking, Ethics	Unverified
Towards Fair Machine Learning Software: Understanding and Addressing Model Bias Through Counterfactual Thinking	Feb 16, 2023	Benchmarking, counterfactual	Unverified
Benchmarking Japanese Speech Recognition on ASR-LLM Setups with Multi-Pass Augmented Generative Error Correction	Aug 29, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
A deep convolutional neural network model for rapid prediction of fluvial flood inundation	Jun 20, 2020	Benchmarking, Computational Efficiency	Unverified
Meta learning to classify intent and slot labels with noisy few shot examples	Nov 30, 2020	Benchmarking, intent-classification	Unverified
Benchmarking Invertible Architectures on Inverse Problems	Jan 26, 2021	Benchmarking	Unverified
Benchmarking inverse statistical approaches for protein structure and design with exactly solvable models	Nov 15, 2016	Benchmarking	Unverified
Metastatic Cancer Outcome Prediction with Injective Multiple Instance Pooling	Mar 9, 2022	Benchmarking, Management	Unverified
Benchmarking in Optimization: Best Practice and Open Issues	Jul 7, 2020	Benchmarking	Unverified
Towards Graph Foundation Models: A Study on the Generalization of Positional and Structural Encodings	Dec 10, 2024	Benchmarking, Graph Learning	Unverified
Methods and open-source toolkit for analyzing and visualizing challenge results	Oct 11, 2019	Benchmarking	Unverified
Methods and Trends in Detecting Generated Images: A Comprehensive Review	Feb 21, 2025	Benchmarking, DeepFake Detection	Unverified
Metrics for Benchmarking and Uncertainty Quantification: Quality, Applicability, and a Path to Best Practices for Machine Learning in Chemistry	Sep 30, 2020	Benchmarking, BIG-bench Machine Learning	Unverified
Bench-Marking Information Extraction in Semi-Structured Historical Handwritten Records	Jul 17, 2018	Benchmarking, Handwritten Text Recognition	Unverified
Benchmarking Inference Performance of Deep Learning Models on Analog Devices	Nov 24, 2020	Benchmarking, Deep Learning	Unverified
MHQA: A Diverse, Knowledge Intensive Mental Health Question Answering Challenge for Language Models	Feb 21, 2025	Benchmarking, Diagnostic	Unverified
MHTS: Multi-Hop Tree Structure Framework for Generating Difficulty-Controllable QA Datasets for RAG Evaluation	Mar 29, 2025	Answer Generation, Benchmarking	Unverified
Benchmarking Individual Tree Mapping with Sub-meter Imagery	Nov 14, 2023	Benchmarking, Segmentation	Unverified
Microtask crowdsourcing for disease mention annotation in PubMed abstracts	Aug 8, 2014	Benchmarking	Unverified
Microvasculature Segmentation in Human BioMolecular Atlas Program (HuBMAP)	Aug 6, 2023	Benchmarking, Image Segmentation	Unverified
Benchmarking Image Transformers for Prostate Cancer Detection from Ultrasound Data	Mar 27, 2024	Benchmarking, Cancer Classification	Unverified
Benchmarking Image Sensors Under Adverse Weather Conditions for Autonomous Driving	Dec 6, 2019	Autonomous Driving, Benchmarking	Unverified
MileBench: Benchmarking MLLMs in Long Context	Apr 29, 2024	Benchmarking, Diagnostic	Unverified
Addressing the Real-world Class Imbalance Problem in Dermatology	Oct 9, 2020	Benchmarking, Few-Shot Learning	Unverified
MiLQ: Benchmarking IR Models for Bilingual Web Search with Mixed Language Queries	May 22, 2025	Benchmarking, Information Retrieval	Unverified
Benchmarking Image Embeddings for E-Commerce: Evaluating Off-the Shelf Foundation Models, Fine-Tuning Strategies and Practical Trade-offs	Apr 10, 2025	Benchmarking, Contrastive Learning	Unverified
Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge	Jun 26, 2025	Benchmarking	Unverified
Are Machines Better at Complex Reasoning? Unveiling Human-Machine Inference Gaps in Entailment Verification	Feb 6, 2024	Benchmarking, Multiple-choice	Unverified
Benchmarking human visual search computational models in natural scenes: models comparison and reference datasets	Oct 12, 2021	Benchmarking	Unverified
Mind the Retrosynthesis Gap: Bridging the divide between Single-step and Multi-step Retrosynthesis Prediction	Dec 12, 2022	Benchmarking, Multi-step retrosynthesis	Unverified
What Does Neuro Mean to Cardio? Investigating the Role of Clinical Specialty Data in Medical LLMs	May 15, 2025	All, Benchmarking	Unverified
Mind Your Theory: Theory of Mind Goes Deeper Than Reasoning	Dec 18, 2024	Benchmarking, Position	Unverified
Benchmarking Human Face Similarity Using Identical Twins	Aug 25, 2022	Benchmarking	Unverified
Towards Ideal Temporal Graph Neural Networks: Evaluations and Conclusions after 10,000 GPU Hours	Dec 28, 2024	Benchmarking, GPU	Unverified


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified