Benchmarking

Papers

Showing 1001–1025 of 5548 papers

Title	Date	Tasks	Status	Hype
EdgeMark: An Automation and Benchmarking System for Embedded Artificial Intelligence Tools	Feb 3, 2025	Benchmarking	Unverified	0
SE Arena: An Interactive Platform for Evaluating Foundation Models in Software Engineering	Feb 3, 2025	Benchmarking, Code Generation	Unverified	0
MM-IQ: Benchmarking Human-Like Abstraction and Reasoning in Multimodal Models	Feb 2, 2025	Benchmarking	Code Available	1
Learned Bayesian Cramér-Rao Bound for Unknown Measurement Models Using Score Neural Networks	Feb 2, 2025	Benchmarking	Code Available	0
True Online TD-Replan(lambda) Achieving Planning through Replaying	Jan 31, 2025	Benchmarking	Unverified	0
Evolving Hard Maximum Cut Instances for Quantum Approximate Optimization Algorithms	Jan 30, 2025	Benchmarking, Combinatorial Optimization	Unverified	0
Fine-tuning LLaMA 2 interference: a comparative study of language implementations for optimal efficiency	Jan 30, 2025	Benchmarking, Language Modeling	Unverified	0
Unraveling the Capabilities of Language Models in News Summarization	Jan 30, 2025	Benchmarking, Few-Shot Learning	Code Available	0
MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding	Jan 30, 2025	Benchmarking, Decision Making	Unverified	0
The iToBoS dataset: skin region images extracted from 3D total body photographs for lesion detection	Jan 30, 2025	Benchmarking, Diagnostic	Code Available	0
Solving Urban Network Security Games: Learning Platform, Benchmark, and Challenge for AI Research	Jan 29, 2025	Benchmarking	Unverified	0
SafeRAG: Benchmarking Security in Retrieval-Augmented Generation of Large Language Model	Jan 28, 2025	Benchmarking, Language Modeling	Code Available	2
HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns	Jan 28, 2025	Adversarial Attack, Benchmarking	Code Available	1
Molecular-driven Foundation Model for Oncologic Pathology	Jan 28, 2025	Benchmarking, Diagnostic	Code Available	4
Benchmarking Quantum Convolutional Neural Networks for Signal Classification in Simulated Gamma-Ray Burst Detection	Jan 28, 2025	Benchmarking	Unverified	0
Making Sense of Data in the Wild: Data Analysis Automation at Scale	Jan 27, 2025	Benchmarking, Diversity	Unverified	0
A Benchmarking Environment for Worker Flexibility in Flexible Job Shop Scheduling Problems	Jan 27, 2025	Benchmarking, Evolutionary Algorithms	Unverified	0
PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding	Jan 27, 2025	Benchmarking, Common Sense Reasoning	Unverified	0
Benchmarking Quantum Reinforcement Learning	Jan 27, 2025	Benchmarking, reinforcement-learning	Code Available	0
Skeleton-Guided-Translation: A Benchmarking Framework for Code Repository Translation with Fine-Grained Quality Evaluation	Jan 27, 2025	Benchmarking, C++ code	Unverified	0
IndicMMLU-Pro: Benchmarking Indic Large Language Models on Multi-Task Language Understanding	Jan 27, 2025	Benchmarking, Diversity	Unverified	0
Transfer of Knowledge through Reverse Annealing: A Preliminary Analysis of the Benefits and What to Share	Jan 27, 2025	Benchmarking, Transfer Learning	Unverified	0
GiantHunter: Accurate detection of giant virus in metagenomic data using reinforcement-learning and Monte Carlo tree search	Jan 26, 2025	Benchmarking, Diversity	Code Available	0
Self-supervised Benchmark Lottery on ImageNet: Do Marginal Improvements Translate to Improvements on Similar Datasets?	Jan 26, 2025	Benchmarking, Self-Supervised Learning	Unverified	0
CISOL: An Open and Extensible Dataset for Table Structure Recognition in the Construction Industry	Jan 26, 2025	Benchmarking, Object Detection	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified