Benchmarking

Papers

Showing 2301–2325 of 5548 papers

Title	Date	Tasks	Status
Unraveling the Capabilities of Language Models in News Summarization	Jan 30, 2025	Benchmarking, Few-Shot Learning	Code Available
Fine-tuning LLaMA 2 interference: a comparative study of language implementations for optimal efficiency	Jan 30, 2025	Benchmarking, Language Modeling	Unverified
MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding	Jan 30, 2025	Benchmarking, Decision Making	Unverified
Solving Urban Network Security Games: Learning Platform, Benchmark, and Challenge for AI Research	Jan 29, 2025	Benchmarking	Unverified
Benchmarking Quantum Convolutional Neural Networks for Signal Classification in Simulated Gamma-Ray Burst Detection	Jan 28, 2025	Benchmarking	Unverified
Making Sense of Data in the Wild: Data Analysis Automation at Scale	Jan 27, 2025	Benchmarking, Diversity	Unverified
Transfer of Knowledge through Reverse Annealing: A Preliminary Analysis of the Benefits and What to Share	Jan 27, 2025	Benchmarking, Transfer Learning	Unverified
PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding	Jan 27, 2025	Benchmarking, Common Sense Reasoning	Unverified
A Benchmarking Environment for Worker Flexibility in Flexible Job Shop Scheduling Problems	Jan 27, 2025	Benchmarking, Evolutionary Algorithms	Unverified
IndicMMLU-Pro: Benchmarking Indic Large Language Models on Multi-Task Language Understanding	Jan 27, 2025	Benchmarking, Diversity	Unverified
Skeleton-Guided-Translation: A Benchmarking Framework for Code Repository Translation with Fine-Grained Quality Evaluation	Jan 27, 2025	Benchmarking, C++ code	Unverified
Benchmarking Quantum Reinforcement Learning	Jan 27, 2025	Benchmarking, reinforcement-learning	Code Available
CISOL: An Open and Extensible Dataset for Table Structure Recognition in the Construction Industry	Jan 26, 2025	Benchmarking, Object Detection	Unverified
Beyond Benchmarks: On The False Promise of AI Regulation	Jan 26, 2025	Benchmarking	Unverified
GiantHunter: Accurate detection of giant virus in metagenomic data using reinforcement-learning and Monte Carlo tree search	Jan 26, 2025	Benchmarking, Diversity	Code Available
Self-supervised Benchmark Lottery on ImageNet: Do Marginal Improvements Translate to Improvements on Similar Datasets?	Jan 26, 2025	Benchmarking, Self-Supervised Learning	Unverified
Prompting ChatGPT for Chinese Learning as L2: A CEFR and EBCL Level Study	Jan 25, 2025	Benchmarking	Unverified
Benchmarking global optimization techniques for unmanned aerial vehicle path planning	Jan 24, 2025	Benchmarking, global-optimization	Unverified
The Karp Dataset	Jan 24, 2025	Benchmarking, Mathematical Reasoning	Unverified
Feature-based Evolutionary Diversity Optimization of Discriminating Instances for Chance-constrained Optimization Problems	Jan 24, 2025	Benchmarking, Diversity	Unverified
AEON: Adaptive Estimation of Instance-Dependent In-Distribution and Out-of-Distribution Label Noise for Robust Learning	Jan 23, 2025	Benchmarking, image-classification	Unverified
DI-BENCH: Benchmarking Large Language Models on Dependency Inference with Testable Repositories at Scale	Jan 23, 2025	Benchmarking	Unverified
You Only Crash Once v2: Perceptually Consistent Strong Features for One-Stage Domain Adaptive Detection of Space Terrain	Jan 23, 2025	Benchmarking, Domain Adaptation	Unverified
CHaRNet: Conditioned Heatmap Regression for Robust Dental Landmark Localization	Jan 22, 2025	Benchmarking, regression	Unverified
RAG-Reward: Optimizing RAG with Reward Modeling and RLHF	Jan 22, 2025	Benchmarking, Hallucination	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified