Benchmarking

Papers


Showing 2251–2275 of 5548 papers

Title	Date	Tasks	Status	Hype
Constellation Dataset: Benchmarking High-Altitude Object Detection for an Urban Intersection	Apr 25, 2024	Benchmarking, object-detection	Code Available	1
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension	Apr 25, 2024	Benchmarking, Multiple-choice	Code Available	3
Benchmarking Mobile Device Control Agents across Diverse Configurations	Apr 25, 2024	Benchmarking, Imitation Learning	Unverified	0
ApisTox: a new benchmark dataset for the classification of small molecules toxicity on honey bees	Apr 24, 2024	Benchmarking, Molecular Property Prediction	Code Available	0
SynthEval: A Framework for Detailed Utility and Privacy Evaluation of Tabular Synthetic Data	Apr 24, 2024	Benchmarking, Fairness	Code Available	1
ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction	Apr 24, 2024	Attribute, Attribute Value Extraction	Code Available	1
DPO: A Differential and Pointwise Control Approach to Reinforcement Learning	Apr 24, 2024	Benchmarking, reinforcement-learning	Unverified	0
Empirical Analysis of the Dynamic Binary Value Problem with IOHprofiler	Apr 24, 2024	Benchmarking	Unverified	0
Importance of Disjoint Sampling in Conventional and Transformer Models for Hyperspectral Image Classification	Apr 23, 2024	Benchmarking, Hyperspectral Image Classification	Code Available	0
Open Datasets for Satellite Radio Resource Control	Apr 22, 2024	Benchmarking, Decision Making	Unverified	0
Benchmarking Advanced Text Anonymisation Methods: A Comparative Study on Novel and Traditional Approaches	Apr 22, 2024	Benchmarking, Diversity	Unverified	0
Experimental Validation of Ultrasound Beamforming with End-to-End Deep Learning for Single Plane Wave Imaging	Apr 22, 2024	Benchmarking	Code Available	1
The Adversarial AI-Art: Understanding, Generation, Detection, and Benchmarking	Apr 22, 2024	Benchmarking, Misinformation	Unverified	0
A User-Centric Multi-Intent Benchmark for Evaluating Large Language Models	Apr 22, 2024	Benchmarking, World Knowledge	Code Available	1
EnzChemRED, a rich enzyme chemistry relation extraction dataset	Apr 22, 2024	Benchmarking, named-entity-recognition	Unverified	0
TeamTrack: A Dataset for Multi-Sport Multi-Object Tracking in Full-pitch Videos	Apr 22, 2024	Benchmarking, Multi-Object Tracking	Unverified	0
TAVGBench: Benchmarking Text to Audible-Video Generation	Apr 22, 2024	Benchmarking, Contrastive Learning	Code Available	1
In-situ process monitoring and adaptive quality enhancement in laser additive manufacturing: a critical review	Apr 21, 2024	Benchmarking, Decision Making	Unverified	0
Authentic Emotion Mapping: Benchmarking Facial Expressions in Real News	Apr 21, 2024	Benchmarking, Emotion Recognition	Code Available	0
Bridging the Gap Between Theory and Practice: Benchmarking Transfer Evolutionary Optimization	Apr 20, 2024	Benchmarking	Unverified	0
DeepFake-O-Meter v2.0: An Open Platform for DeepFake Detection	Apr 19, 2024	Benchmarking, DeepFake Detection	Code Available	3
Integrated Sensing and Communication enabled Multiple Base Stations Cooperative UAV Detection	Apr 19, 2024	Benchmarking, Integrated sensing and communication	Unverified	0
STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases	Apr 19, 2024	Benchmarking, Retrieval	Code Available	3
Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning	Apr 19, 2024	Benchmarking, counterfactual	Unverified	0
REXEL: An End-to-end Model for Document-Level Relation Extraction and Entity Linking	Apr 19, 2024	Benchmarking, coreference-resolution	Code Available	1



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified