Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 901–925 of 5548 papers

Title	Date	Tasks	Status	Hype
Reinforcement Learning with Graph Attention for Routing and Wavelength Assignment with Lightpath Reuse	Feb 20, 2025	BenchmarkingGraph Attention	—Unverified	0
Synthetic Porous Microstructures: Automatic Design, Simulation, and Permeability Analysis	Feb 20, 2025	Benchmarking	CodeCode Available	0
Beyond Self-Talk: A Communication-Centric Survey of LLM-Based Multi-Agent Systems	Feb 20, 2025	BenchmarkingDecision Making	—Unverified	0
Position: There are no Champions in Long-Term Time Series Forecasting	Feb 19, 2025	BenchmarkingPosition	—Unverified	0
Benchmarking of Different YOLO Models for CAPTCHAs Detection and Classification	Feb 19, 2025	Benchmarking	—Unverified	0
Benchmarking Self-Supervised Learning Methods for Accelerated MRI Reconstruction	Feb 19, 2025	BenchmarkingMRI Reconstruction	CodeCode Available	0
GIMMICK -- Globally Inclusive Multimodal Multitask Cultural Knowledge Benchmarking	Feb 19, 2025	Benchmarking	—Unverified	0
A Baseline Method for Removing Invisible Image Watermarks using Deep Image Prior	Feb 19, 2025	BenchmarkingMisinformation	—Unverified	0
Benchmarking LLMs for Political Science: A United Nations Perspective	Feb 19, 2025	BenchmarkingDecision Making	CodeCode Available	1
VITAL: A New Dataset for Benchmarking Pluralistic Alignment in Healthcare	Feb 19, 2025	BenchmarkingDiversity	—Unverified	0
Multilingual European Language Models: Benchmarking Approaches and Challenges	Feb 18, 2025	BenchmarkingQuestion Answering	—Unverified	0
A deep learning framework for efficient pathology image analysis	Feb 18, 2025	BenchmarkingDeep Learning	CodeCode Available	4
Benchmarking Automatic Speech Recognition coupled LLM Modules for Medical Diagnostics	Feb 18, 2025	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified	0
LLMPopcorn: An Empirical Study of LLMs as Assistants for Popular Micro-video Generation	Feb 18, 2025	BenchmarkingText Generation	—Unverified	0
STEER-ME: Assessing the Microeconomic Reasoning of Large Language Models	Feb 18, 2025	BenchmarkingLarge Language Model	—Unverified	0
Text2World: Benchmarking Large Language Models for Symbolic World Model Generation	Feb 18, 2025	Benchmarking	—Unverified	0
Reinforcement Learning for Dynamic Resource Allocation in Optical Networks: Hype or Hope?	Feb 18, 2025	BenchmarkingBlocking	CodeCode Available	1
A new pathway to generative artificial intelligence by minimizing the maximum entropy	Feb 18, 2025	Benchmarking	—Unverified	0
EquiBench: Benchmarking Large Language Models' Understanding of Program Semantics via Equivalence Checking	Feb 18, 2025	BenchmarkingBinary Classification	—Unverified	0
Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis	Feb 18, 2025	BenchmarkingMamba	CodeCode Available	0
Benchmarking MedMNIST dataset on real quantum hardware	Feb 18, 2025	Benchmarkingimage-classification	—Unverified	0
Integrating Expert Knowledge into Logical Programs via LLMs	Feb 17, 2025	BenchmarkingLogical Reasoning	CodeCode Available	0
Positional Encoding in Transformer-Based Time Series Models: A Survey	Feb 17, 2025	Anomaly DetectionBenchmarking	CodeCode Available	1
Energy-Conscious LLM Decoding: Impact of Text Generation Strategies on GPU Energy Consumption	Feb 17, 2025	BenchmarkingCode Summarization	—Unverified	0
HintsOfTruth: A Multimodal Checkworthiness Detection Dataset with Real and Synthetic Claims	Feb 17, 2025	BenchmarkingFact Checking	CodeCode Available	1

Show:10 25 50

← PrevPage 37 of 222Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified