Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 2551–2575 of 5548 papers

Title	Date	Tasks	Status	Hype
Coherent Feed Forward Quantum Neural Network	Feb 1, 2024	BenchmarkingDiagnostic	—Unverified	0
We're Not Using Videos Effectively: An Updated Domain Adaptive Video Segmentation Baseline	Feb 1, 2024	BenchmarkingDomain Adaptation	CodeCode Available	1
Benchmarking Transferable Adversarial Attacks	Feb 1, 2024	Adversarial AttackBenchmarking	CodeCode Available	1
Benchmarking Sensitivity of Continual Graph Learning for Skeleton-Based Action Recognition	Jan 31, 2024	Action RecognitionBenchmarking	—Unverified	0
I Think, Therefore I am: Benchmarking Awareness of Large Language Models Using AwareBench	Jan 31, 2024	BenchmarkingMultiple-choice	CodeCode Available	4
Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data	Jan 31, 2024	BenchmarkingChange Detection	CodeCode Available	0
Explainable Benchmarking for Iterative Optimization Heuristics	Jan 31, 2024	BenchmarkingEvolutionary Algorithms	CodeCode Available	1
Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios	Jan 30, 2024	Benchmarking	CodeCode Available	2
Category-wise Fine-Tuning: Resisting Incorrect Pseudo-Labels in Multi-Label Image Classification with Partial Labels	Jan 30, 2024	Benchmarkingimage-classification	CodeCode Available	1
Machine Translation Meta Evaluation through Translation Accuracy Challenge Sets	Jan 29, 2024	BenchmarkingMachine Translation	CodeCode Available	1
ToPro: Token-Level Prompt Decomposition for Cross-Lingual Sequence Labeling Tasks	Jan 29, 2024	BenchmarkingCross-Lingual Transfer	CodeCode Available	0
Muffin or Chihuahua? Challenging Multimodal Large Language Models with Multipanel VQA	Jan 29, 2024	BenchmarkingImage Comprehension	—Unverified	0
PPM: Automated Generation of Diverse Programming Problems for Benchmarking Code Generation Models	Jan 28, 2024	BenchmarkingCode Generation	CodeCode Available	0
Benchmarking with MIMIC-IV, an irregular, spare clinical time series dataset	Jan 27, 2024	BenchmarkingTime Series	—Unverified	0
SAM-based instance segmentation models for the automation of structural damage detection	Jan 27, 2024	BenchmarkingInstance Segmentation	—Unverified	0
MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries	Jan 27, 2024	BenchmarkingRAG	CodeCode Available	3
Biological Valuation Map of Flanders: A Sentinel-2 Imagery Analysis	Jan 26, 2024	BenchmarkingSemantic Segmentation	—Unverified	0
Benchmarking Large Language Models in Complex Question Answering Attribution using Knowledge Graphs	Jan 26, 2024	BenchmarkingKnowledge Graphs	—Unverified	0
Automated legal reasoning with discretion to act using s(LAW)	Jan 25, 2024	BenchmarkingLegal Reasoning	—Unverified	0
TriSAM: Tri-Plane SAM for zero-shot cortical blood vessel segmentation in VEM images	Jan 25, 2024	BenchmarkingSegmentation	—Unverified	0
Dataset and Benchmark: Novel Sensors for Autonomous Vehicle Perception	Jan 24, 2024	Benchmarking	CodeCode Available	1
SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval	Jan 24, 2024	BenchmarkingImage Captioning	CodeCode Available	1
Large Malaysian Language Model Based on Mistral for Enhanced Local Language Understanding	Jan 24, 2024	BenchmarkingLanguage Modeling	—Unverified	0
Benchmarking the Fairness of Image Upsampling Methods	Jan 24, 2024	BenchmarkingDiversity	CodeCode Available	0
AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents	Jan 24, 2024	Benchmarking	CodeCode Available	3

Show:10 25 50

← PrevPage 103 of 222Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified