Benchmarking

Papers

Showing 3301–3325 of 5548 papers

Title	Date	Tasks	Status	Hype
X-IQE: eXplainable Image Quality Evaluation for Text-to-Image Generation with Visual Large Language Models	May 18, 2023	Benchmarking, Image Generation	Code Available	1
Human Behavioral Benchmarking: Numeric Magnitude Comparison Effects in Large Language Models	May 18, 2023	Benchmarking	Unverified	0
Smiling Women Pitching Down: Auditing Representational and Presentational Gender Biases in Image Generative AI	May 17, 2023	Benchmarking	Unverified	0
PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering	May 17, 2023	Benchmarking, Diagnostic	Code Available	1
Towards More Robust NLP System Evaluation: Handling Missing Scores in Benchmarks	May 17, 2023	Benchmarking	Unverified	0
Restoring Images Captured in Arbitrary Hybrid Adverse Weather Conditions in One Go	May 17, 2023	Benchmarking, Image Restoration	Unverified	0
DLUE: Benchmarking Document Language Understanding	May 16, 2023	Benchmarking, Document Classification	Unverified	0
An Empirical Study on Google Research Football Multi-agent Scenarios	May 16, 2023	Benchmarking, Multi-agent Reinforcement Learning	Code Available	1
Benchmarking the human brain against computational architectures	May 15, 2023	Benchmarking, Computational Efficiency	Unverified	0
OOD-Speech: A Large Bengali Speech Recognition Dataset for Out-of-Distribution Benchmarking	May 15, 2023	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
Predictive Models from Quantum Computer Benchmarks	May 15, 2023	Benchmarking, image-classification	Unverified	0
A Strong Sustainability Paradigm Based Analytical Hierarchy Process (SSP-AHP) Method to Evaluate Sustainable Healthcare Systems	May 13, 2023	Benchmarking	Unverified	0
MedGPTEval: A Dataset and Benchmark to Evaluate Responses of Large Language Models in Medicine	May 12, 2023	Benchmarking	Unverified	0
Benchmarking large language models for biomedical natural language processing applications and recommendations	May 10, 2023	Benchmarking, Document Classification	Code Available	1
A Platform for the Biomedical Application of Large Language Models	May 10, 2023	Benchmarking, Privacy Preserving	Code Available	1
Uncertainty in GNN Learning Evaluations: The Importance of a Consistent Benchmark for Community Detection	May 10, 2023	Benchmarking, Community Detection	Unverified	0
InfoMetIC: An Informative Metric for Reference-free Image Caption Evaluation	May 10, 2023	Benchmarking, Image Captioning	Code Available	1
DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects	May 9, 2023	Benchmarking, Decision Making	Code Available	1
Comparing Foundation Models using Data Kernels	May 9, 2023	Benchmarking, Self-Supervised Learning	Unverified	0
A Comprehensive Study on Dataset Distillation: Performance, Privacy, Robustness and Fairness	May 5, 2023	Benchmarking, Dataset Distillation	Unverified	0
Towards Segment Anything Model (SAM) for Medical Image Segmentation: A Survey	May 5, 2023	Benchmarking, Image Generation	Code Available	0
Semantic Segmentation using Vision Transformers: A survey	May 5, 2023	Autonomous Driving, Benchmarking	Unverified	0
Can LLMs Capture Human Preferences?	May 4, 2023	Benchmarking	Unverified	0
Analyzing Hong Kong's Legal Judgments from a Computational Linguistics point-of-view	May 4, 2023	Benchmarking, Graph Generation	Unverified	0
Working Memory Capacity of ChatGPT: An Empirical Study	Apr 30, 2023	Benchmarking, Language Modeling	Code Available	1


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified