Benchmarking

Papers

Showing 3301–3350 of 5548 papers

Title	Date	Tasks	Status	Hype
X-IQE: eXplainable Image Quality Evaluation for Text-to-Image Generation with Visual Large Language Models	May 18, 2023	Benchmarking, Image Generation	Code Available	1
Human Behavioral Benchmarking: Numeric Magnitude Comparison Effects in Large Language Models	May 18, 2023	Benchmarking	Unverified	0
Smiling Women Pitching Down: Auditing Representational and Presentational Gender Biases in Image Generative AI	May 17, 2023	Benchmarking	Unverified	0
PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering	May 17, 2023	Benchmarking, Diagnostic	Code Available	1
Towards More Robust NLP System Evaluation: Handling Missing Scores in Benchmarks	May 17, 2023	Benchmarking	Unverified	0
Restoring Images Captured in Arbitrary Hybrid Adverse Weather Conditions in One Go	May 17, 2023	Benchmarking, Image Restoration	Unverified	0
DLUE: Benchmarking Document Language Understanding	May 16, 2023	Benchmarking, Document Classification	Unverified	0
An Empirical Study on Google Research Football Multi-agent Scenarios	May 16, 2023	Benchmarking, Multi-agent Reinforcement Learning	Code Available	1
Benchmarking the human brain against computational architectures	May 15, 2023	Benchmarking, Computational Efficiency	Unverified	0
OOD-Speech: A Large Bengali Speech Recognition Dataset for Out-of-Distribution Benchmarking	May 15, 2023	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
Predictive Models from Quantum Computer Benchmarks	May 15, 2023	Benchmarking, image-classification	Unverified	0
A Strong Sustainability Paradigm Based Analytical Hierarchy Process (SSP-AHP) Method to Evaluate Sustainable Healthcare Systems	May 13, 2023	Benchmarking	Unverified	0
MedGPTEval: A Dataset and Benchmark to Evaluate Responses of Large Language Models in Medicine	May 12, 2023	Benchmarking	Unverified	0
Benchmarking large language models for biomedical natural language processing applications and recommendations	May 10, 2023	Benchmarking, Document Classification	Code Available	1
A Platform for the Biomedical Application of Large Language Models	May 10, 2023	Benchmarking, Privacy Preserving	Code Available	1
Uncertainty in GNN Learning Evaluations: The Importance of a Consistent Benchmark for Community Detection	May 10, 2023	Benchmarking, Community Detection	Unverified	0
InfoMetIC: An Informative Metric for Reference-free Image Caption Evaluation	May 10, 2023	Benchmarking, Image Captioning	Code Available	1
DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects	May 9, 2023	Benchmarking, Decision Making	Code Available	1
Comparing Foundation Models using Data Kernels	May 9, 2023	Benchmarking, Self-Supervised Learning	Unverified	0
A Comprehensive Study on Dataset Distillation: Performance, Privacy, Robustness and Fairness	May 5, 2023	Benchmarking, Dataset Distillation	Unverified	0
Towards Segment Anything Model (SAM) for Medical Image Segmentation: A Survey	May 5, 2023	Benchmarking, Image Generation	Code Available	0
Semantic Segmentation using Vision Transformers: A survey	May 5, 2023	Autonomous Driving, Benchmarking	Unverified	0
Can LLMs Capture Human Preferences?	May 4, 2023	Benchmarking	Unverified	0
Analyzing Hong Kong's Legal Judgments from a Computational Linguistics point-of-view	May 4, 2023	Benchmarking, Graph Generation	Unverified	0
Working Memory Capacity of ChatGPT: An Empirical Study	Apr 30, 2023	Benchmarking, Language Modeling	Code Available	1
A Simulation-Augmented Benchmarking Framework for Automatic RSO Streak Detection in Single-Frame Space Images	Apr 30, 2023	Benchmarking, object-detection	Unverified	0
Benchmarking Automated Machine Learning Methods for Price Forecasting Applications	Apr 28, 2023	AutoML, Benchmarking	Unverified	0
Event-Free Moving Object Segmentation from Moving Ego Vehicle	Apr 28, 2023	Autonomous Driving, Benchmarking	Code Available	1
TorchBench: Benchmarking PyTorch with High API Surface Coverage	Apr 27, 2023	Benchmarking, GPU	Code Available	3
ChatGPT vs State-of-the-Art Models: A Benchmarking Study in Keyphrase Generation Task	Apr 27, 2023	Articles, Benchmarking	Unverified	0
Scalable, Distributed AI Frameworks: Leveraging Cloud Computing for Enhanced Deep Learning Performance and Efficiency	Apr 26, 2023	Benchmarking, Cloud Computing	Unverified	0
On Pitfalls of RemOve-And-Retrain: Data Processing Inequality Perspective	Apr 26, 2023	Benchmarking, Feature Importance	Code Available	0
IMUPoser: Full-Body Pose Estimation using IMUs in Phones, Watches, and Earbuds	Apr 25, 2023	Benchmarking, Pose Estimation	Code Available	1
CIMLA: Interpretable AI for inference of differential causal networks	Apr 25, 2023	Benchmarking	Unverified	0
Unsupervised Synthetic Image Refinement via Contrastive Learning and Consistent Semantic-Structural Constraints	Apr 25, 2023	Benchmarking, Contrastive Learning	Unverified	0
MF-NeRF: Memory Efficient NeRF with Mixed-Feature Hash Table	Apr 25, 2023	Benchmarking, GPU	Code Available	1
Benchmarking ChatGPT-4 on ACR Radiation Oncology In-Training (TXIT) Exam and Red Journal Gray Zone Cases: Potentials and Challenges for AI-Assisted Medical Education and Decision Making in Radiation Oncology	Apr 24, 2023	Benchmarking, Decision Making	Code Available	0
A Framework for Benchmarking Real-Time Embedded Object Detection	Apr 23, 2023	Benchmarking, Object	Unverified	0
Vision Transformer for Efficient Chest X-ray and Gastrointestinal Image Classification	Apr 23, 2023	Benchmarking, Data Augmentation	Unverified	0
RGB-D Indiscernible Object Counting in Underwater Scenes	Apr 23, 2023	Benchmarking, Depth Estimation	Code Available	1
Benchmarking Low-Shot Robustness to Natural Distribution Shifts	Apr 21, 2023	Benchmarking	Code Available	1
SCoDA: Domain Adaptive Shape Completion for Real Scans	Apr 20, 2023	Benchmarking, Domain Adaptation	Code Available	1
Learning a quantum computer's capability	Apr 20, 2023	Benchmarking	Unverified	0
Towards a Benchmark for Scientific Understanding in Humans and Machines	Apr 20, 2023	Benchmarking, Information Retrieval	Unverified	0
Depth Functions for Partial Orders with a Descriptive Analysis of Machine Learning Algorithms	Apr 19, 2023	Benchmarking, Descriptive	Code Available	0
Graph Neural Network-Based Anomaly Detection for River Network Systems	Apr 19, 2023	Anomaly Detection, Benchmarking	Code Available	1
The eBible Corpus: Data and Model Benchmarks for Bible Translation for Low-Resource Languages	Apr 19, 2023	Benchmarking, Machine Translation	Code Available	0
Benchmarking Actor-Critic Deep Reinforcement Learning Algorithms for Robotics Control with Action Constraints	Apr 18, 2023	Benchmarking, Deep Reinforcement Learning	Code Available	1
A Comparison of Image Denoising Methods	Apr 18, 2023	Benchmarking, Denoising	Code Available	1
Computational and Exploratory Landscape Analysis of the GKLS Generator	Apr 18, 2023	Benchmarking, global-optimization	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified