SOTAVerified

Benchmarking

Papers

Showing 3301-3325 of 5548 papers

Title | Status | Hype
X-IQE: eXplainable Image Quality Evaluation for Text-to-Image Generation with Visual Large Language Models | Code | 1
Human Behavioral Benchmarking: Numeric Magnitude Comparison Effects in Large Language Models |  | 0
Smiling Women Pitching Down: Auditing Representational and Presentational Gender Biases in Image Generative AI |  | 0
PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering | Code | 1
Towards More Robust NLP System Evaluation: Handling Missing Scores in Benchmarks |  | 0
Restoring Images Captured in Arbitrary Hybrid Adverse Weather Conditions in One Go |  | 0
DLUE: Benchmarking Document Language Understanding |  | 0
An Empirical Study on Google Research Football Multi-agent Scenarios | Code | 1
Benchmarking the human brain against computational architectures |  | 0
OOD-Speech: A Large Bengali Speech Recognition Dataset for Out-of-Distribution Benchmarking |  | 0
Predictive Models from Quantum Computer Benchmarks |  | 0
A Strong Sustainability Paradigm Based Analytical Hierarchy Process (SSP-AHP) Method to Evaluate Sustainable Healthcare Systems |  | 0
MedGPTEval: A Dataset and Benchmark to Evaluate Responses of Large Language Models in Medicine |  | 0
Benchmarking large language models for biomedical natural language processing applications and recommendations | Code | 1
A Platform for the Biomedical Application of Large Language Models | Code | 1
Uncertainty in GNN Learning Evaluations: The Importance of a Consistent Benchmark for Community Detection |  | 0
InfoMetIC: An Informative Metric for Reference-free Image Caption Evaluation | Code | 1
DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects | Code | 1
Comparing Foundation Models using Data Kernels |  | 0
A Comprehensive Study on Dataset Distillation: Performance, Privacy, Robustness and Fairness |  | 0
Towards Segment Anything Model (SAM) for Medical Image Segmentation: A Survey | Code | 0
Semantic Segmentation using Vision Transformers: A survey |  | 0
Can LLMs Capture Human Preferences? |  | 0
Analyzing Hong Kong's Legal Judgments from a Computational Linguistics point-of-view |  | 0
Working Memory Capacity of ChatGPT: An Empirical Study | Code | 1
Page 133 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified