Benchmarking

Papers

Title	Date	Tasks	Status
MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks	Nov 13, 2023	Benchmarking	Unverified
Connecting the Dots: Graph Neural Network Powered Ensemble and Classification of Medical Images	Nov 13, 2023	Benchmarking, Classification	Code Available
Identification of vortex in unstructured mesh with graph neural networks	Nov 11, 2023	Benchmarking, Graph Generation	Unverified
SeaTurtleID2022: A long-span dataset for reliable sea turtle re-identification	Nov 9, 2023	Benchmarking, Instance Segmentation	Unverified
Prompt Sketching for Large Language Models	Nov 8, 2023	Arithmetic Reasoning, Benchmarking	Unverified
An efficiency analysis of Spanish airports	Nov 8, 2023	Benchmarking	Unverified
A Comprehensive Summarization and Evaluation of Feature Refinement Modules for CTR Prediction	Nov 8, 2023	Benchmarking, Click-Through Rate Prediction	Code Available
DeepPatent2: A Large-Scale Benchmarking Corpus for Technical Drawing Understanding	Nov 7, 2023	3D Reconstruction, Benchmarking	Code Available
Benchmarking Deep Facial Expression Recognition: An Extensive Protocol with Balanced Dataset in the Wild	Nov 6, 2023	Benchmarking, Facial Expression Recognition	Unverified
Benchmarking Differential Evolution on a Quantum Simulator	Nov 6, 2023	Benchmarking, Evolutionary Algorithms	Unverified
Exploitation-Guided Exploration for Semantic Embodied Navigation	Nov 6, 2023	Benchmarking	Unverified
Benchmarking a Benchmark: How Reliable is MS-COCO?	Nov 5, 2023	Benchmarking, image-classification	Unverified
Learning Disentangled Speech Representations	Nov 4, 2023	Benchmarking, Disentanglement	Unverified
Multi-EuP: The Multilingual European Parliament Dataset for Analysis of Bias in Information Retrieval	Nov 3, 2023	Benchmarking, Fairness	Code Available
Grounded Intuition of GPT-Vision's Abilities with Scientific Images	Nov 3, 2023	Benchmarking, counterfactual	Code Available
An Empirical Study of Benchmarking Chinese Aspect Sentiment Quad Prediction	Nov 3, 2023	Benchmarking, Sentence	Unverified
Investigating Deep-Learning NLP for Automating the Extraction of Oncology Efficacy Endpoints from Scientific Literature	Nov 3, 2023	Benchmarking	Unverified
Use of Deep Neural Networks for Uncertain Stress Functions with Extensions to Impact Mechanics	Nov 3, 2023	Benchmarking, quantile regression	Unverified
Replicable Benchmarking of Neural Machine Translation (NMT) on Low-Resource Local Languages in Indonesia	Nov 2, 2023	Benchmarking, Machine Translation	Code Available
Decentralized Federated Learning on the Edge over Wireless Mesh Networks	Nov 2, 2023	Benchmarking, Federated Learning	Unverified
Are Large Language Models Reliable Judges? A Study on the Factuality Evaluation Capabilities of LLMs	Nov 1, 2023	Benchmarking, Question Answering	Unverified
SCPO: Safe Reinforcement Learning with Safety Critic Policy Optimization	Nov 1, 2023	Benchmarking, reinforcement-learning	Unverified
A Two-Step Framework for Multi-Material Decomposition of Dual Energy Computed Tomography from Projection Domain	Oct 31, 2023	Benchmarking, Diagnostic	Unverified
Next-generation MRD assays: do we have the tools to evaluate them properly?	Oct 31, 2023	Benchmarking, Sensitivity	Unverified
UAV Immersive Video Streaming: A Comprehensive Survey, Benchmarking, and Open Challenges	Oct 31, 2023	Benchmarking	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified