Benchmarking

Papers

Title	Date	Tasks	Status
Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking	Jun 6, 2024	6D Pose Estimation using RGB, Benchmarking	Unverified
Time Sensitive Knowledge Editing through Efficient Finetuning	Jun 6, 2024	Benchmarking, knowledge editing	Unverified
Statistical Multicriteria Benchmarking via the GSD-Front	Jun 6, 2024	Benchmarking	Unverified
A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection	Jun 5, 2024	Anomaly Detection, Benchmarking	Unverified
Comparative Benchmarking of Failure Detection Methods in Medical Image Segmentation: Unveiling the Role of Confidence Aggregation	Jun 5, 2024	Benchmarking, Image Segmentation	Unverified
Enhancing Trust in LLMs: Algorithms for Comparing and Interpreting LLMs	Jun 4, 2024	Benchmarking, Fairness	Unverified
Bi-DCSpell: A Bi-directional Detector-Corrector Interactive Framework for Chinese Spelling Check	Jun 4, 2024	Benchmarking, Representation Learning	Unverified
Hyperbolic Benchmarking Unveils Network Topology-Feature Relationship in GNN Performance	Jun 4, 2024	Benchmarking, Drug Discovery	Code Available
Analyzing the Feature Extractor Networks for Face Image Synthesis	Jun 4, 2024	Benchmarking, Image Generation	Code Available
MARS: Benchmarking the Metaphysical Reasoning Abilities of Language Models with a Multi-task Evaluation Dataset	Jun 4, 2024	Benchmarking	Code Available
ACCORD: Closing the Commonsense Measurability Gap	Jun 4, 2024	Benchmarking, Common Sense Reasoning	Code Available
TruthEval: A Dataset to Evaluate LLM Truthfulness and Reliability	Jun 4, 2024	Benchmarking, Language Modeling	Code Available
LanEvil: Benchmarking the Robustness of Lane Detection to Environmental Illusions	Jun 3, 2024	Autonomous Driving, Benchmarking	Unverified
ELSA: Evaluating Localization of Social Activities in Urban Streets using Open-Vocabulary Detection	Jun 3, 2024	Action Recognition, Benchmarking	Unverified
R2C2-Coder: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models	Jun 3, 2024	Benchmarking, Code Completion	Unverified
Scaffold Splits Overestimate Virtual Screening Performance	Jun 2, 2024	Benchmarking, Clustering	Unverified
WebSuite: Systematically Evaluating Why Web Agents Fail	Jun 1, 2024	Benchmarking, Diagnostic	Code Available
On the project risk baseline: integrating aleatory uncertainty into project scheduling	May 31, 2024	Benchmarking, Scheduling	Unverified
Is Synthetic Data all We Need? Benchmarking the Robustness of Models Trained with Synthetic Images	May 30, 2024	All, Benchmarking	Unverified
CoSy: Evaluating Textual Explanations of Neurons	May 30, 2024	Benchmarking	Unverified
MDIW-13: a New Multi-Lingual and Multi-Script Database and Benchmark for Script Identification	May 29, 2024	Benchmarking	Unverified
Categorization of 33 computational methods to detect spatially variable genes from spatially resolved transcriptomics data	May 29, 2024	Benchmarking, Specificity	Unverified
Exploring Thermography Technology: A Comprehensive Facial Dataset for Face Detection, Recognition, and Emotion	May 28, 2024	Benchmarking, Emotion Recognition	Unverified
Risk-Neutral Generative Networks	May 28, 2024	Benchmarking	Unverified
A Correlation- and Mean-Aware Loss Function and Benchmarking Framework to Improve GAN-based Tabular Data Synthesis	May 27, 2024	Benchmarking	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified