Benchmarking

Papers

Showing 3351–3400 of 5548 papers

Title	Date	Tasks	Status
BdSLW60: A Word-Level Bangla Sign Language Dataset	Feb 13, 2024	Benchmarking, Gesture Recognition	Code Available
Impact of spatial transformations on landscape features of CEC2022 basic benchmark problems	Feb 12, 2024	Benchmarking	Unverified
Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT	Feb 12, 2024	Benchmarking, Chunking	Unverified
EvoGPT-f: An Evolutionary GPT Framework for Benchmarking Formal Math Languages	Feb 12, 2024	Automated Theorem Proving, Benchmarking	Unverified
Can Tree Based Approaches Surpass Deep Learning in Anomaly Detection? A Benchmarking Study	Feb 11, 2024	Anomaly Detection, Benchmarking	Code Available
Estimating the Effect of Crosstalk Error on Circuit Fidelity Using Noisy Intermediate-Scale Quantum Devices	Feb 10, 2024	Benchmarking	Unverified
ProtIR: Iterative Refinement between Retrievers and Predictors for Protein Function Annotation	Feb 10, 2024	Benchmarking, Language Modeling	Unverified
Improving 2D-3D Dense Correspondences with Diffusion Models for 6D Object Pose Estimation	Feb 9, 2024	6D Pose Estimation using RGB, Benchmarking	Unverified
A Functional Analysis Approach to Symbolic Regression	Feb 9, 2024	Benchmarking, regression	Unverified
LLaVA-Docent: Instruction Tuning with Multimodal Large Language Model to Support Art Appreciation Education	Feb 9, 2024	Benchmarking, Chatbot	Unverified
Efficient Expression Neutrality Estimation with Application to Face Recognition Utility Prediction	Feb 8, 2024	Benchmarking, Face Image Quality	Unverified
Transparent and Scrutable Recommendations Using Natural Language User Profiles	Feb 8, 2024	Benchmarking, Descriptive	Code Available
Benchmarking Large Language Models on Communicative Medical Coaching: a Novel System and Dataset	Feb 8, 2024	Benchmarking	Code Available
Towards Biologically Plausible and Private Gene Expression Data Generation	Feb 7, 2024	Benchmarking	Code Available
BRI3L: A Brightness Illusion Image Dataset for Identification and Localization of Regions of Illusory Perception	Feb 7, 2024	Benchmarking	Code Available
AttackNet: Enhancing Biometric Security via Tailored Convolutional Neural Network Architectures for Liveness Detection	Feb 6, 2024	Benchmarking	Code Available
Are Machines Better at Complex Reasoning? Unveiling Human-Machine Inference Gaps in Entailment Verification	Feb 6, 2024	Benchmarking, Multiple-choice	Unverified
Quantitative Metrics for Benchmarking Medical Image Harmonization	Feb 6, 2024	Anatomy, Benchmarking	Unverified
PowerGraph: A power grid benchmark dataset for graph neural networks	Feb 5, 2024	Articles, Benchmarking	Unverified
Architecture Analysis and Benchmarking of 3D U-shaped Deep Learning Models for Thoracic Anatomical Segmentation	Feb 5, 2024	Benchmarking, Image Segmentation	Code Available
Vi(E)va LLM! A Conceptual Stack for Evaluating and Interpreting Generative AI-based Visualizations	Feb 3, 2024	Benchmarking	Code Available
Probing Critical Learning Dynamics of PLMs for Hate Speech Detection	Feb 3, 2024	Benchmarking, Hate Speech Detection	Code Available
Can LLMs perform structured graph reasoning?	Feb 2, 2024	Benchmarking, Navigate	Code Available
Variational Quantum Circuits Enhanced Generative Adversarial Network	Feb 2, 2024	Benchmarking, Generative Adversarial Network	Unverified
Benchmarking Spiking Neural Network Learning Methods with Varying Locality	Feb 1, 2024	Benchmarking	Unverified
Coherent Feed Forward Quantum Neural Network	Feb 1, 2024	Benchmarking, Diagnostic	Unverified
MRAnnotator: multi-Anatomy and many-Sequence MRI segmentation of 44 structures	Feb 1, 2024	Anatomy, Benchmarking	Unverified
Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data	Jan 31, 2024	Benchmarking, Change Detection	Code Available
Benchmarking Sensitivity of Continual Graph Learning for Skeleton-Based Action Recognition	Jan 31, 2024	Action Recognition, Benchmarking	Unverified
ToPro: Token-Level Prompt Decomposition for Cross-Lingual Sequence Labeling Tasks	Jan 29, 2024	Benchmarking, Cross-Lingual Transfer	Code Available
Muffin or Chihuahua? Challenging Multimodal Large Language Models with Multipanel VQA	Jan 29, 2024	Benchmarking, Image Comprehension	Unverified
PPM: Automated Generation of Diverse Programming Problems for Benchmarking Code Generation Models	Jan 28, 2024	Benchmarking, Code Generation	Code Available
Benchmarking with MIMIC-IV, an irregular, spare clinical time series dataset	Jan 27, 2024	Benchmarking, Time Series	Unverified
SAM-based instance segmentation models for the automation of structural damage detection	Jan 27, 2024	Benchmarking, Instance Segmentation	Unverified
Biological Valuation Map of Flanders: A Sentinel-2 Imagery Analysis	Jan 26, 2024	Benchmarking, Semantic Segmentation	Unverified
Benchmarking Large Language Models in Complex Question Answering Attribution using Knowledge Graphs	Jan 26, 2024	Benchmarking, Knowledge Graphs	Unverified
Automated legal reasoning with discretion to act using s(LAW)	Jan 25, 2024	Benchmarking, Legal Reasoning	Unverified
TriSAM: Tri-Plane SAM for zero-shot cortical blood vessel segmentation in VEM images	Jan 25, 2024	Benchmarking, Segmentation	Unverified
Large Malaysian Language Model Based on Mistral for Enhanced Local Language Understanding	Jan 24, 2024	Benchmarking, Language Modeling	Unverified
Benchmarking the Fairness of Image Upsampling Methods	Jan 24, 2024	Benchmarking, Diversity	Code Available
LLpowershap: Logistic Loss-based Automated Shapley Values Feature Selection Method	Jan 23, 2024	Benchmarking, Fairness	Code Available
Deep Neural Network Benchmarks for Selective Classification	Jan 23, 2024	Benchmarking, Classification	Code Available
What the Weight?! A Unified Framework for Zero-Shot Knowledge Composition	Jan 23, 2024	Benchmarking	Code Available
Subgroup analysis methods for time-to-event outcomes in heterogeneous randomized controlled trials	Jan 22, 2024	Benchmarking, Synthetic Data Generation	Code Available
Data-Driven Target Localization: Benchmarking Gradient Descent Using the Cramer-Rao Bound	Jan 20, 2024	Benchmarking	Unverified
Data Augmentation for Traffic Classification	Jan 19, 2024	Benchmarking, Classification	Unverified
Harnessing Orthogonality to Train Low-Rank Neural Networks	Jan 16, 2024	Benchmarking	Code Available
NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription	Jan 16, 2024	Automatic Speech Recognition, Benchmarking	Unverified
OpenDPD: An Open-Source End-to-End Learning & Benchmarking Framework for Wideband Power Amplifier Modeling and Digital Pre-Distortion	Jan 16, 2024	Benchmarking	Unverified
Large Language Models are Null-Shot Learners	Jan 16, 2024	Arithmetic Reasoning, Benchmarking	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified