Benchmarking

Papers

Title	Date	Tasks	Status
Alexpaca: Learning Factual Clarification Question Generation Without Examples	Oct 17, 2023	Benchmarking, Chatbot	Unverified
BanglaNLP at BLP-2023 Task 1: Benchmarking different Transformer Models for Violence Inciting Text Detection in Bengali	Oct 16, 2023	Benchmarking, Data Augmentation	Unverified
A Novel Benchmarking Paradigm and a Scale- and Motion-Aware Model for Egocentric Pedestrian Trajectory Prediction	Oct 16, 2023	Benchmarking, Pedestrian Trajectory Prediction	Unverified
An Empirical Study of Super-resolution on Low-resolution Micro-expression Recognition	Oct 16, 2023	Benchmarking, Micro Expression Recognition	Unverified
TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models	Oct 16, 2023	Automated Theorem Proving, Benchmarking	Code Available
Assessing Encoder-Decoder Architectures for Robust Coronary Artery Segmentation	Oct 16, 2023	Benchmarking, Coronary Artery Segmentation	Unverified
Evaluating Robustness of Visual Representations for Object Assembly Task Requiring Spatio-Geometrical Reasoning	Oct 15, 2023	Benchmarking, Spatial Reasoning	Unverified
Prompting Scientific Names for Zero-Shot Species Recognition	Oct 15, 2023	Benchmarking, Zero-Shot Learning	Unverified
Benchmarking the Sim-to-Real Gap in Cloth Manipulation	Oct 14, 2023	Benchmarking, MuJoCo	Unverified
Randomized Benchmarking of Local Zeroth-Order Optimizers for Variational Quantum Systems	Oct 14, 2023	Benchmarking	Code Available
Mirage: Model-Agnostic Graph Distillation for Graph Classification	Oct 14, 2023	Benchmarking, Classification	Code Available
BanglaNLP at BLP-2023 Task 2: Benchmarking different Transformer Models for Sentiment Analysis of Bangla Social Media Posts	Oct 13, 2023	Benchmarking, Sentiment Analysis	Code Available
A Benchmarking Protocol for SAR Colorization: From Regression to Deep Learning Approaches	Oct 12, 2023	Benchmarking, Colorization	Unverified
Who Said That? Benchmarking Social Media AI Detection	Oct 12, 2023	Benchmarking, Misinformation	Unverified
Investigating the Robustness and Properties of Detection Transformers (DETR) Toward Difficult Images	Oct 12, 2023	Benchmarking, Decoder	Unverified
Psychoacoustic Challenges Of Speech Enhancement On VoIP Platforms	Oct 11, 2023	Benchmarking, Denoising	Unverified
Deep Reinforcement Learning for Autonomous Cyber Defence: A Survey	Oct 11, 2023	Benchmarking, Deep Reinforcement Learning	Unverified
Risk Aware Benchmarking of Large Language Models	Oct 11, 2023	Benchmarking, Econometrics	Unverified
Transformers for Green Semantic Communication: Less Energy, More Semantics	Oct 11, 2023	Benchmarking, CPU	Code Available
FedSym: Unleashing the Power of Entropy for Benchmarking the Algorithms for Federated Learning	Oct 11, 2023	Benchmarking, Diversity	Unverified
Hypergraph Neural Networks through the Lens of Message Passing: A Common Perspective to Homophily and Architecture Design	Oct 11, 2023	Benchmarking, Representation Learning	Unverified
BeSt-LeS: Benchmarking Stroke Lesion Segmentation using Deep Supervision	Oct 10, 2023	Acute Stroke Lesion Segmentation, Benchmarking	Code Available
On the Evaluation and Refinement of Vision-Language Instruction Tuning Datasets	Oct 10, 2023	All, Benchmarking	Unverified
CAFA-evaluator: A Python Tool for Benchmarking Ontological Classification Methods	Oct 10, 2023	Benchmarking, Prediction	Unverified
Distributed Evolution Strategies with Multi-Level Learning for Large-Scale Black-Box Optimization	Oct 9, 2023	Benchmarking	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified