Benchmarking

Papers

Showing 2876–2900 of 5548 papers

Title	Date	Tasks	Status	Hype
CRITERIA: a New Benchmarking Paradigm for Evaluating Trajectory Prediction Models for Autonomous Driving	Oct 11, 2023	Autonomous Driving, Benchmarking	Code Available	3
Deep Reinforcement Learning for Autonomous Cyber Defence: A Survey	Oct 11, 2023	Benchmarking, Deep Reinforcement Learning	Unverified	0
FedSym: Unleashing the Power of Entropy for Benchmarking the Algorithms for Federated Learning	Oct 11, 2023	Benchmarking, Diversity	Unverified	0
Transformers for Green Semantic Communication: Less Energy, More Semantics	Oct 11, 2023	Benchmarking, CPU	Code Available	0
Hypergraph Neural Networks through the Lens of Message Passing: A Common Perspective to Homophily and Architecture Design	Oct 11, 2023	Benchmarking, Representation Learning	Unverified	0
Risk Aware Benchmarking of Large Language Models	Oct 11, 2023	Benchmarking, Econometrics	Unverified	0
Psychoacoustic Challenges Of Speech Enhancement On VoIP Platforms	Oct 11, 2023	Benchmarking, Denoising	Unverified	0
ProbTS: Benchmarking Point and Distributional Forecasting across Diverse Prediction Horizons	Oct 11, 2023	Benchmarking, Position	Code Available	2
BeSt-LeS: Benchmarking Stroke Lesion Segmentation using Deep Supervision	Oct 10, 2023	Acute Stroke Lesion Segmentation, Benchmarking	Code Available	0
CAFA-evaluator: A Python Tool for Benchmarking Ontological Classification Methods	Oct 10, 2023	Benchmarking, Prediction	Unverified	0
What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models	Oct 10, 2023	Benchmarking, Code Generation	Code Available	1
Benchmarking and Explaining Large Language Model-based Code Generation: A Causality-Centric Approach	Oct 10, 2023	Benchmarking, Code Generation	Code Available	1
On the Evaluation and Refinement of Vision-Language Instruction Tuning Datasets	Oct 10, 2023	All, Benchmarking	Unverified	0
Distributed Evolution Strategies with Multi-Level Learning for Large-Scale Black-Box Optimization	Oct 9, 2023	Benchmarking	Unverified	0
Exploring Progress in Multivariate Time Series Forecasting: Comprehensive Benchmarking and Heterogeneity Analysis	Oct 9, 2023	Benchmarking, Multivariate Time Series Forecasting	Code Available	3
Transcending the Attention Paradigm: Representation Learning from Geospatial Social Media Data	Oct 9, 2023	Benchmarking, Language Modeling	Code Available	0
Simple GNNs with Low Rank Non-parametric Aggregators	Oct 8, 2023	Benchmarking, Node Classification	Code Available	0
Hi Guys or Hi Folks? Benchmarking Gender-Neutral Machine Translation with the GeNTE Corpus	Oct 8, 2023	Benchmarking, Machine Translation	Code Available	0
Are Personalized Stochastic Parrots More Dangerous? Evaluating Persona Biases in Dialogue Systems	Oct 8, 2023	Benchmarking	Code Available	0
Benchmarking Large Language Models with Augmented Instructions for Fine-grained Information Extraction	Oct 8, 2023	Benchmarking, Decoder	Unverified	0
FinGPT: Instruction Tuning Benchmark for Open-Source Large Language Models in Financial Datasets	Oct 7, 2023	Benchmarking, named-entity-recognition	Unverified	0
Beyond Text: A Deep Dive into Large Language Models' Ability on Understanding Graph Data	Oct 7, 2023	Benchmarking	Unverified	0
AKFruitYield: Modular benchmarking and video analysis software for Azure Kinect cameras for fruit size and fruit yield estimation in apple orchards	Oct 6, 2023	Benchmarking	Code Available	0
Full-scale modal testing of a Hawk T1A aircraft for benchmarking vibration-based methods	Oct 6, 2023	Benchmarking, Experimental Design	Unverified	0
LLM4DV: Using Large Language Models for Hardware Test Stimuli Generation	Oct 6, 2023	Benchmarking, Mathematical Reasoning	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified