Benchmarking

Papers

Showing 2851–2900 of 5548 papers

Title	Date	Tasks	Status	Hype
Alexpaca: Learning Factual Clarification Question Generation Without Examples	Oct 17, 2023	Benchmarking, Chatbot	Unverified	0
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models	Oct 17, 2023	Benchmarking, Language Modelling	Code Available	1
DialogueLLM: Context and Emotion Knowledge-Tuned Large Language Models for Emotion Recognition in Conversations	Oct 17, 2023	Benchmarking, Emotion Recognition	Code Available	1
BanglaNLP at BLP-2023 Task 1: Benchmarking different Transformer Models for Violence Inciting Text Detection in Bengali	Oct 16, 2023	Benchmarking, Data Augmentation	Unverified	0
An Empirical Study of Super-resolution on Low-resolution Micro-expression Recognition	Oct 16, 2023	Benchmarking, Micro Expression Recognition	Unverified	0
Assessing Encoder-Decoder Architectures for Robust Coronary Artery Segmentation	Oct 16, 2023	Benchmarking, Coronary Artery Segmentation	Unverified	0
3DYoga90: A Hierarchical Video Dataset for Yoga Pose Understanding	Oct 16, 2023	Action Recognition, Benchmarking	Code Available	1
TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models	Oct 16, 2023	Automated Theorem Proving, Benchmarking	Code Available	0
A Novel Benchmarking Paradigm and a Scale- and Motion-Aware Model for Egocentric Pedestrian Trajectory Prediction	Oct 16, 2023	Benchmarking, Pedestrian Trajectory Prediction	Unverified	0
Prompting Scientific Names for Zero-Shot Species Recognition	Oct 15, 2023	Benchmarking, Zero-Shot Learning	Unverified	0
Evaluating Robustness of Visual Representations for Object Assembly Task Requiring Spatio-Geometrical Reasoning	Oct 15, 2023	Benchmarking, Spatial Reasoning	Unverified	0
Randomized Benchmarking of Local Zeroth-Order Optimizers for Variational Quantum Systems	Oct 14, 2023	Benchmarking	Code Available	0
Benchmarking the Sim-to-Real Gap in Cloth Manipulation	Oct 14, 2023	Benchmarking, MuJoCo	Unverified	0
Mirage: Model-Agnostic Graph Distillation for Graph Classification	Oct 14, 2023	Benchmarking, Classification	Code Available	0
"Kelly is a Warm Person, Joseph is a Role Model": Gender Biases in LLM-Generated Reference Letters	Oct 13, 2023	Benchmarking, Fairness	Code Available	1
pose-format: Library for Viewing, Augmenting, and Handling .pose Files	Oct 13, 2023	Benchmarking, Management	Code Available	1
BanglaNLP at BLP-2023 Task 2: Benchmarking different Transformer Models for Sentiment Analysis of Bangla Social Media Posts	Oct 13, 2023	Benchmarking, Sentiment Analysis	Code Available	0
Welfare Diplomacy: Benchmarking Language Model Cooperation	Oct 13, 2023	Benchmarking, Language Modeling	Code Available	1
MetaBox: A Benchmark Platform for Meta-Black-Box Optimization with Reinforcement Learning	Oct 12, 2023	Benchmarking	Code Available	1
GeSS: Benchmarking Geometric Deep Learning under Scientific Applications with Distribution Shifts	Oct 12, 2023	Benchmarking	Code Available	1
A Benchmarking Protocol for SAR Colorization: From Regression to Deep Learning Approaches	Oct 12, 2023	Benchmarking, Colorization	Unverified	0
Investigating the Robustness and Properties of Detection Transformers (DETR) Toward Difficult Images	Oct 12, 2023	Benchmarking, Decoder	Unverified	0
Who Said That? Benchmarking Social Media AI Detection	Oct 12, 2023	Benchmarking, Misinformation	Unverified	0
Towards Evaluating Generalist Agents: An Automated Benchmark in Open World	Oct 12, 2023	Benchmarking, Diversity	Code Available	1
Octopus: Embodied Vision-Language Programmer from Environmental Feedback	Oct 12, 2023	Benchmarking, Code Generation	Code Available	2
CRITERIA: a New Benchmarking Paradigm for Evaluating Trajectory Prediction Models for Autonomous Driving	Oct 11, 2023	Autonomous Driving, Benchmarking	Code Available	3
Deep Reinforcement Learning for Autonomous Cyber Defence: A Survey	Oct 11, 2023	Benchmarking, Deep Reinforcement Learning	Unverified	0
FedSym: Unleashing the Power of Entropy for Benchmarking the Algorithms for Federated Learning	Oct 11, 2023	Benchmarking, Diversity	Unverified	0
Transformers for Green Semantic Communication: Less Energy, More Semantics	Oct 11, 2023	Benchmarking, CPU	Code Available	0
Hypergraph Neural Networks through the Lens of Message Passing: A Common Perspective to Homophily and Architecture Design	Oct 11, 2023	Benchmarking, Representation Learning	Unverified	0
Risk Aware Benchmarking of Large Language Models	Oct 11, 2023	Benchmarking, Econometrics	Unverified	0
Psychoacoustic Challenges Of Speech Enhancement On VoIP Platforms	Oct 11, 2023	Benchmarking, Denoising	Unverified	0
ProbTS: Benchmarking Point and Distributional Forecasting across Diverse Prediction Horizons	Oct 11, 2023	Benchmarking, Position	Code Available	2
BeSt-LeS: Benchmarking Stroke Lesion Segmentation using Deep Supervision	Oct 10, 2023	Acute Stroke Lesion Segmentation, Benchmarking	Code Available	0
CAFA-evaluator: A Python Tool for Benchmarking Ontological Classification Methods	Oct 10, 2023	Benchmarking, Prediction	Unverified	0
What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models	Oct 10, 2023	Benchmarking, Code Generation	Code Available	1
Benchmarking and Explaining Large Language Model-based Code Generation: A Causality-Centric Approach	Oct 10, 2023	Benchmarking, Code Generation	Code Available	1
On the Evaluation and Refinement of Vision-Language Instruction Tuning Datasets	Oct 10, 2023	All, Benchmarking	Unverified	0
Distributed Evolution Strategies with Multi-Level Learning for Large-Scale Black-Box Optimization	Oct 9, 2023	Benchmarking	Unverified	0
Exploring Progress in Multivariate Time Series Forecasting: Comprehensive Benchmarking and Heterogeneity Analysis	Oct 9, 2023	Benchmarking, Multivariate Time Series Forecasting	Code Available	3
Transcending the Attention Paradigm: Representation Learning from Geospatial Social Media Data	Oct 9, 2023	Benchmarking, Language Modeling	Code Available	0
Simple GNNs with Low Rank Non-parametric Aggregators	Oct 8, 2023	Benchmarking, Node Classification	Code Available	0
Hi Guys or Hi Folks? Benchmarking Gender-Neutral Machine Translation with the GeNTE Corpus	Oct 8, 2023	Benchmarking, Machine Translation	Code Available	0
Are Personalized Stochastic Parrots More Dangerous? Evaluating Persona Biases in Dialogue Systems	Oct 8, 2023	Benchmarking	Code Available	0
Benchmarking Large Language Models with Augmented Instructions for Fine-grained Information Extraction	Oct 8, 2023	Benchmarking, Decoder	Unverified	0
FinGPT: Instruction Tuning Benchmark for Open-Source Large Language Models in Financial Datasets	Oct 7, 2023	Benchmarking, named-entity-recognition	Unverified	0
Beyond Text: A Deep Dive into Large Language Models' Ability on Understanding Graph Data	Oct 7, 2023	Benchmarking	Unverified	0
AKFruitYield: Modular benchmarking and video analysis software for Azure Kinect cameras for fruit size and fruit yield estimation in apple orchards	Oct 6, 2023	Benchmarking	Code Available	0
Full-scale modal testing of a Hawk T1A aircraft for benchmarking vibration-based methods	Oct 6, 2023	Benchmarking, Experimental Design	Unverified	0
LLM4DV: Using Large Language Models for Hardware Test Stimuli Generation	Oct 6, 2023	Benchmarking, Mathematical Reasoning	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified