Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 2851–2875 of 5548 papers

Title	Date	Tasks	Status	Hype
Alexpaca: Learning Factual Clarification Question Generation Without Examples	Oct 17, 2023	BenchmarkingChatbot	—Unverified	0
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models	Oct 17, 2023	BenchmarkingLanguage Modelling	CodeCode Available	1
DialogueLLM: Context and Emotion Knowledge-Tuned Large Language Models for Emotion Recognition in Conversations	Oct 17, 2023	BenchmarkingEmotion Recognition	CodeCode Available	1
BanglaNLP at BLP-2023 Task 1: Benchmarking different Transformer Models for Violence Inciting Text Detection in Bengali	Oct 16, 2023	BenchmarkingData Augmentation	—Unverified	0
An Empirical Study of Super-resolution on Low-resolution Micro-expression Recognition	Oct 16, 2023	BenchmarkingMicro Expression Recognition	—Unverified	0
Assessing Encoder-Decoder Architectures for Robust Coronary Artery Segmentation	Oct 16, 2023	BenchmarkingCoronary Artery Segmentation	—Unverified	0
3DYoga90: A Hierarchical Video Dataset for Yoga Pose Understanding	Oct 16, 2023	Action RecognitionBenchmarking	CodeCode Available	1
TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models	Oct 16, 2023	Automated Theorem ProvingBenchmarking	CodeCode Available	0
A Novel Benchmarking Paradigm and a Scale- and Motion-Aware Model for Egocentric Pedestrian Trajectory Prediction	Oct 16, 2023	BenchmarkingPedestrian Trajectory Prediction	—Unverified	0
Prompting Scientific Names for Zero-Shot Species Recognition	Oct 15, 2023	BenchmarkingZero-Shot Learning	—Unverified	0
Evaluating Robustness of Visual Representations for Object Assembly Task Requiring Spatio-Geometrical Reasoning	Oct 15, 2023	BenchmarkingSpatial Reasoning	—Unverified	0
Randomized Benchmarking of Local Zeroth-Order Optimizers for Variational Quantum Systems	Oct 14, 2023	Benchmarking	CodeCode Available	0
Benchmarking the Sim-to-Real Gap in Cloth Manipulation	Oct 14, 2023	BenchmarkingMuJoCo	—Unverified	0
Mirage: Model-Agnostic Graph Distillation for Graph Classification	Oct 14, 2023	BenchmarkingClassification	CodeCode Available	0
"Kelly is a Warm Person, Joseph is a Role Model": Gender Biases in LLM-Generated Reference Letters	Oct 13, 2023	BenchmarkingFairness	CodeCode Available	1
pose-format: Library for Viewing, Augmenting, and Handling .pose Files	Oct 13, 2023	BenchmarkingManagement	CodeCode Available	1
BanglaNLP at BLP-2023 Task 2: Benchmarking different Transformer Models for Sentiment Analysis of Bangla Social Media Posts	Oct 13, 2023	BenchmarkingSentiment Analysis	CodeCode Available	0
Welfare Diplomacy: Benchmarking Language Model Cooperation	Oct 13, 2023	BenchmarkingLanguage Modeling	CodeCode Available	1
MetaBox: A Benchmark Platform for Meta-Black-Box Optimization with Reinforcement Learning	Oct 12, 2023	Benchmarking	CodeCode Available	1
GeSS: Benchmarking Geometric Deep Learning under Scientific Applications with Distribution Shifts	Oct 12, 2023	Benchmarking	CodeCode Available	1
A Benchmarking Protocol for SAR Colorization: From Regression to Deep Learning Approaches	Oct 12, 2023	BenchmarkingColorization	—Unverified	0
Investigating the Robustness and Properties of Detection Transformers (DETR) Toward Difficult Images	Oct 12, 2023	BenchmarkingDecoder	—Unverified	0
Who Said That? Benchmarking Social Media AI Detection	Oct 12, 2023	BenchmarkingMisinformation	—Unverified	0
Towards Evaluating Generalist Agents: An Automated Benchmark in Open World	Oct 12, 2023	BenchmarkingDiversity	CodeCode Available	1
Octopus: Embodied Vision-Language Programmer from Environmental Feedback	Oct 12, 2023	BenchmarkingCode Generation	CodeCode Available	2

Show:10 25 50

← PrevPage 115 of 222Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified