Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 3551–3600 of 5548 papers

Title	Date	Tasks	Status
Theory of Mind in Large Language Models: Examining Performance of 11 State-of-the-Art models vs. Children Aged 7-10 on Advanced Tests	Oct 31, 2023	Benchmarking	—Unverified
A Metadata-Driven Approach to Understand Graph Neural Networks	Oct 30, 2023	BenchmarkingGraph Learning	—Unverified
Domain Generalization in Computational Pathology: Survey and Guidelines	Oct 30, 2023	BenchmarkingDiagnostic	—Unverified
LLMs and Finetuning: Benchmarking cross-domain performance for hate speech detection	Oct 29, 2023	BenchmarkingDiversity	—Unverified
Evaluating LLP Methods: Challenges and Approaches	Oct 29, 2023	BenchmarkingModel Selection	CodeCode Available
Benchmark Generation Framework with Customizable Distortions for Image Classifier Robustness	Oct 28, 2023	Benchmarkingimage-classification	CodeCode Available
On General Language Understanding	Oct 27, 2023	BenchmarkingEthics	—Unverified
OpenDMC: An Open-Source Library and Performance Evaluation for Deep-learning-based Multi-frame Compression	Oct 27, 2023	BenchmarkingGPU	CodeCode Available
OrionBench: Benchmarking Time Series Generative Models in the Service of the End-User	Oct 26, 2023	Anomaly DetectionBenchmarking	—Unverified
RDBench: ML Benchmark for Relational Databases	Oct 25, 2023	Benchmarking	—Unverified
ConDefects: A New Dataset to Address the Data Leakage Concern for LLM-based Fault Localization and Program Repair	Oct 25, 2023	BenchmarkingFault localization	—Unverified
XFEVER: Exploring Fact Verification across Languages	Oct 25, 2023	BenchmarkingFact Verification	CodeCode Available
Quantum Long Short-Term Memory (QLSTM) vs Classical LSTM in Time Series Forecasting: A Comparative Study in Solar Power Forecasting	Oct 25, 2023	BenchmarkingHyperparameter Optimization	—Unverified
BLESS: Benchmarking Large Language Models on Sentence Simplification	Oct 24, 2023	BenchmarkingDiversity	CodeCode Available
Analyzing Multilingual Competency of LLMs in Multi-Turn Instruction Following: A Case Study of Arabic	Oct 23, 2023	BenchmarkingInstruction Following	—Unverified
XTSC-Bench: Quantitative Benchmarking for Explainers on Time Series Classification	Oct 23, 2023	BenchmarkingTime Series	CodeCode Available
DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design	Oct 23, 2023	BenchmarkingImage Generation	CodeCode Available
A Quantitative Evaluation of Dense 3D Reconstruction of Sinus Anatomy from Monocular Endoscopic Video	Oct 22, 2023	3D ReconstructionAnatomy	—Unverified
MedEval: A Multi-Level, Multi-Task, and Multi-Domain Medical Benchmark for Language Model Evaluation	Oct 21, 2023	BenchmarkingLanguage Model Evaluation	—Unverified
Benchmarking and Improving Text-to-SQL Generation under Ambiguity	Oct 20, 2023	BenchmarkingDiversity	CodeCode Available
Benchmarking Sequential Visual Input Reasoning and Prediction in Multimodal Large Language Models	Oct 20, 2023	Activity PredictionBenchmarking	CodeCode Available
Standardised workflow for mass spectrometry-based single-cell proteomics data processing and analysis using the scp package	Oct 20, 2023	Benchmarking	—Unverified
Almost Equivariance via Lie Algebra Convolutions	Oct 19, 2023	Benchmarking	—Unverified
Benchmarking GPUs on SVBRDF Extractor Model	Oct 19, 2023	BenchmarkingGPU	—Unverified
InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions	Oct 18, 2023	BenchmarkingVisual Grounding	CodeCode Available
Alexpaca: Learning Factual Clarification Question Generation Without Examples	Oct 17, 2023	BenchmarkingChatbot	—Unverified
BanglaNLP at BLP-2023 Task 1: Benchmarking different Transformer Models for Violence Inciting Text Detection in Bengali	Oct 16, 2023	BenchmarkingData Augmentation	—Unverified
A Novel Benchmarking Paradigm and a Scale- and Motion-Aware Model for Egocentric Pedestrian Trajectory Prediction	Oct 16, 2023	BenchmarkingPedestrian Trajectory Prediction	—Unverified
An Empirical Study of Super-resolution on Low-resolution Micro-expression Recognition	Oct 16, 2023	BenchmarkingMicro Expression Recognition	—Unverified
TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models	Oct 16, 2023	Automated Theorem ProvingBenchmarking	CodeCode Available
Assessing Encoder-Decoder Architectures for Robust Coronary Artery Segmentation	Oct 16, 2023	BenchmarkingCoronary Artery Segmentation	—Unverified
Evaluating Robustness of Visual Representations for Object Assembly Task Requiring Spatio-Geometrical Reasoning	Oct 15, 2023	BenchmarkingSpatial Reasoning	—Unverified
Prompting Scientific Names for Zero-Shot Species Recognition	Oct 15, 2023	BenchmarkingZero-Shot Learning	—Unverified
Benchmarking the Sim-to-Real Gap in Cloth Manipulation	Oct 14, 2023	BenchmarkingMuJoCo	—Unverified
Randomized Benchmarking of Local Zeroth-Order Optimizers for Variational Quantum Systems	Oct 14, 2023	Benchmarking	CodeCode Available
Mirage: Model-Agnostic Graph Distillation for Graph Classification	Oct 14, 2023	BenchmarkingClassification	CodeCode Available
BanglaNLP at BLP-2023 Task 2: Benchmarking different Transformer Models for Sentiment Analysis of Bangla Social Media Posts	Oct 13, 2023	BenchmarkingSentiment Analysis	CodeCode Available
A Benchmarking Protocol for SAR Colorization: From Regression to Deep Learning Approaches	Oct 12, 2023	BenchmarkingColorization	—Unverified
Who Said That? Benchmarking Social Media AI Detection	Oct 12, 2023	BenchmarkingMisinformation	—Unverified
Investigating the Robustness and Properties of Detection Transformers (DETR) Toward Difficult Images	Oct 12, 2023	BenchmarkingDecoder	—Unverified
Psychoacoustic Challenges Of Speech Enhancement On VoIP Platforms	Oct 11, 2023	BenchmarkingDenoising	—Unverified
Deep Reinforcement Learning for Autonomous Cyber Defence: A Survey	Oct 11, 2023	BenchmarkingDeep Reinforcement Learning	—Unverified
Risk Aware Benchmarking of Large Language Models	Oct 11, 2023	BenchmarkingEconometrics	—Unverified
Transformers for Green Semantic Communication: Less Energy, More Semantics	Oct 11, 2023	BenchmarkingCPU	CodeCode Available
FedSym: Unleashing the Power of Entropy for Benchmarking the Algorithms for Federated Learning	Oct 11, 2023	BenchmarkingDiversity	—Unverified
Hypergraph Neural Networks through the Lens of Message Passing: A Common Perspective to Homophily and Architecture Design	Oct 11, 2023	BenchmarkingRepresentation Learning	—Unverified
BeSt-LeS: Benchmarking Stroke Lesion Segmentation using Deep Supervision	Oct 10, 2023	Acute Stroke Lesion SegmentationBenchmarking	CodeCode Available
On the Evaluation and Refinement of Vision-Language Instruction Tuning Datasets	Oct 10, 2023	AllBenchmarking	—Unverified
CAFA-evaluator: A Python Tool for Benchmarking Ontological Classification Methods	Oct 10, 2023	BenchmarkingPrediction	—Unverified
Distributed Evolution Strategies with Multi-Level Learning for Large-Scale Black-Box Optimization	Oct 9, 2023	Benchmarking	—Unverified

Show:10 25 50

← PrevPage 72 of 111Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified