Benchmarking

Papers

Showing 2351–2400 of 5548 papers

Title	Date	Tasks	Status	Hype
Benchmarking the Robustness of Temporal Action Detection Models Against Temporal Corruptions	Mar 29, 2024	Action Detection, Benchmarking	Code Available	1
IndiBias: A Benchmark Dataset to Measure Social Biases in Language Models for Indian Context	Mar 29, 2024	Benchmarking, Sentence	Code Available	0
TFB: Towards Comprehensive and Fair Benchmarking of Time Series Forecasting Methods	Mar 29, 2024	Benchmarking, Multivariate Time Series Forecasting	Code Available	5
Are Large Language Models Good at Utility Judgments?	Mar 28, 2024	Answer Generation, Benchmarking	Code Available	0
Benchmarking Implicit Neural Representation and Geometric Rendering in Real-Time RGB-D SLAM	Mar 28, 2024	Benchmarking	Code Available	1
RankMamba: Benchmarking Mamba's Document Ranking Performance in the Era of Transformers	Mar 27, 2024	Benchmarking, Document Ranking	Code Available	1
ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object	Mar 27, 2024	Benchmarking	Code Available	1
Benchmarking Object Detectors with COCO: A New Path Forward	Mar 27, 2024	Benchmarking, Object	Code Available	1
Towards Image Ambient Lighting Normalization	Mar 27, 2024	Benchmarking, Image Restoration	Code Available	1
Benchmarking Image Transformers for Prostate Cancer Detection from Ultrasound Data	Mar 27, 2024	Benchmarking, Cancer Classification	Unverified	0
GPTs and Language Barrier: A Cross-Lingual Legal QA Examination	Mar 26, 2024	Articles, Benchmarking	Unverified	0
ArabicaQA: A Comprehensive Dataset for Arabic Question Answering	Mar 26, 2024	Benchmarking, Machine Reading Comprehension	Code Available	1
Benchmarking Video Frame Interpolation	Mar 25, 2024	Benchmarking, Computational Efficiency	Unverified	0
DISL: Fueling Research with A Large Dataset of Solidity Smart Contracts	Mar 25, 2024	Benchmarking	Unverified	0
NSINA: A News Corpus for Sinhala	Mar 25, 2024	Articles, Benchmarking	Code Available	0
CodeS: Natural Language to Code Repository via Multi-Layer Sketch	Mar 25, 2024	Benchmarking	Code Available	1
Addressing the generalization of 3D registration methods with a featureless baseline and an unbiased benchmark	Mar 23, 2024	Benchmarking, Image to Point Cloud Registration	Code Available	1
On the Fragility of Active Learners for Text Classification	Mar 23, 2024	Active Learning, Benchmarking	Code Available	0
TrustSQL: Benchmarking Text-to-SQL Reliability with Penalty-Based Scoring	Mar 23, 2024	Benchmarking, Text to SQL	Code Available	0
Unifying Large Language Model and Deep Reinforcement Learning for Human-in-Loop Interactive Socially-aware Navigation	Mar 22, 2024	Benchmarking, Deep Reinforcement Learning	Unverified	0
Transactive Local Energy Markets Enable Community-Level Resource Coordination Using Individual Rewards	Mar 22, 2024	Benchmarking, energy management	Unverified	0
Broadening the Scope of Neural Network Potentials through Direct Inclusion of Additional Molecular Attributes	Mar 22, 2024	Benchmarking	Unverified	0
Subjective Quality Assessment of Compressed Tone-Mapped High Dynamic Range Videos	Mar 22, 2024	Benchmarking, Tone Mapping	Unverified	0
Can 3D Vision-Language Models Truly Understand Natural Language?	Mar 21, 2024	Benchmarking, Diversity	Code Available	1
RoDLA: Benchmarking the Robustness of Document Layout Analysis Models	Mar 21, 2024	Benchmarking, Document Layout Analysis	Code Available	1
Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations	Mar 21, 2024	Benchmarking, Memorization	Code Available	1
ChatGPT Alternative Solutions: Large Language Models Survey	Mar 21, 2024	Benchmarking, Chatbot	Unverified	0
DomainLab: A modular Python package for domain generalization in deep learning	Mar 21, 2024	Benchmarking, Domain Generalization	Code Available	1
Practical End-to-End Optical Music Recognition for Pianoform Music	Mar 20, 2024	Benchmarking	Code Available	1
MARTA: a model for the automatic phonemic grouping of the parkinsonian speech	Mar 19, 2024	Benchmarking, Classification	Code Available	0
VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning	Mar 19, 2024	Benchmarking, Image Captioning	Code Available	2
Real-IAD: A Real-World Multi-View Dataset for Benchmarking Versatile Industrial Anomaly Detection	Mar 19, 2024	Anomaly Detection, Benchmarking	Code Available	3
MELTing point: Mobile Evaluation of Language Transformers	Mar 19, 2024	Benchmarking, Quantization	Code Available	1
AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework	Mar 19, 2024	Benchmarking, Financial Analysis	Code Available	3
ERASE: Benchmarking Feature Selection Methods for Deep Recommender Systems	Mar 19, 2024	Benchmarking, feature selection	Code Available	1
Embarrassingly Simple Scribble Supervision for 3D Medical Segmentation	Mar 19, 2024	Benchmarking, Segmentation	Unverified	0
Benchmarking Badminton Action Recognition with a New Fine-Grained Dataset	Mar 19, 2024	Action Recognition, Benchmarking	Unverified	0
OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety	Mar 18, 2024	Benchmarking, Mathematical Reasoning	Unverified	0
NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens	Mar 18, 2024	Benchmarking, Question Answering	Code Available	1
Align and Distill: Unifying and Improving Domain Adaptive Object Detection	Mar 18, 2024	Benchmarking, object-detection	Code Available	1
Leveraging Spatial and Semantic Feature Extraction for Skin Cancer Diagnosis with Capsule Networks and Graph Neural Networks	Mar 18, 2024	Benchmarking, Classification	Unverified	0
Benchmarking the Robustness of UAV Tracking Against Common Corruptions	Mar 18, 2024	Benchmarking	Code Available	0
A Sober Look at the Robustness of CLIPs to Spurious Features	Mar 18, 2024	Benchmarking	Unverified	0
FlowMind: Automatic Workflow Generation with LLMs	Mar 17, 2024	Benchmarking, Question Answering	Unverified	0
Granular Change Accuracy: A More Accurate Performance Metric for Dialogue State Tracking	Mar 17, 2024	Benchmarking, Dialogue State Tracking	Unverified	0
Depression Detection on Social Media with Large Language Models	Mar 16, 2024	Benchmarking, Depression Detection	Unverified	0
An Improved Metric and Benchmark for Assessing the Performance of Virtual Screening Models	Mar 15, 2024	Benchmarking, Drug Discovery	Code Available	1
Benchmarking Adversarial Robustness of Image Shadow Removal with Shadow-adaptive Attacks	Mar 15, 2024	Adversarial Attack, Adversarial Robustness	Unverified	0
Histo-Genomic Knowledge Distillation For Cancer Prognosis From Histopathology Whole Slide Images	Mar 15, 2024	Benchmarking, Knowledge Distillation	Code Available	1
Benchmarking Zero-Shot Robustness of Multimodal Foundation Models: A Pilot Study	Mar 15, 2024	Benchmarking	Code Available	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified