Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 2376–2400 of 5548 papers

Title	Date	Tasks	Status	Hype
Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations	Mar 21, 2024	BenchmarkingMemorization	CodeCode Available	1
ChatGPT Alternative Solutions: Large Language Models Survey	Mar 21, 2024	BenchmarkingChatbot	—Unverified	0
DomainLab: A modular Python package for domain generalization in deep learning	Mar 21, 2024	BenchmarkingDomain Generalization	CodeCode Available	1
Practical End-to-End Optical Music Recognition for Pianoform Music	Mar 20, 2024	Benchmarking	CodeCode Available	1
MARTA: a model for the automatic phonemic grouping of the parkinsonian speech	Mar 19, 2024	BenchmarkingClassification	CodeCode Available	0
VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning	Mar 19, 2024	BenchmarkingImage Captioning	CodeCode Available	2
Real-IAD: A Real-World Multi-View Dataset for Benchmarking Versatile Industrial Anomaly Detection	Mar 19, 2024	Anomaly DetectionBenchmarking	CodeCode Available	3
MELTing point: Mobile Evaluation of Language Transformers	Mar 19, 2024	BenchmarkingQuantization	CodeCode Available	1
AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework	Mar 19, 2024	BenchmarkingFinancial Analysis	CodeCode Available	3
ERASE: Benchmarking Feature Selection Methods for Deep Recommender Systems	Mar 19, 2024	Benchmarkingfeature selection	CodeCode Available	1
Embarrassingly Simple Scribble Supervision for 3D Medical Segmentation	Mar 19, 2024	BenchmarkingSegmentation	—Unverified	0
Benchmarking Badminton Action Recognition with a New Fine-Grained Dataset	Mar 19, 2024	Action RecognitionBenchmarking	—Unverified	0
OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety	Mar 18, 2024	BenchmarkingMathematical Reasoning	—Unverified	0
NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens	Mar 18, 2024	BenchmarkingQuestion Answering	CodeCode Available	1
Align and Distill: Unifying and Improving Domain Adaptive Object Detection	Mar 18, 2024	Benchmarkingobject-detection	CodeCode Available	1
Leveraging Spatial and Semantic Feature Extraction for Skin Cancer Diagnosis with Capsule Networks and Graph Neural Networks	Mar 18, 2024	BenchmarkingClassification	—Unverified	0
Benchmarking the Robustness of UAV Tracking Against Common Corruptions	Mar 18, 2024	Benchmarking	CodeCode Available	0
A Sober Look at the Robustness of CLIPs to Spurious Features	Mar 18, 2024	Benchmarking	—Unverified	0
FlowMind: Automatic Workflow Generation with LLMs	Mar 17, 2024	BenchmarkingQuestion Answering	—Unverified	0
Granular Change Accuracy: A More Accurate Performance Metric for Dialogue State Tracking	Mar 17, 2024	BenchmarkingDialogue State Tracking	—Unverified	0
Depression Detection on Social Media with Large Language Models	Mar 16, 2024	BenchmarkingDepression Detection	—Unverified	0
An Improved Metric and Benchmark for Assessing the Performance of Virtual Screening Models	Mar 15, 2024	BenchmarkingDrug Discovery	CodeCode Available	1
Benchmarking Adversarial Robustness of Image Shadow Removal with Shadow-adaptive Attacks	Mar 15, 2024	Adversarial AttackAdversarial Robustness	—Unverified	0
Histo-Genomic Knowledge Distillation For Cancer Prognosis From Histopathology Whole Slide Images	Mar 15, 2024	BenchmarkingKnowledge Distillation	CodeCode Available	1
Benchmarking Zero-Shot Robustness of Multimodal Foundation Models: A Pilot Study	Mar 15, 2024	Benchmarking	CodeCode Available	0

Show:10 25 50

← PrevPage 96 of 222Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified