Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1526–1550 of 5548 papers

Title	Date	Tasks	Status	Hype
Active Evaluation Acquisition for Efficient LLM Benchmarking	Oct 8, 2024	Benchmarking	—Unverified	0
Manual Verbalizer Enrichment for Few-Shot Text Classification	Oct 8, 2024	BenchmarkingClassification	—Unverified	0
Benchmarking of a new data splitting method on volcanic eruption data	Oct 8, 2024	Benchmarking	—Unverified	0
Translation Canvas: An Explainable Interface to Pinpoint and Analyze Translation Systems	Oct 7, 2024	BenchmarkingMachine Translation	—Unverified	0
Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild	Oct 7, 2024	BenchmarkingMixture-of-Experts	CodeCode Available	1
Rule-based Data Selection for Large Language Models	Oct 7, 2024	BenchmarkingMath	—Unverified	0
Precise Model Benchmarking with Only a Few Observations	Oct 7, 2024	Benchmarkingmodel	—Unverified	0
MIBench: A Comprehensive Framework for Benchmarking Model Inversion Attack and Defense	Oct 7, 2024	Adversarial RobustnessBenchmarking	CodeCode Available	2
Named Clinical Entity Recognition Benchmark	Oct 7, 2024	BenchmarkingDecoder	CodeCode Available	0
TuneVLSeg: Prompt Tuning Benchmark for Vision-Language Segmentation Models	Oct 7, 2024	BenchmarkingSegmentation	CodeCode Available	0
Large Scale MRI Collection and Segmentation of Cirrhotic Liver	Oct 6, 2024	BenchmarkingDiagnostic	CodeCode Available	1
ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection	Oct 6, 2024	BenchmarkingMathematical Reasoning	—Unverified	0
dattri: A Library for Efficient Data Attribution	Oct 6, 2024	Benchmarking	CodeCode Available	2
Adjusting Pretrained Backbones for Performativity	Oct 6, 2024	BenchmarkingDeep Learning	CodeCode Available	0
Transformers Utilization in Chart Understanding: A Review of Recent Advances & Future Trends	Oct 5, 2024	BenchmarkingChart Understanding	—Unverified	0
PalmBench: A Comprehensive Benchmark of Compressed Large Language Models on Mobile Platforms	Oct 5, 2024	BenchmarkingGPU	—Unverified	0
Multimodal Large Language Models for Inverse Molecular Design with Retrosynthetic Planning	Oct 5, 2024	BenchmarkingDrug Design	CodeCode Available	1
Implicit to Explicit Entropy Regularization: Benchmarking ViT Fine-tuning under Noisy Labels	Oct 5, 2024	Benchmarking	—Unverified	0
TUBench: Benchmarking Large Vision-Language Models on Trustworthiness with Unanswerable Questions	Oct 5, 2024	BenchmarkingHallucination	CodeCode Available	0
How Do Large Language Models Understand Graph Patterns? A Benchmark for Graph Pattern Comprehension	Oct 4, 2024	BenchmarkingComputational chemistry	—Unverified	0
ActPlan-1K: Benchmarking the Procedural Planning Ability of Visual Language Models in Household Activities	Oct 4, 2024	Benchmarkingcounterfactual	—Unverified	0
Understanding Large Language Models in Your Pockets: Performance Study on COTS Mobile Devices	Oct 4, 2024	BenchmarkingLanguage Modeling	—Unverified	0
Benchmarking the Fidelity and Utility of Synthetic Relational Data	Oct 4, 2024	BenchmarkingFeature Importance	—Unverified	0
PersoBench: Benchmarking Personalized Response Generation in Large Language Models	Oct 4, 2024	BenchmarkingDialogue Generation	CodeCode Available	0
Ward: Provable RAG Dataset Inference via LLM Watermarks	Oct 4, 2024	BenchmarkingRAG	—Unverified	0

Show:10 25 50

← PrevPage 62 of 222Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified