Benchmarking

Papers

Title	Date	Tasks	Status
A Comparative Analysis on Ethical Benchmarking in Large Language Models	Oct 11, 2024	Benchmarking, Decision Making	Unverified
Identifying Money Laundering Subgraphs on the Blockchain	Oct 10, 2024	Benchmarking	Code Available
Audio Explanation Synthesis with Generative Foundation Models	Oct 10, 2024	Benchmarking, Decision Making	Code Available
TRIAGE: Ethical Benchmarking of AI Models Through Mass Casualty Simulations	Oct 10, 2024	Benchmarking, Decision Making	Code Available
Advocating Character Error Rate for Multilingual ASR Evaluation	Oct 9, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
InAttention: Linear Context Scaling for Transformers	Oct 9, 2024	Benchmarking, Decoder	Unverified
Benchmarking Data Heterogeneity Evaluation Approaches for Personalized Federated Learning	Oct 9, 2024	Benchmarking, Fairness	Code Available
TuringQ: Benchmarking AI Comprehension in Theory of Computation	Oct 9, 2024	Benchmarking	Code Available
Analysis of different disparity estimation techniques on aerial stereo image datasets	Oct 9, 2024	Benchmarking, Depth Estimation	Unverified
OmniPose6D: Towards Short-Term Object Pose Tracking in Dynamic Scenes from Monocular RGB	Oct 9, 2024	Benchmarking, Diversity	Unverified
HERM: Benchmarking and Enhancing Multimodal LLMs for Human-Centric Understanding	Oct 9, 2024	Benchmarking, Instruction Following	Unverified
M3Bench: Benchmarking Whole-body Motion Generation for Mobile Manipulation in 3D Scenes	Oct 9, 2024	Benchmarking, Motion Generation	Unverified
Active Evaluation Acquisition for Efficient LLM Benchmarking	Oct 8, 2024	Benchmarking	Unverified
Manual Verbalizer Enrichment for Few-Shot Text Classification	Oct 8, 2024	Benchmarking, Classification	Unverified
Benchmarking of a new data splitting method on volcanic eruption data	Oct 8, 2024	Benchmarking	Unverified
QGym: Scalable Simulation and Benchmarking of Queuing Network Controllers	Oct 8, 2024	Benchmarking	Code Available
Named Clinical Entity Recognition Benchmark	Oct 7, 2024	Benchmarking, Decoder	Code Available
Precise Model Benchmarking with Only a Few Observations	Oct 7, 2024	Benchmarking, model	Unverified
Rule-based Data Selection for Large Language Models	Oct 7, 2024	Benchmarking, Math	Unverified
TuneVLSeg: Prompt Tuning Benchmark for Vision-Language Segmentation Models	Oct 7, 2024	Benchmarking, Segmentation	Code Available
Translation Canvas: An Explainable Interface to Pinpoint and Analyze Translation Systems	Oct 7, 2024	Benchmarking, Machine Translation	Unverified
Adjusting Pretrained Backbones for Performativity	Oct 6, 2024	Benchmarking, Deep Learning	Code Available
ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection	Oct 6, 2024	Benchmarking, Mathematical Reasoning	Unverified
Implicit to Explicit Entropy Regularization: Benchmarking ViT Fine-tuning under Noisy Labels	Oct 5, 2024	Benchmarking	Unverified
Transformers Utilization in Chart Understanding: A Review of Recent Advances & Future Trends	Oct 5, 2024	Benchmarking, Chart Understanding	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified