SOTAVerified

Benchmarking

Papers

Showing 2676-2700 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| A Comparative Analysis on Ethical Benchmarking in Large Language Models | | 0 |
| Identifying Money Laundering Subgraphs on the Blockchain | Code | 0 |
| Audio Explanation Synthesis with Generative Foundation Models | Code | 0 |
| TRIAGE: Ethical Benchmarking of AI Models Through Mass Casualty Simulations | Code | 0 |
| Advocating Character Error Rate for Multilingual ASR Evaluation | | 0 |
| InAttention: Linear Context Scaling for Transformers | | 0 |
| Benchmarking Data Heterogeneity Evaluation Approaches for Personalized Federated Learning | Code | 0 |
| TuringQ: Benchmarking AI Comprehension in Theory of Computation | Code | 0 |
| Analysis of different disparity estimation techniques on aerial stereo image datasets | | 0 |
| OmniPose6D: Towards Short-Term Object Pose Tracking in Dynamic Scenes from Monocular RGB | | 0 |
| HERM: Benchmarking and Enhancing Multimodal LLMs for Human-Centric Understanding | | 0 |
| M3Bench: Benchmarking Whole-body Motion Generation for Mobile Manipulation in 3D Scenes | | 0 |
| Active Evaluation Acquisition for Efficient LLM Benchmarking | | 0 |
| Manual Verbalizer Enrichment for Few-Shot Text Classification | | 0 |
| Benchmarking of a new data splitting method on volcanic eruption data | | 0 |
| QGym: Scalable Simulation and Benchmarking of Queuing Network Controllers | Code | 0 |
| Named Clinical Entity Recognition Benchmark | Code | 0 |
| Precise Model Benchmarking with Only a Few Observations | | 0 |
| Rule-based Data Selection for Large Language Models | | 0 |
| TuneVLSeg: Prompt Tuning Benchmark for Vision-Language Segmentation Models | Code | 0 |
| Translation Canvas: An Explainable Interface to Pinpoint and Analyze Translation Systems | | 0 |
| Adjusting Pretrained Backbones for Performativity | Code | 0 |
| ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection | | 0 |
| Implicit to Explicit Entropy Regularization: Benchmarking ViT Fine-tuning under Noisy Labels | | 0 |
| Transformers Utilization in Chart Understanding: A Review of Recent Advances & Future Trends | | 0 |
Page 108 of 222

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |