SOTAVerified

Benchmarking

Papers

Showing 1501–1510 of 5548 papers

Title | Status | Hype
When Graph meets Multimodal: Benchmarking on Multimodal Attributed Graphs Learning | Code | 1
Test-driven Software Experimentation with LASSO: an LLM Prompt Benchmarking Example |  | 0
Can we hop in general? A discussion of benchmark selection and design using the Hopper environment |  | 0
∀uto∃∨∧L: Autonomous Evaluation of LLMs for Truth Maintenance and Reasoning Tasks |  | 0
Guidelines for Fine-grained Sentence-level Arabic Readability Annotation |  | 0
Cross-Modal Bidirectional Interaction Model for Referring Remote Sensing Image Segmentation | Code | 1
TRIAGE: Ethical Benchmarking of AI Models Through Mass Casualty Simulations | Code | 0
Identifying Money Laundering Subgraphs on the Blockchain | Code | 0
COMPL-AI Framework: A Technical Interpretation and LLM Benchmarking Suite for the EU Artificial Intelligence Act | Code | 2
Benchmarking Agentic Workflow Generation | Code | 2
Page 151 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified