SOTAVerified

Benchmarking

Papers

Showing 671–680 of 5548 papers

Title | Status | Hype
Benchmark on Drug Target Interaction Modeling from a Structure Perspective | Code | 1
GraCoRe: Benchmarking Graph Comprehension and Complex Reasoning in Large Language Models | Code | 1
Comics Datasets Framework: Mix of Comics datasets for detection benchmarking | Code | 1
Emotion and Intent Joint Understanding in Multimodal Conversation: A Benchmarking Dataset | Code | 1
Occlusion-Aware Seamless Segmentation | Code | 1
FineSurE: Fine-grained Summarization Evaluation using LLMs | Code | 1
Overcoming Common Flaws in the Evaluation of Selective Classification Systems | Code | 1
Mobile-Bench: An Evaluation Benchmark for LLM-based Mobile Agents | Code | 1
AI Agents That Matter | Code | 1
GraphArena: Benchmarking Large Language Models on Graph Computational Problems | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified