SOTAVerified

Benchmarking

Papers

Showing 1501-1525 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| When Graph meets Multimodal: Benchmarking on Multimodal Attributed Graphs Learning | Code | 1 |
| Cross-Modal Bidirectional Interaction Model for Referring Remote Sensing Image Segmentation | Code | 1 |
| ∀uto∃∨∧L: Autonomous Evaluation of LLMs for Truth Maintenance and Reasoning Tasks | | 0 |
| Can we hop in general? A discussion of benchmark selection and design using the Hopper environment | | 0 |
| Guidelines for Fine-grained Sentence-level Arabic Readability Annotation | | 0 |
| Test-driven Software Experimentation with LASSO: an LLM Prompt Benchmarking Example | | 0 |
| TRIAGE: Ethical Benchmarking of AI Models Through Mass Casualty Simulations | Code | 0 |
| Identifying Money Laundering Subgraphs on the Blockchain | Code | 0 |
| COMPL-AI Framework: A Technical Interpretation and LLM Benchmarking Suite for the EU Artificial Intelligence Act | Code | 2 |
| Benchmarking Agentic Workflow Generation | Code | 2 |
| Audio Explanation Synthesis with Generative Foundation Models | Code | 0 |
| Advocating Character Error Rate for Multilingual ASR Evaluation | | 0 |
| Benchmarking Data Heterogeneity Evaluation Approaches for Personalized Federated Learning | Code | 0 |
| Towards Generalisable Time Series Understanding Across Domains | Code | 1 |
| Analysis of different disparity estimation techniques on aerial stereo image datasets | | 0 |
| OmniPose6D: Towards Short-Term Object Pose Tracking in Dynamic Scenes from Monocular RGB | | 0 |
| TuringQ: Benchmarking AI Comprehension in Theory of Computation | Code | 0 |
| HERM: Benchmarking and Enhancing Multimodal LLMs for Human-Centric Understanding | | 0 |
| InAttention: Linear Context Scaling for Transformers | | 0 |
| M3Bench: Benchmarking Whole-body Motion Generation for Mobile Manipulation in 3D Scenes | | 0 |
| Quanda: An Interpretability Toolkit for Training Data Attribution Evaluation and Beyond | Code | 2 |
| Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making | Code | 3 |
| FedGraph: A Research Library and Benchmark for Federated Graph Learning | Code | 2 |
| Entering Real Social World! Benchmarking the Social Intelligence of Large Language Models from a First-person Perspective | Code | 1 |
| Manual Verbalizer Enrichment for Few-Shot Text Classification | | 0 |
Page 61 of 222

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |