SOTAVerified

Benchmarking

Papers

Showing 1781-1790 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| User-in-the-loop Evaluation of Multimodal LLMs for Activity Assistance | | 0 |
| SPINEX-TimeSeries: Similarity-based Predictions with Explainable Neighbors Exploration for Time Series and Forecasting Problems | | 0 |
| Visual-Inertial SLAM for Unstructured Outdoor Environments: Benchmarking the Benefits and Computational Costs of Loop Closing | Code | 0 |
| Integrating Large Language Models and Knowledge Graphs for Extraction and Validation of Textual Test Data | Code | 0 |
| Deep Reinforcement Learning for Dynamic Order Picking in Warehouse Operations | | 0 |
| IBB Traffic Graph Data: Benchmarking and Road Traffic Prediction Model | | 0 |
| Guardians of Image Quality: Benchmarking Defenses Against Adversarial Attacks on Image Quality Metrics | Code | 1 |
| Dissecting Dissonance: Benchmarking Large Multimodal Models Against Self-Contradictory Instructions | Code | 0 |
| RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework | Code | 3 |
| PINNs for Medical Image Analysis: A Survey | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |