SOTAVerified

Benchmarking

Papers

Showing 1401–1425 of 5548 papers

Title | Status | Hype
VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning | - | 0
PC-Gym: Benchmark Environments For Process Control Problems | Code | 2
Image2Struct: Benchmarking Structure Extraction for Vision-Language Models | - | 0
SS3DM: Benchmarking Street-View Surface Reconstruction with a Synthetic 3D Mesh Dataset | - | 0
AI Cyber Risk Benchmark: Automated Exploitation Capabilities | - | 0
Benchmarking LLM Guardrails in Handling Multilingual Toxicity | - | 0
Benchmarking Human and Automated Prompting in the Segment Anything Model | Code | 0
Exploring Capabilities of Time Series Foundation Models in Building Analytics | - | 0
Project MPG: towards a generalized performance benchmark for LLM capabilities | - | 0
LLMCBench: Benchmarking Large Language Model Compression for Efficient Deployment | Code | 1
NewTerm: Benchmarking Real-Time New Terms for Large Language Models with Annual Updates | Code | 0
CODES: Benchmarking Coupled ODE Surrogates | Code | 0
ODRL: A Benchmark for Off-Dynamics Reinforcement Learning | Code | 2
LLM-initialized Differentiable Causal Discovery | - | 0
Rephrasing natural text data with different languages and quality levels for Large Language Model pre-training | - | 0
Hierarchical Knowledge Graph Construction from Images for Scalable E-Commerce | - | 0
AutoBench-V: Can Large Vision-Language Models Benchmark Themselves? | Code | 0
CURATe: Benchmarking Personalised Alignment of Conversational AI Assistants | Code | 0
BongLLaMA: LLaMA for Bangla Language | - | 0
Sequential Large Language Model-Based Hyper-parameter Optimization | Code | 0
SPICEPilot: Navigating SPICE Code Generation and Simulation with AI Guidance | Code | 1
Multi-input Multi-output Loewner Framework for Vibration-based Damage Detection on a Trainer Jet | - | 0
OGBench: Benchmarking Offline Goal-Conditioned RL | Code | 3
AutoMIR: Effective Zero-Shot Medical Information Retrieval without Relevance Labels | Code | 0
SFTrack: A Robust Scale and Motion Adaptive Algorithm for Tracking Small and Fast Moving Objects | - | 0
Page 57 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified