SOTAVerified

Benchmarking

Papers

Showing 1551-1560 of 5548 papers

Title | Status | Hype
Ward: Provable RAG Dataset Inference via LLM Watermarks | | 0
Lightning UQ Box: A Comprehensive Framework for Uncertainty Quantification in Deep Learning | | 0
AutoPenBench: Benchmarking Generative Agents for Penetration Testing | Code | 2
Towards a Benchmark for Large Language Models for Business Process Management Tasks | Code | 0
Repurposing Foundation Model for Generalizable Medical Time Series Classification | | 0
LLM-Pilot: Characterize and Optimize Performance of your LLM Inference Services | Code | 1
DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects | Code | 1
Large Language Model for Multi-Domain Translation: Benchmarking and Domain CoT Fine-tuning | | 0
MANTRA: The Manifold Triangulations Assemblage | Code | 0
IoT-LLM: Enhancing Real-World IoT Task Reasoning with Large Language Models | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified