SOTAVerified

Benchmarking

Papers

Showing 1941–1950 of 5548 papers

Title | Status | Hype
UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models | Code | 2
Quantum-tunnelling deep neural network for optical illusion recognition | - | 0
Evaluating and Benchmarking Foundation Models for Earth Observation and Geospatial AI | - | 0
XLD: A Cross-Lane Dataset for Benchmarking Novel Driving View Synthesis | - | 0
GenRL: Multimodal-foundation world models for generalization in embodied agents | Code | 2
MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data | Code | 2
RAGBench: Explainable Benchmark for Retrieval-Augmented Generation Systems | - | 0
Evaluating the Efficacy of Foundational Models: Advancing Benchmarking Practices to Enhance Fine-Tuning Decision-Making | - | 0
Depth-Driven Geometric Prompt Learning for Laparoscopic Liver Landmark Detection | Code | 1
SoK: Membership Inference Attacks on LLMs are Rushing Nowhere (and How to Fix It) | Code | 1
Page 195 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified