SOTAVerified

Benchmarking

Papers

Showing 701-710 of 5548 papers

Title | Status | Hype
Towards Reliable Detection of LLM-Generated Texts: A Comprehensive Evaluation Framework with CUDRT | Code | 1
TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation | Code | 1
Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark | Code | 1
Causality for Tabular Data Synthesis: A High-Order Structure Causal Benchmark Framework | Code | 1
RAD: A Comprehensive Dataset for Benchmarking the Robustness of Image Anomaly Detection | Code | 1
AudioMarkBench: Benchmarking Robustness of Audio Watermarking | Code | 1
QGEval: Benchmarking Multi-dimensional Evaluation for Question Generation | Code | 1
EmbSpatial-Bench: Benchmarking Spatial Understanding for Embodied Tasks with Large Vision-Language Models | Code | 1
Smiles2Dock: an open large-scale multi-task dataset for ML-based molecular docking | Code | 1
ICU-Sepsis: A Benchmark MDP Built from Real Medical Data | Code | 1
Page 71 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified