SOTAVerified

Benchmarking

Papers

Showing 2001-2025 of 5548 papers

Title | Status | Hype
Leveraging State Space Models in Long Range Genomics | | 0
Cross-functional transferability in universal machine learning interatomic potentials | | 0
Generative Adversarial Networks with Limited Data: A Survey and Benchmarking | | 0
Prism: Dynamic and Flexible Benchmarking of LLMs Code Generation with Monte Carlo Tree Search | | 0
Subjective Visual Quality Assessment for High-Fidelity Learning-Based Image Compression | Code | 0
Riemannian Geometry for the classification of brain states with intracortical brain-computer interfaces | | 0
A Solid-State Nanopore Signal Generator for Training Machine Learning Models | | 0
SCAM: A Real-World Typographic Robustness Evaluation for Multimodal Foundation Models | Code | 0
Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs | Code | 0
Towards Visual Text Grounding of Multimodal Large Language Model | | 0
Do LLM Evaluators Prefer Themselves for a Reason? | Code | 0
Point Cloud Objective Quality: Benchmarking Features and Quality Evaluation | | 0
Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency | | 0
Detecting Stereotypes and Anti-stereotypes the Correct Way Using Social Psychological Underpinnings | Code | 0
Can AI Master Construction Management (CM)? Benchmarking State-of-the-Art Large Language Models on CM Certification Exams | | 0
MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models | | 0
Towards a Unified Framework for Determining Conformational Ensembles of Disordered Proteins | | 0
Quantifying Robustness: A Benchmarking Framework for Deep Learning Forecasting in Cyber-Physical Systems | Code | 0
Benchmark of Segmentation Techniques for Pelvic Fracture in CT and X-ray: Summary of the PENGWIN 2024 Challenge | | 0
Evaluating AI Recruitment Sourcing Tools by Human Preference | Code | 0
Accelerating IoV Intrusion Detection: Benchmarking GPU-Accelerated vs CPU-Based ML Libraries | | 0
Global Rice Multi-Class Segmentation Dataset (RiceSEG): A Comprehensive and Diverse High-Resolution RGB-Annotated Images for the Development and Benchmarking of Rice Segmentation Algorithms | | 0
Horizon Scans can be accelerated using novel information retrieval and artificial intelligence tools | | 0
When Reasoning Meets Compression: Benchmarking Compressed Large Reasoning Models on Complex Reasoning Tasks | | 0
FIORD: A Fisheye Indoor-Outdoor Dataset with LIDAR Ground Truth for 3D Scene Reconstruction and Benchmarking | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified