SOTAVerified

Benchmarking

Papers

Showing 1876–1900 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| QualBench: Benchmarking Chinese LLMs with Localized Professional Qualifications for Vertical Domain Evaluation | | 0 |
| Advancing and Benchmarking Personalized Tool Invocation for LLMs | Code | 0 |
| False Promises in Medical Imaging AI? Assessing Validity of Outperformance Claims | Code | 0 |
| Alpha Excel Benchmark | | 0 |
| Benchmarking Traditional Machine Learning and Deep Learning Models for Fault Detection in Power Transformers | Code | 0 |
| Are Synthetic Corruptions A Reliable Proxy For Real-World Corruptions? | Code | 0 |
| Call for Action: towards the next generation of symbolic regression benchmark | | 0 |
| Multimodal Benchmarking and Recommendation of Text-to-Image Generation Models | Code | 0 |
| Towards Efficient Benchmarking of Foundation Models in Remote Sensing: A Capabilities Encoding Approach | Code | 0 |
| MedArabiQ: Benchmarking Large Language Models on Arabic Medical Tasks | Code | 0 |
| Physics-Learning AI Datamodel (PLAID) datasets: a collection of physics simulations for machine learning | | 0 |
| NeuroSim V1.5: Improved Software Backbone for Benchmarking Compute-in-Memory Accelerators with Device and Circuit-level Non-idealities | Code | 0 |
| Completing Spatial Transcriptomics Data for Gene Expression Prediction Benchmarking | | 0 |
| Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation | Code | 0 |
| Meta-Black-Box-Optimization through Offline Q-function Learning | Code | 0 |
| Representation Learning of Limit Order Book: A Comprehensive Study and Benchmarking | Code | 0 |
| NbBench: Benchmarking Language Models for Comprehensive Nanobody Tasks | Code | 0 |
| Not Every Tree Is a Forest: Benchmarking Forest Types from Satellite Remote Sensing | | 0 |
| CMAWRNet: Multiple Adverse Weather Removal via a Unified Quaternion Neural Architecture | | 0 |
| BOOM: Benchmarking Out-Of-distribution Molecular Property Predictions of Machine Learning Models | | 0 |
| PhytoSynth: Leveraging Multi-modal Generative Models for Crop Disease Data Generation with Novel Benchmarking and Prompt Engineering Approach | | 0 |
| Edge-Cloud Collaborative Computing on Distributed Intelligence and Model Optimization: A Survey | | 0 |
| Interpretable graph-based models on multimodal biomedical data integration: A technical review and benchmarking | | 0 |
| Parameterized Argumentation-based Reasoning Tasks for Benchmarking Generative Language Models | Code | 0 |
| Can Foundation Models Really Segment Tumors? A Benchmarking Odyssey in Lung CT Imaging | | 0 |
Page 76 of 222

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |