SOTAVerified

Benchmarking

Papers

Showing 5501–5525 of 5548 papers

Title | Status | Hype
Classical ensemble of Quantum-classical ML algorithms for Phishing detection in Ethereum transaction networks | Code | 0
CLAIMCHECK: How Grounded are LLM Critiques of Scientific Papers? | Code | 0
TruthEval: A Dataset to Evaluate LLM Truthfulness and Reliability | Code | 0
Technical Report on the CleverHans v2.1.0 Adversarial Examples Library | Code | 0
A Neuro-Symbolic Framework for Sequence Classification with Relational and Temporal Knowledge | Code | 0
A Neuromorphic Dataset for Object Segmentation in Indoor Cluttered Environment | Code | 0
Cityscape-Adverse: Benchmarking Robustness of Semantic Segmentation with Realistic Scene Modifications via Diffusion-Based Image Editing | Code | 0
TSPP: A Unified Benchmarking Tool for Time-series Forecasting | Code | 0
City-Scale Road Audit System using Deep Learning | Code | 0
Radio Galaxy Zoo: Using semi-supervised learning to leverage large unlabelled data-sets for radio galaxy classification under data-set shift | Code | 0
Advancing and Benchmarking Personalized Tool Invocation for LLMs | Code | 0
CityNet: A Comprehensive Multi-Modal Urban Dataset for Advanced Research in Urban Computing | Code | 0
Chumor 2.0: Towards Benchmarking Chinese Humor Understanding | Code | 0
Benchmarking and Enhancing LLM Agents in Localizing Linux Kernel Bugs | Code | 0
Benchmarking and Confidence Evaluation of LALMs For Temporal Reasoning | Code | 0
Randomized Benchmarking of Local Zeroth-Order Optimizers for Variational Quantum Systems | Code | 0
Random Machines: A bagged-weighted support vector model with free kernel choice | Code | 0
TUBench: Benchmarking Large Vision-Language Models on Trustworthiness with Unanswerable Questions | Code | 0
ChemSafetyBench: Benchmarking LLM Safety on Chemistry Domain | Code | 0
Ranking and benchmarking framework for sampling algorithms on synthetic data streams | Code | 0
QeMFi: A Multifidelity Dataset of Quantum Chemical Properties of Diverse Molecules | Code | 0
Tunability: Importance of Hyperparameters of Machine Learning Algorithms | Code | 0
Temporal receptive field in dynamic graph learning: A comprehensive analysis | Code | 0
A Neural-embedded Choice Model: TasteNet-MNL Modeling Taste Heterogeneity with Flexibility and Interpretability | Code | 0
Benchmarking Multi-dimensional AIGC Video Quality Assessment: A Dataset and Unified Model | Code | 0
Page 221 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified