SOTAVerified

Benchmarking

Papers

Showing 5501-5548 of 5548 papers

Title | Status | Hype
Classical ensemble of Quantum-classical ML algorithms for Phishing detection in Ethereum transaction networks | Code | 0
CLAIMCHECK: How Grounded are LLM Critiques of Scientific Papers? | Code | 0
TruthEval: A Dataset to Evaluate LLM Truthfulness and Reliability | Code | 0
Technical Report on the CleverHans v2.1.0 Adversarial Examples Library | Code | 0
A Neuro-Symbolic Framework for Sequence Classification with Relational and Temporal Knowledge | Code | 0
A Neuromorphic Dataset for Object Segmentation in Indoor Cluttered Environment | Code | 0
Cityscape-Adverse: Benchmarking Robustness of Semantic Segmentation with Realistic Scene Modifications via Diffusion-Based Image Editing | Code | 0
TSPP: A Unified Benchmarking Tool for Time-series Forecasting | Code | 0
City-Scale Road Audit System using Deep Learning | Code | 0
Radio Galaxy Zoo: Using semi-supervised learning to leverage large unlabelled data-sets for radio galaxy classification under data-set shift | Code | 0
Advancing and Benchmarking Personalized Tool Invocation for LLMs | Code | 0
CityNet: A Comprehensive Multi-Modal Urban Dataset for Advanced Research in Urban Computing | Code | 0
Chumor 2.0: Towards Benchmarking Chinese Humor Understanding | Code | 0
Benchmarking and Enhancing LLM Agents in Localizing Linux Kernel Bugs | Code | 0
Benchmarking and Confidence Evaluation of LALMs For Temporal Reasoning | Code | 0
Randomized Benchmarking of Local Zeroth-Order Optimizers for Variational Quantum Systems | Code | 0
Random Machines: A bagged-weighted support vector model with free kernel choice | Code | 0
TUBench: Benchmarking Large Vision-Language Models on Trustworthiness with Unanswerable Questions | Code | 0
ChemSafetyBench: Benchmarking LLM Safety on Chemistry Domain | Code | 0
Ranking and benchmarking framework for sampling algorithms on synthetic data streams | Code | 0
QeMFi: A Multifidelity Dataset of Quantum Chemical Properties of Diverse Molecules | Code | 0
Tunability: Importance of Hyperparameters of Machine Learning Algorithms | Code | 0
Temporal receptive field in dynamic graph learning: A comprehensive analysis | Code | 0
A Neural-embedded Choice Model: TasteNet-MNL Modeling Taste Heterogeneity with Flexibility and Interpretability | Code | 0
Benchmarking Multi-dimensional AIGC Video Quality Assessment: A Dataset and Unified Model | Code | 0
ChatGPT for GTFS: Benchmarking LLMs on GTFS Understanding and Retrieval | Code | 0
RCP-Bench: Benchmarking Robustness for Collaborative Perception Under Diverse Corruptions | Code | 0
TuneVLSeg: Prompt Tuning Benchmark for Vision-Language Segmentation Models | Code | 0
RDF-star2Vec: RDF-star Graph Embeddings for Data Mining | Code | 0
4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding | Code | 0
An Empirical Evaluation of Cost-based Federated SPARQL Query Processing Engines | Code | 0
Characterizing SLAM Benchmarks and Methods for the Robust Perception Age | Code | 0
Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge | Code | 0
Characterizing Bias: Benchmarking Large Language Models in Simplified versus Traditional Chinese | Code | 0
Changepoint Detection in Noisy Data Using a Novel Residuals Permutation-Based Method (RESPERM): Benchmarking and Application to Single Trial ERPs | Code | 0
TuringQ: Benchmarking AI Comprehension in Theory of Computation | Code | 0
An empirical comparison between stochastic and deterministic centroid initialisation for K-Means variations | Code | 0
TweetNERD -- End to End Entity Linking Benchmark for Tweets | Code | 0
Real-time cryo-EM data pre-processing with Warp | Code | 0
Towards Learning Universal, Regional, and Local Hydrological Behaviors via Machine-Learning Applied to Large-Sample Datasets | Code | 0
TextClass Benchmark: A Continuous Elo Rating of LLMs in Social Sciences | Code | 0
Certifiable Black-Box Attacks with Randomized Adversarial Examples: Breaking Defenses with Provable Confidence | Code | 0
Benchmarking Abstract and Reasoning Abilities Through A Theoretical Perspective | Code | 0
An Efficient Two-stage Gradient Boosting Framework for Short-term Traffic State Estimation | Code | 0
ACCORD: Closing the Commonsense Measurability Gap | Code | 0
Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions | Code | 0
Benchmark Generation Framework with Customizable Distortions for Image Classifier Robustness | Code | 0
Benchmarking Instance-Centric Counterfactual Algorithms for XAI: From White Box to Black Box | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified