SOTAVerified

Benchmarking

Papers

Showing 1251-1300 of 5548 papers

Title | Status | Hype
Chaos as an interpretable benchmark for forecasting and data-driven modelling | Code | 1
SERAB: A multi-lingual benchmark for speech emotion recognition | Code | 1
EntQA: Entity Linking as Question Answering | Code | 1
Revisiting Self-Training for Few-Shot Learning of Language Model | Code | 1
Machine Learning with Knowledge Constraints for Process Optimization of Open-Air Perovskite Solar Cell Manufacturing | Code | 1
Phonetic Word Embeddings | Code | 1
MedPerf: Open Benchmarking Platform for Medical Artificial Intelligence using Federated Evaluation | Code | 1
Benchmarking Graph Neural Networks on Dynamic Link Prediction | Code | 1
"How Robust r u?": Evaluating Task-Oriented Dialogue Systems on Spoken Conversations | Code | 1
FewNLU: Benchmarking State-of-the-Art Methods for Few-Shot Natural Language Understanding | Code | 1
PASS: An ImageNet replacement for self-supervised pretraining without humans | Code | 1
Disentangled Feature Representation for Few-shot Image Classification | Code | 1
Don't be Contradicted with Anything! CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialogue System | Code | 1
SubseasonalClimateUSA: A Dataset for Subseasonal Forecasting and Benchmarking | Code | 1
AI Accelerator Survey and Trends | Code | 1
Benchmarking the Combinatorial Generalizability of Complex Query Answering on Knowledge Graphs | Code | 1
Benchmarking Commonsense Knowledge Base Population with an Effective Evaluation Dataset | Code | 1
OPV2V: An Open Benchmark Dataset and Fusion Pipeline for Perception with Vehicle-to-Vehicle Communication | Code | 1
Benchmarking the Spectrum of Agent Capabilities | Code | 1
RobustART: Benchmarking Robustness on Architecture Design and Training Techniques | Code | 1
Does BERT Learn as Humans Perceive? Understanding Linguistic Styles through Lexica | Code | 1
Scikit-dimension: a Python package for intrinsic dimension estimation | Code | 1
Biomedical Data-to-Text Generation via Fine-Tuning Transformers | Code | 1
ReMeDi: Resources for Multi-domain, Multi-service, Medical Dialogues | Code | 1
Semi-Supervised Exaggeration Detection of Health Science Press Releases | Code | 1
Tune It or Don't Use It: Benchmarking Data-Efficient Image Classification | Code | 1
KO codes: Inventing Nonlinear Encoding and Decoding for Reliable Wireless Communication via Deep-learning | Code | 1
Searching for an Effective Defender: Benchmarking Defense against Adversarial Word Substitution | Code | 1
Pulling Up by the Causal Bootstraps: Causal Data Augmentation for Pre-training Debiasing | Code | 1
A Unified Taxonomy and Multimodal Dataset for Events in Invasion Games | Code | 1
Generative Wind Power Curve Modeling Via Machine Vision: A Self-learning Deep Convolutional Network Based Method | Code | 1
SSH: A Self-Supervised Framework for Image Harmonization | Code | 1
A Dataset for Answering Time-Sensitive Questions | Code | 1
Hatemoji: A Test Suite and Adversarially-Generated Dataset for Benchmarking and Detecting Emoji-based Hate | Code | 1
A Systematic Benchmarking Analysis of Transfer Learning for Medical Image Analysis | Code | 1
Webly Supervised Fine-Grained Recognition: Benchmark Datasets and An Approach | Code | 1
CARLA: A Python Library to Benchmark Algorithmic Recourse and Counterfactual Explanation Algorithms | Code | 1
Quantum machine learning of large datasets using randomized measurements | Code | 1
Benchmarking: Past, Present and Future | Code | 1
Contemporary Symbolic Regression Methods and their Relative Performance | Code | 1
A multi-schematic classifier-independent oversampling approach for imbalanced datasets | Code | 1
Hierarchical graph neural nets can capture long-range interactions | Code | 1
Generative and reproducible benchmarks for comprehensive evaluation of machine learning classifiers | Code | 1
MECT: Multi-Metadata Embedding based Cross-Transformer for Chinese Named Entity Recognition | Code | 1
Benchmarking for Biomedical Natural Language Processing Tasks with a Domain Specific ALBERT | Code | 1
Benchpress: A Scalable and Versatile Workflow for Benchmarking Structure Learning Algorithms | Code | 1
The RSNA-ASNR-MICCAI BraTS 2021 Benchmark on Brain Tumor Segmentation and Radiogenomic Classification | Code | 1
Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning | Code | 1
Benchmarking Knowledge-driven Zero-shot Learning | Code | 1
Kimera-Multi: Robust, Distributed, Dense Metric-Semantic SLAM for Multi-Robot Systems | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified