SOTAVerified

Benchmarking

Papers

Showing 4876-4900 of 5548 papers

Title | Status | Hype
MST: Adaptive Multi-Scale Tokens Guided Interactive Segmentation | Code | 0
ferret: a Framework for Benchmarking Explainers on Transformers | Code | 0
Benchmarking Procedural Language Understanding for Low-Resource Languages: A Case Study on Turkish | Code | 0
FEET: A Framework for Evaluating Embedding Techniques | Code | 0
Benchmarking Probabilistic Deep Learning Methods for License Plate Recognition | Code | 0
Unraveling the Capabilities of Language Models in News Summarization | Code | 0
mTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale | Code | 0
FedNLP: Benchmarking Federated Learning Methods for Natural Language Processing Tasks | Code | 0
MUBen: Benchmarking the Uncertainty of Molecular Representation Models | Code | 0
The Sounds of Home: A Speech-Removed Residential Audio Dataset for Sound Event Detection | Code | 0
WAC: A Corpus of Wikipedia Conversations for Online Abuse Detection | Code | 0
FedSecurity: Benchmarking Attacks and Defenses in Federated Learning and Federated LLMs | Code | 0
Fedivertex: a Graph Dataset based on Decentralized Social Networks for Trustworthy Machine Learning | Code | 0
Feature interpretability in BCIs: exploring the role of network lateralization | Code | 0
AutoBench-V: Can Large Vision-Language Models Benchmark Themselves? | Code | 0
Benchmarking pre-trained text embedding models in aligning built asset information | Code | 0
Benchmarking Pre-trained Language Models for Multilingual NER: TraSpaS at the BSNLP2021 Shared Task | Code | 0
Feature embedding in click-through rate prediction | Code | 0
Acoustic Identification of Ae. aegypti Mosquitoes using Smartphone Apps and Residual Convolutional Neural Networks | Code | 0
FB-Bench: A Fine-Grained Multi-Task Benchmark for Evaluating LLMs' Responsiveness to Human Feedback | Code | 0
Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis | Code | 0
Multi-EuP: The Multilingual European Parliament Dataset for Analysis of Bias in Information Retrieval | Code | 0
AuthNet: A Deep Learning based Authentication Mechanism using Temporal Facial Feature Movements | Code | 0
Yesterday's News: Benchmarking Multi-Dimensional Out-of-Distribution Generalisation of Misinformation Detection Models | Code | 0
FastTextSpotter: A High-Efficiency Transformer for Multilingual Scene Text Spotting | Code | 0
Page 196 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified