SOTAVerified

Benchmarking

Papers

Showing 4851-4900 of 5548 papers

Title | Status | Hype
Mol-MoE: Training Preference-Guided Routers for Molecule Generation | Code | 0
Benchmarking Robust Self-Supervised Learning Across Diverse Downstream Tasks | Code | 0
Fine-grained Hand Gesture Recognition in Multi-viewpoint Hand Hygiene | Code | 0
Moment Matching for Multi-Source Domain Adaptation | Code | 0
Benchmarking Robustness to Text-Guided Corruptions | Code | 0
Fine-grained Entity Recognition with Reduced False Negatives and Large Type Coverage | Code | 0
Finding the Perfect Fit: Applying Regression Models to ClimateBench v1.0 | Code | 0
Benchmarking Robustness of Endoscopic Depth Estimation with Synthetically Corrupted Data | Code | 0
Benchmarking Robustness of 3D Object Detection to Common Corruptions in Autonomous Driving | Code | 0
Scission: Performance-driven and Context-aware Cloud-Edge Distribution of Deep Neural Networks | Code | 0
ALDI++: Automatic and parameter-less discord and outlier detection for building energy load profiles | Code | 0
Benchmarking Robustness in Object Detection: Autonomous Driving when Winter is Coming | Code | 0
Motley: Benchmarking Heterogeneity and Personalization in Federated Learning | Code | 0
ScoNe: Benchmarking Negation Reasoning in Language Models With Fine-Tuning and In-Context Learning | Code | 0
Benchmarking Retinal Blood Vessel Segmentation Models for Cross-Dataset and Cross-Disease Generalization | Code | 0
The Role of Model Architecture and Scale in Predicting Molecular Properties: Insights from Fine-Tuning RoBERTa, BART, and LLaMA | Code | 0
AutoJudger: An Agent-Driven Framework for Efficient Benchmarking of MLLMs | Code | 0
Benchmarking Representation Learning for Natural World Image Collections | Code | 0
Benchmarking Reinforcement Learning Algorithms on Real-World Robots | Code | 0
Benchmarking Quantum Reinforcement Learning | Code | 0
MSAMSum: Towards Benchmarking Multi-lingual Dialogue Summarization | Code | 0
Alchemy: A Quantum Chemistry Dataset for Benchmarking AI Models | Code | 0
FHBench: Towards Efficient and Personalized Federated Learning for Multimodal Healthcare | Code | 0
Benchmarking quantum machine learning kernel training for classification tasks | Code | 0
The Saudi Privacy Policy Dataset | Code | 0
MST: Adaptive Multi-Scale Tokens Guided Interactive Segmentation | Code | 0
ferret: a Framework for Benchmarking Explainers on Transformers | Code | 0
Benchmarking Procedural Language Understanding for Low-Resource Languages: A Case Study on Turkish | Code | 0
FEET: A Framework for Evaluating Embedding Techniques | Code | 0
Benchmarking Probabilistic Deep Learning Methods for License Plate Recognition | Code | 0
Unraveling the Capabilities of Language Models in News Summarization | Code | 0
mTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale | Code | 0
FedNLP: Benchmarking Federated Learning Methods for Natural Language Processing Tasks | Code | 0
MUBen: Benchmarking the Uncertainty of Molecular Representation Models | Code | 0
The Sounds of Home: A Speech-Removed Residential Audio Dataset for Sound Event Detection | Code | 0
WAC: A Corpus of Wikipedia Conversations for Online Abuse Detection | Code | 0
FedSecurity: Benchmarking Attacks and Defenses in Federated Learning and Federated LLMs | Code | 0
Fedivertex: a Graph Dataset based on Decentralized Social Networks for Trustworthy Machine Learning | Code | 0
Feature interpretability in BCIs: exploring the role of network lateralization | Code | 0
AutoBench-V: Can Large Vision-Language Models Benchmark Themselves? | Code | 0
Benchmarking pre-trained text embedding models in aligning built asset information | Code | 0
Benchmarking Pre-trained Language Models for Multilingual NER: TraSpaS at the BSNLP2021 Shared Task | Code | 0
Feature embedding in click-through rate prediction | Code | 0
Acoustic Identification of Ae. aegypti Mosquitoes using Smartphone Apps and Residual Convolutional Neural Networks | Code | 0
FB-Bench: A Fine-Grained Multi-Task Benchmark for Evaluating LLMs' Responsiveness to Human Feedback | Code | 0
Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis | Code | 0
Multi-EuP: The Multilingual European Parliament Dataset for Analysis of Bias in Information Retrieval | Code | 0
AuthNet: A Deep Learning based Authentication Mechanism using Temporal Facial Feature Movements | Code | 0
Yesterday's News: Benchmarking Multi-Dimensional Out-of-Distribution Generalisation of Misinformation Detection Models | Code | 0
FastTextSpotter: A High-Efficiency Transformer for Multilingual Scene Text Spotting | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified