SOTAVerified

Benchmarking

Papers

Showing 2401–2450 of 5548 papers

Title | Status | Hype
DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition | Code | 0
Ducho meets Elliot: Large-scale Benchmarks for Multimodal Recommendation | Code | 0
GRATIS: GeneRAting TIme Series with diverse and controllable characteristics | Code | 0
Grasp Pre-shape Selection by Synthetic Training: Eye-in-hand Shared Control on the Hannes Prosthesis | Code | 0
Are Synthetic Corruptions A Reliable Proxy For Real-World Corruptions? | Code | 0
Graph Neural Networks Are More Than Filters: Revisiting and Benchmarking from A Spectral Perspective | Code | 0
Graph-theoretical approach to robust 3D normal extraction of LiDAR data | Code | 0
DrugOOD: Out-of-Distribution (OOD) Dataset Curator and Benchmark for AI-aided Drug Discovery -- A Focus on Affinity Prediction Problems with Noise Annotations | Code | 0
Benchmarking Learning Efficiency in Deep Reservoir Computing | Code | 0
Learning Conjoint Attentions for Graph Neural Nets | Code | 0
GPT4Graph: Can Large Language Models Understand Graph Structured Data ? An Empirical Evaluation and Benchmarking | Code | 0
DQI: Measuring Data Quality in NLP | Code | 0
Benchmarking Large Vision-Language Models on Fine-Grained Image Tasks: A Comprehensive Evaluation | Code | 0
LANTERN: A Machine Learning Framework for Lipid Nanoparticle Transfection Efficiency Prediction | Code | 0
A General Benchmarking Framework for Text Generation | Code | 0
GOAL: Towards Benchmarking Few-Shot Sports Game Summarization | Code | 0
Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data | Code | 0
Separating form and meaning: Using self-consistency to quantify task understanding across multiple senses | Code | 0
GNNMerge: Merging of GNN Models Without Accessing Training Data | Code | 0
A Closer Look at Temporal Sentence Grounding in Videos: Dataset and Metric | Code | 0
Benchmarking Large Language Model Uncertainty for Prompt Optimization | Code | 0
Global Prediction of COVID-19 Variant Emergence Using Dynamics-Informed Graph Neural Networks | Code | 0
Are Personalized Stochastic Parrots More Dangerous? Evaluating Persona Biases in Dialogue Systems | Code | 0
Evaluating the Systematic Reasoning Abilities of Large Language Models through Graph Coloring | Code | 0
Evaluating the Transferability of Machine-Learned Force Fields for Material Property Modeling | Code | 0
Domain-Expanded ASTE: Rethinking Generalization in Aspect Sentiment Triplet Extraction | Code | 0
Arena-Rosnav 2.0: A Development and Benchmarking Platform for Robot Navigation in Highly Dynamic Environments | Code | 0
Learned Bayesian Cramér-Rao Bound for Unknown Measurement Models Using Score Neural Networks | Code | 0
Learn How to Query from Unlabeled Data Streams in Federated Learning | Code | 0
Learning Adaptive Discriminative Correlation Filters via Temporal Consistency Preserving Spatial Feature Selection for Robust Visual Tracking | Code | 0
Geological Inference from Textual Data using Word Embeddings | Code | 0
GiantHunter: Accurate detection of giant virus in metagenomic data using reinforcement-learning and Monte Carlo tree search | Code | 0
Domain2Vec: Domain Embedding for Unsupervised Domain Adaptation | Code | 0
Do Localization Methods Actually Localize Memorized Data in LLMs? A Tale of Two Benchmarks | Code | 0
Do LLMs Memorize Recommendation Datasets? A Preliminary Study on MovieLens-1M | Code | 0
Do LLM Evaluators Prefer Themselves for a Reason? | Code | 0
Does Table Source Matter? Benchmarking and Improving Multimodal Scientific Table Understanding and Reasoning | Code | 0
Flexible Generation of Preference Data for Recommendation Analysis | Code | 0
Benchmarking Large Language Models on Communicative Medical Coaching: a Novel System and Dataset | Code | 0
Graph Convolutional Networks Meet with High Dimensionality Reduction | Code | 0
Hierarchical Neural Networks for Sequential Sentence Classification in Medical Scientific Abstracts | Code | 0
Strong and Simple Baselines for Multimodal Utterance Embeddings | Code | 0
Generative Models for Fast Simulation of Cherenkov Detectors at the Electron-Ion Collider | Code | 0
Are Large Language Models True Healthcare Jacks-of-All-Trades? Benchmarking Across Health Professions Beyond Physician Exams | Code | 0
DLAMA: A Framework for Curating Culturally Diverse Facts for Probing the Knowledge of Pretrained Language Models | Code | 0
Benchmarking Large Language Models for Math Reasoning Tasks | Code | 0
Benchmarking Large Language Models for Image Classification of Marine Mammals | Code | 0
Divergent Creativity in Humans and Large Language Models | Code | 0
Generalization and Regularization in DQN | Code | 0
GenCeption: Evaluate Multimodal LLMs with Unlabeled Unimodal Data | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified