SOTAVerified

Benchmarking

Papers

Showing 3401-3450 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Benchmarking Monocular 3D Dog Pose Estimation Using In-The-Wild Motion Capture Data | | 0 |
| TOTOPO: Classifying univariate and multivariate time series with Topological Data Analysis | | 0 |
| LMFormer: Lane based Motion Prediction Transformer | | 0 |
| Benchmarking Modern Named Entity Recognition Techniques for Free-text Health Record De-identification | | 0 |
| LMME3DHF: Benchmarking and Evaluating Multimodal 3D Human Face Generation with LMMs | | 0 |
| LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models | | 0 |
| Load-independent Metrics for Benchmarking Force Controllers | | 0 |
| Benchmarking Mobile Device Control Agents across Diverse Configurations | | 0 |
| Local Data Quantity-Aware Weighted Averaging for Federated Learning with Dishonest Clients | | 0 |
| XLD: A Cross-Lane Dataset for Benchmarking Novel Driving View Synthesis | | 0 |
| Ensuring Reliability of Curated EHR-Derived Data: The Validation of Accuracy for LLM/ML-Extracted Information and Data (VALID) Framework | | 0 |
| Benchmarking Middle-Trained Language Models for Neural Search | | 0 |
| Logically at Factify 2: A Multi-Modal Fact Checking System Based on Evidence Retrieval techniques and Transformer Encoder Architecture | | 0 |
| Logically at Factify 2022: Multimodal Fact Verification | | 0 |
| Toward an ImageNet Library of Functions for Global Optimization Benchmarking | | 0 |
| Benchmarking Meta-heuristic Optimization | | 0 |
| Brittle Minds, Fixable Activations: Understanding Belief Representations in Language Models | | 0 |
| Toward end-to-end interpretable convolutional neural networks for waveform signals | | 0 |
| Benchmarking MedMNIST dataset on real quantum hardware | | 0 |
| Benchmarking Machine Translated Sentiment Analysis for Arabic Tweets | | 0 |
| Benchmarking Continuous Time Models for Predicting Multiple Sclerosis Progression | | 0 |
| Benchmarking Machine Learning Robustness in Covid-19 Spike Sequence Classification | | 0 |
| Benchmarking Machine Learning Models to Predict Corporate Bankruptcy | | 0 |
| LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation | | 0 |
| Long Range Arena: A Benchmark for Efficient Transformers | | 0 |
| Benchmarking machine learning models for predicting aerofoil performance | | 0 |
| Benchmarking Machine Learning Models for Quantum Error Correction | | 0 |
| Ad-hoc Concept Forming in the Game Codenames as a Means for Evaluating Large Language Models | | 0 |
| Toward Robust Hyper-Detailed Image Captioning: A Multiagent Approach and Dual Evaluation Metrics for Factuality and Coverage | | 0 |
| Look, Read and Feel: Benchmarking Ads Understanding with Multimodal Multitask Learning | | 0 |
| WelQrate: Defining the Gold Standard in Small Molecule Drug Discovery Benchmarking | | 0 |
| LOOPE: Learnable Optimal Patch Order in Positional Embeddings for Vision Transformers | | 0 |
| Benchmarking machine learning models for quantum state classification | | 0 |
| Towards a Benchmark for Scientific Understanding in Humans and Machines | | 0 |
| Benchmarking Machine Learning Methods for Distributed Acoustic Sensing | | 0 |
| Benchmarking Machine Learning: How Fast Can Your Algorithms Go? | | 0 |
| Optimizing with Low Budgets: a Comparison on the Black-box Optimization Benchmarking Suite and OpenAI Gym | | 0 |
| GradEscape: A Gradient-Based Evader Against AI-Generated Text Detectors | | 0 |
| Low-Density 3D Point Cloud Classification | | 0 |
| Low Dynamic Range for RIS-aided Bistatic Integrated Sensing and Communication | | 0 |
| Low-resource Neural Machine Translation: Benchmarking State-of-the-art Transformer for Wolof<->French | | 0 |
| LSTM-based Whisper Detection | | 0 |
| Benchmarking M6 Competitors: An Analysis of Financial Metrics and Discussion of Incentives | | 0 |
| LucidDreaming: Controllable Object-Centric 3D Generation | | 0 |
| Benchmarking LLMs on the Semantic Overlap Summarization Task | | 0 |
| LUND-PROBE -- LUND Prostate Radiotherapy Open Benchmarking and Evaluation dataset | | 0 |
| Benchmarking LLMs in Recommendation Tasks: A Comparative Evaluation with Conventional Recommenders | | 0 |
| Towards a Human-Centred Cognitive Model of Visuospatial Complexity in Everyday Driving | | 0 |
| Benchmarking LLMs in Political Content Text-Annotation: Proof-of-Concept with Toxicity and Incivility Data | | 0 |
| M3Bench: Benchmarking Whole-body Motion Generation for Mobile Manipulation in 3D Scenes | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |