SOTAVerified

Benchmarking

Papers

Showing 5001–5050 of 5548 papers

Title | Status | Hype
Enhancing 3D-Air Signature by Pen Tip Tail Trajectory Awareness: Dataset and Featuring by Novel Spatio-temporal CNN | Code | 0
Neurological Prognostication of Post-Cardiac-Arrest Coma Patients Using EEG Data: A Dynamic Survival Analysis Framework with Competing Risks | Code | 0
Asynchronous Batch Bayesian Optimization with Pipelining Evaluations for Experimental Resource–constrained Conditions | Code | 0
NeuroMorse: A Temporally Structured Dataset For Neuromorphic Computing | Code | 0
NeuroSim V1.5: Improved Software Backbone for Benchmarking Compute-in-Memory Accelerators with Device and Circuit-level Non-idealities | Code | 0
EnergyStar++: Towards more accurate and explanatory building energy benchmarking | Code | 0
Accelerating Large-Scale Inference with Anisotropic Vector Quantization | Code | 0
A survey of probabilistic generative frameworks for molecular simulations | Code | 0
Benchmarking neural embeddings for link prediction in knowledge graphs under semantic and structural changes | Code | 0
EmProx: Neural Network Performance Estimation For Neural Architecture Search | Code | 0
NewTerm: Benchmarking Real-Time New Terms for Large Language Models with Annual Updates | Code | 0
A comparison of translation performance between DeepL and Supertext | Code | 0
Benchmarking Multimodal RAG through a Chart-based Document Question-Answering Generation Framework | Code | 0
Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program | Code | 0
Benchmarking Machine Translation with Cultural Awareness | Code | 0
Benchmarking Multilabel Topic Classification in the Kyrgyz Language | Code | 0
Unsupervised Tracklet Person Re-Identification | Code | 0
Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning | Code | 0
TMPNN: High-Order Polynomial Regression Based on Taylor Map Factorization | Code | 0
Nmbr9 as a Constraint Programming Challenge | Code | 0
EFSA: Towards Event-Level Financial Sentiment Analysis | Code | 0
Efficient, Uncertainty-based Moderation of Neural Networks Text Classifiers | Code | 0
Efficient Realistic Data Generation Framework leveraging Deep Learning-based Human Digitization | Code | 0
Efficient Performance Tracking: Leveraging Large Language Models for Automated Construction of Scientific Leaderboards | Code | 0
Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning | Code | 0
Benchmarking multi-component signal processing methods in the time-frequency plane | Code | 0
Efficiently solving the thief orienteering problem with a max-min ant colony optimization approach | Code | 0
A Comparative Analysis of Word-Level Metric Differential Privacy: Benchmarking The Privacy-Utility Trade-off | Code | 0
Benchmarking MOEAs for solving continuous multi-objective RL problems | Code | 0
NoiseBench: Benchmarking the Impact of Real Label Noise on Named Entity Recognition | Code | 0
Benchmarking Model-Based Reinforcement Learning | Code | 0
Benchmarking Misuse Mitigation Against Covert Adversaries | Code | 0
To Find Waldo You Need Contextual Cues: Debiasing Who's Waldo | Code | 0
Noisy Ostracods: A Fine-Grained, Imbalanced Real-World Dataset for Benchmarking Robust Machine Learning and Label Correction Methods | Code | 0
No Metric to Rule Them All: Toward Principled Evaluations of Graph-Learning Datasets | Code | 0
To Find Waldo You Need Contextual Cues: Debiasing Who’s Waldo | Code | 0
AstroVision: Towards Autonomous Feature Detection and Description for Missions to Small Bodies Using Deep Learning | Code | 0
AKFruitYield: Modular benchmarking and video analysis software for Azure Kinect cameras for fruit size and fruit yield estimation in apple orchards | Code | 0
ShuffleMix: Improving Representations via Channel-Wise Shuffle of Interpolated Hidden States | Code | 0
NorEval: A Norwegian Language Understanding and Generation Evaluation Benchmark | Code | 0
A Stepwise, Label-based Approach for Improving the Adversarial Training in Unsupervised Video Summarization | Code | 0
Assigning Species Information to Corresponding Genes by a Sequence Labeling Framework | Code | 0
ASR Benchmarking: Need for a More Representative Conversational Dataset | Code | 0
Benchmarking missing-values approaches for predictive models on health databases | Code | 0
SignalGP-Lite: Event Driven Genetic Programming Library for Large-Scale Artificial Life Applications | Code | 0
Benchmarking Minimax Linkage | Code | 0
Efficient and Effective Model Extraction | Code | 0
Signing Outside the Studio: Benchmarking Background Robustness for Continuous Sign Language Recognition | Code | 0
signSGD with Majority Vote is Communication Efficient And Fault Tolerant | Code | 0
To Model or to Intervene: A Comparison of Counterfactual and Online Learning to Rank from User Interactions | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified