SOTAVerified

Benchmarking

Papers

Showing 4401–4450 of 5548 papers

Title | Status | Hype
Reinforcement Learning Based Handwritten Digit Recognition with Two-State Q-Learning | - | 0
A Survey of Predictive Maintenance Methods: An Analysis of Prognostics via Classification and Regression | - | 0
Unlocking the Potential: Benchmarking Large Language Models in Water Engineering and Research | - | 0
Reinforcement Learning with Graph Attention for Routing and Wavelength Assignment with Lightpath Reuse | - | 0
Reinforcing Competitive Multi-Agents for Playing So Long Sucker | - | 0
Unmasking Deceptive Visuals: Benchmarking Multimodal Large Language Models on Misleading Chart Question Answering | - | 0
Relative Afferent Pupillary Defect Screening through Transfer Learning | - | 0
A Survey of Parameters Associated with the Quality of Benchmarks in NLP | - | 0
Reliable validation of Reinforcement Learning Benchmarks | - | 0
Why every GBDT speed benchmark is wrong | - | 0
REMoH: A Reflective Evolution of Multi-objective Heuristics approach via Large Language Models | - | 0
A Survey of Model Compression and Acceleration for Deep Neural Networks | - | 0
A Survey of Methods for Addressing Class Imbalance in Deep-Learning Based Natural Language Processing | - | 0
Removal of Ocular Artifacts in EEG Using Deep Learning | - | 0
A Comparative Analysis of Principal Component Analysis (PCA) and Singular Value Decomposition (SVD) as Dimensionality Reduction Techniques | - | 0
Removing Multiple Hybrid Adverse Weather in Video via a Unified Model | - | 0
A survey of benchmarking frameworks for reinforcement learning | - | 0
Rephrasing natural text data with different languages and quality levels for Large Language Model pre-training | - | 0
REPLAB: A Reproducible Low-Cost Arm Benchmark Platform for Robotic Learning | - | 0
A Collection of Challenging Optimization Problems in Science, Engineering and Economics | - | 0
A Cloud-based Machine Learning Pipeline for the Efficient Extraction of Insights from Customer Reviews | - | 0
Why is the winner the best? | - | 0
A Study on Neuro-Symbolic Artificial Intelligence: Healthcare Perspectives | - | 0
Unreal Robotics Lab: A High-Fidelity Robotics Simulator with Advanced Physics and Rendering | - | 0
Reproducible evaluation of classification methods in Alzheimer's disease: framework and application to MRI and PET data | - | 0
Repurposing Foundation Model for Generalizable Medical Time Series Classification | - | 0
Reradiation and Scattering from a Reconfigurable Intelligent Surface: A General Macroscopic Model | - | 0
UnrealZoo: Enriching Photo-realistic Virtual Worlds for Embodied AI | - | 0
ResBench: Benchmarking LLM-Generated FPGA Designs with Resource Awareness | - | 0
ResearchArena: Benchmarking LLMs' Ability to Collect and Organize Information as Research Agents | - | 0
ResearchBench: Benchmarking LLMs in Scientific Discovery via Inspiration-Based Task Decomposition | - | 0
ResearchCodeAgent: An LLM Multi-Agent System for Automated Codification of Research Methodologies | - | 0
ResearchCodeBench: Benchmarking LLMs on Implementing Novel Machine Learning Research Code | - | 0
Reservoir Computing with a Single Oscillating Gas Bubble: Emphasizing the Chaotic Regime | - | 0
Resistive Neural Hardware Accelerators | - | 0
Resource-efficient Medical Image Analysis with Self-adapting Forward-Forward Networks | - | 0
UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images | - | 0
RESPONSE: Benchmarking the Ability of Language Models to Undertake Commonsense Reasoning in Crisis Situation | - | 0
Restoring Images Captured in Arbitrary Hybrid Adverse Weather Conditions in One Go | - | 0
A Strong Sustainability Paradigm Based Analytical Hierarchy Process (SSP-AHP) Method to Evaluate Sustainable Healthcare Systems | - | 0
AstroMLab 2: AstroLLaMA-2-70B Model and Benchmarking Specialised LLMs for Astronomy | - | 0
AstroMLab 1: Who Wins Astronomy Jeopardy!? | - | 0
TaskEval: Assessing Difficulty of Code Generation Tasks for Large Language Models | - | 0
AstroLLaMA-Chat: Scaling AstroLLaMA with Conversational and Diverse Datasets | - | 0
A Statistical Framework to Investigate the Optimality of Signal-Reconstruction Methods | - | 0
Rethinking Pareto Frontier for Performance Evaluation of Deep Neural Networks | - | 0
Unsupervised Deep Epipolar Flow for Stationary or Dynamic Scenes | - | 0
Unsupervised Feature Learning for Environmental Sound Classification Using Weighted Cycle-Consistent Generative Adversarial Network | - | 0
A Statistical Analysis for Per-Instance Evaluation of Stochastic Optimizers: How Many Repeats Are Enough? | - | 0
A Standardized Benchmark Set of Clustering Problem Instances for Comparing Black-Box Optimizers | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified