SOTAVerified

Benchmarking

Papers

Showing 4426–4450 of 5548 papers

Title | Status | Hype
Repurposing Foundation Model for Generalizable Medical Time Series Classification | | 0
Reradiation and Scattering from a Reconfigurable Intelligent Surface: A General Macroscopic Model | | 0
UnrealZoo: Enriching Photo-realistic Virtual Worlds for Embodied AI | | 0
ResBench: Benchmarking LLM-Generated FPGA Designs with Resource Awareness | | 0
ResearchArena: Benchmarking LLMs' Ability to Collect and Organize Information as Research Agents | | 0
ResearchBench: Benchmarking LLMs in Scientific Discovery via Inspiration-Based Task Decomposition | | 0
ResearchCodeAgent: An LLM Multi-Agent System for Automated Codification of Research Methodologies | | 0
ResearchCodeBench: Benchmarking LLMs on Implementing Novel Machine Learning Research Code | | 0
Reservoir Computing with a Single Oscillating Gas Bubble: Emphasizing the Chaotic Regime | | 0
Resistive Neural Hardware Accelerators | | 0
Resource-efficient Medical Image Analysis with Self-adapting Forward-Forward Networks | | 0
UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images | | 0
RESPONSE: Benchmarking the Ability of Language Models to Undertake Commonsense Reasoning in Crisis Situation | | 0
Restoring Images Captured in Arbitrary Hybrid Adverse Weather Conditions in One Go | | 0
A Strong Sustainability Paradigm Based Analytical Hierarchy Process (SSP-AHP) Method to Evaluate Sustainable Healthcare Systems | | 0
AstroMLab 2: AstroLLaMA-2-70B Model and Benchmarking Specialised LLMs for Astronomy | | 0
AstroMLab 1: Who Wins Astronomy Jeopardy!? | | 0
TaskEval: Assessing Difficulty of Code Generation Tasks for Large Language Models | | 0
AstroLLaMA-Chat: Scaling AstroLLaMA with Conversational and Diverse Datasets | | 0
A Statistical Framework to Investigate the Optimality of Signal-Reconstruction Methods | | 0
Rethinking Pareto Frontier for Performance Evaluation of Deep Neural Networks | | 0
Unsupervised Deep Epipolar Flow for Stationary or Dynamic Scenes | | 0
Unsupervised Feature Learning for Environmental Sound Classification Using Weighted Cycle-Consistent Generative Adversarial Network | | 0
A Statistical Analysis for Per-Instance Evaluation of Stochastic Optimizers: How Many Repeats Are Enough? | | 0
A Standardized Benchmark Set of Clustering Problem Instances for Comparing Black-Box Optimizers | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified