SOTAVerified

Benchmarking

Papers

Showing 2151-2175 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Benchmarking Adaptative Variational Quantum Algorithms on QUBO Instances | | 0 |
| Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark | | 0 |
| Exploring the Adversarial Frontier: Quantifying Robustness via Adversarial Hypervolume | | 0 |
| Exploring Thermography Technology: A Comprehensive Facial Dataset for Face Detection, Recognition, and Emotion | | 0 |
| A Benchmarking on Cloud based Speech-To-Text Services for French Speech and Background Noise Effect | | 0 |
| Benchmarking Active Learning Strategies for Materials Optimization and Discovery | | 0 |
| Analysis and Benchmarking of Extending Blind Face Image Restoration to Videos | | 0 |
| TaskEval: Assessing Difficulty of Code Generation Tasks for Large Language Models | | 0 |
| BrokenVideos: A Benchmark Dataset for Fine-Grained Artifact Localization in AI-Generated Videos | | 0 |
| Bringing Quantum Algorithms to Automated Machine Learning: A Systematic Review of AutoML Frameworks Regarding Extensibility for QML Algorithms | | 0 |
| Benchmarking Active Learning for NILM | | 0 |
| Bridging vision language model (VLM) evaluation gaps with a framework for scalable and cost-effective benchmark generation | | 0 |
| Analysing Features Learned Using Unsupervised Models on Program Embeddings | | 0 |
| ExtremeAIGC: Benchmarking LMM Vulnerability to AI-Generated Extremist Content | | 0 |
| Toward Bridging the Simulated-to-Real Gap: Benchmarking Super-Resolution on Real Data | | 0 |
| Analysing Errors of Open Information Extraction Systems | | 0 |
| Exploring Capabilities of Time Series Foundation Models in Building Analytics | | 0 |
| Bridging the Gap Between Theory and Practice: Benchmarking Transfer Evolutionary Optimization | | 0 |
| Bridging the Bosphorus: Advancing Turkish Large Language Models through Strategies for Low-Resource Language Adaptation and Benchmarking | | 0 |
| Benchmarking Abstractive Summarisation: A Dataset of Human-authored Summaries of Norwegian News Articles | | 0 |
| Exploring Continual Learning of Diffusion Models | | 0 |
| Benchmarking a Benchmark: How Reliable is MS-COCO? | | 0 |
| A Benchmarking Environment for Reinforcement Learning Based Task Oriented Dialogue Management | | 0 |
| Breakpoint: Scalable evaluation of system-level reasoning in LLM code agents | | 0 |
| A new pathway to generative artificial intelligence by minimizing the maximum entropy | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |