SOTAVerified

Benchmarking

Papers

Showing 4551-4600 of 5548 papers

Title | Status | Hype
Salient Object Detection: A Benchmark | | 0
SAMA: Towards Multi-Turn Referential Grounded Video Chat with Large Language Models | | 0
SAM-based instance segmentation models for the automation of structural damage detection | | 0
A Real Benchmark Swell Noise Dataset for Performing Seismic Data Denoising via Deep Learning | | 0
Use of Deep Neural Networks for Uncertain Stress Functions with Extensions to Impact Mechanics | | 0
Sarcasm in Sight and Sound: Benchmarking and Expansion to Improve Multimodal Sarcasm Detection | | 0
SASSE: Scalable and Adaptable 6-DOF Pose Estimation | | 0
SATBench: Benchmarking LLMs' Logical Reasoning via Automated Puzzle Generation from SAT Formulas | | 0
Wildfire Forecasting with Satellite Images and Deep Generative Model | | 0
User Profile with Large Language Models: Construction, Updating, and Benchmarking | | 0
SAWNet: A Spatially Aware Deep Neural Network for 3D Point Cloud Processing | | 0
Scaffold Splits Overestimate Virtual Screening Performance | | 0
Scalable and Customizable Benchmark Problems for Many-Objective Optimization | | 0
Scalable and Hybrid Ensemble-Based Causality Discovery | | 0
ArCOV19-Rumors: Arabic COVID-19 Twitter Dataset for Misinformation Detection | | 0
Scalable, Distributed AI Frameworks: Leveraging Cloud Computing for Enhanced Deep Learning Performance and Efficiency | | 0
ARBiBench: Benchmarking Adversarial Robustness of Binarized Neural Networks | | 0
AraSTEM: A Native Arabic Multiple Choice Question Benchmark for Evaluating LLMs Knowledge In STEM Subjects | | 0
Scalable Psychological Momentum Forecasting in Esports | | 0
Using Affine Combinations of BBOB Problems for Performance Assessment | | 0
Using generative adversarial networks to synthesize artificial financial datasets | | 0
Zero-Shot Visual Reasoning by Vision-Language Models: Benchmarking and Analysis | | 0
Using Multi-Temporal Sentinel-1 and Sentinel-2 data for water bodies mapping | | 0
Automated Coding of Communications in Collaborative Problem-solving Tasks Using ChatGPT | | 0
Using Neural Architecture Search for Improving Software Flaw Detection in Multimodal Deep Learning Models | | 0
AraReasoner: Evaluating Reasoning-Based LLMs for Arabic NLP | | 0
ScanNeRF: a Scalable Benchmark for Neural Radiance Fields | | 0
SCBench: A Sports Commentary Benchmark for Video LLMs | | 0
AraBench: Benchmarking Dialectal Arabic-English Machine Translation | | 0
Using PCA to Efficiently Represent State Spaces | | 0
Scenarios and Approaches for Situated Natural Language Explanations | | 0
A quantitative method for benchmarking fair income distribution | | 0
A Quantitative Evaluation of Dense 3D Reconstruction of Sinus Anatomy from Monocular Endoscopic Video | | 0
ScholarSearch: Benchmarking Scholar Searching Ability of LLMs | | 0
Using Regular Languages to Explore the Representational Capacity of Recurrent Neural Architectures | | 0
A Probabilistic Framework for Lexicon-based Keyword Spotting in Handwritten Text Images | | 0
A PRISMA Driven Systematic Review of Publicly Available Datasets for Benchmark and Model Developments for Industrial Defect Detection | | 0
SciDoc2Diagrammer-MAF: Towards Generation of Scientific Diagrams from Documents guided by Multi-Aspect Feedback Refinement | | 0
Science Across Languages: Assessing LLM Multilingual Translation of Scientific Papers | | 0
Scientific Machine Learning Benchmarks | | 0
Using Well-Understood Single-Objective Functions in Multiobjective Black-Box Optimization Test Suites | | 0
uTHCD: A New Benchmarking for Tamil Handwritten OCR | | 0
A practical generalization metric for deep networks benchmarking | | 0
SciHorizon: Benchmarking AI-for-Science Readiness from Scientific Data to Large Language Models | | 0
Approaches for benchmarking single-cell gene regulatory network inference methods | | 0
Applying Standards to Advance Upstream & Downstream Ethics in Large Language Models | | 0
Applications in CityLearn Gym Environment for Multi-Objective Control Benchmarking in Grid-Interactive Buildings and Districts | | 0
Application of Machine Learning for Online Reputation Systems | | 0
Utility-Optimized Synthesis of Differentially Private Location Traces | | 0
scMamba: A Scalable Foundation Model for Single-Cell Multi-Omics Integration Beyond Highly Variable Feature Selection | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified