SOTAVerified

Benchmarking

Papers

Showing 3851–3900 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| RISEdb: a Novel Indoor Localization Dataset |  | 0 |
| Risk Aware Benchmarking of Large Language Models |  | 0 |
| Risk-Neutral Generative Networks |  | 0 |
| RL2Grid: Benchmarking Reinforcement Learning in Power Grid Operations |  | 0 |
| RL-Based Method for Benchmarking the Adversarial Resilience and Robustness of Deep Reinforcement Learning Policies |  | 0 |
| RNAmountAlign: efficient software for local, global, semiglobal pairwise and multiple RNA sequence/structure alignment |  | 0 |
| A Comprehensive Guide to CAN IDS Data & Introduction of the ROAD Dataset |  | 0 |
| ROBBIE: Robust Bias Evaluation of Large Generative Language Models |  | 0 |
| OOD-CV: A Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images |  | 0 |
| Robust 2D/3D Vehicle Parsing in CVIS |  | 0 |
| Robust measurement of innovation performances in Europe with a hierarchy of interacting composite indicators |  | 0 |
| Robust Medical Instrument Segmentation Challenge 2019 |  | 0 |
| RobustMQ: Benchmarking Robustness of Quantized Models |  | 0 |
| Robustness of Reinforcement Learning-Based Traffic Signal Control under Incidents: A Comparative Study |  | 0 |
| Robust Salient Object Detection on Compressed Images Using Convolutional Neural Networks |  | 0 |
| RobustSpring: Benchmarking Robustness to Image Corruptions for Optical Flow, Scene Flow and Stereo |  | 0 |
| Robust Vision Challenge 2020 -- 1st Place Report for Panoptic Segmentation |  | 0 |
| RP1M: A Large-Scale Motion Dataset for Piano Playing with Bi-Manual Dexterous Robot Hands |  | 0 |
| RRSIS: Referring Remote Sensing Image Segmentation |  | 0 |
| RT-Pose: A 4D Radar Tensor-based 3D Human Pose Estimation and Localization Benchmark |  | 0 |
| Rule-based Data Selection for Large Language Models |  | 0 |
| RxRx3-core: Benchmarking drug-target interactions in High-Content Microscopy |  | 0 |
| Sadeed: Advancing Arabic Diacritization Through Small Language Model |  | 0 |
| Safe Load Balancing in Software-Defined-Networking |  | 0 |
| SAIBench: A Structural Interpretation of AI for Science Through Benchmarks |  | 0 |
| SAIBench: Benchmarking AI for Science |  | 0 |
| Saliency Benchmarking Made Easy: Separating Models, Maps and Metrics |  | 0 |
| Salient Object Detection: A Benchmark |  | 0 |
| SAMA: Towards Multi-Turn Referential Grounded Video Chat with Large Language Models |  | 0 |
| SAM-based instance segmentation models for the automation of structural damage detection |  | 0 |
| Sarcasm in Sight and Sound: Benchmarking and Expansion to Improve Multimodal Sarcasm Detection |  | 0 |
| SASSE: Scalable and Adaptable 6-DOF Pose Estimation |  | 0 |
| SATBench: Benchmarking LLMs' Logical Reasoning via Automated Puzzle Generation from SAT Formulas |  | 0 |
| SAWNet: A Spatially Aware Deep Neural Network for 3D Point Cloud Processing |  | 0 |
| Scaffold Splits Overestimate Virtual Screening Performance |  | 0 |
| Scalable and Customizable Benchmark Problems for Many-Objective Optimization |  | 0 |
| Scalable and Hybrid Ensemble-Based Causality Discovery |  | 0 |
| Scalable, Distributed AI Frameworks: Leveraging Cloud Computing for Enhanced Deep Learning Performance and Efficiency |  | 0 |
| Scalable Psychological Momentum Forecasting in Esports |  | 0 |
| Automated Coding of Communications in Collaborative Problem-solving Tasks Using ChatGPT |  | 0 |
| ScanNeRF: a Scalable Benchmark for Neural Radiance Fields |  | 0 |
| SCBench: A Sports Commentary Benchmark for Video LLMs |  | 0 |
| Scenarios and Approaches for Situated Natural Language Explanations |  | 0 |
| ScholarSearch: Benchmarking Scholar Searching Ability of LLMs |  | 0 |
| SciDoc2Diagrammer-MAF: Towards Generation of Scientific Diagrams from Documents guided by Multi-Aspect Feedback Refinement |  | 0 |
| Science Across Languages: Assessing LLM Multilingual Translation of Scientific Papers |  | 0 |
| Scientific Machine Learning Benchmarks |  | 0 |
| SciHorizon: Benchmarking AI-for-Science Readiness from Scientific Data to Large Language Models |  | 0 |
| scMamba: A Scalable Foundation Model for Single-Cell Multi-Omics Integration Beyond Highly Variable Feature Selection |  | 0 |
| Score-Based Generative Models for Molecule Generation |  | 0 |
Page 78 of 111

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified |