SOTAVerified

Benchmarking

Papers

Showing 2651–2700 of 5548 papers

Title | Status | Hype
TAO-Amodal: A Benchmark for Tracking Any Object Amodally | Code | 1
Bio-Image Informatics Index BIII: A unique database of image analysis tools and workflows for and by the bioimaging community | | 0
MA-BBOB: A Problem Generator for Black-Box Optimization Using Affine Combinations and Shifts | | 0
QDA^2: A principled approach to automatically annotating charge stability diagrams | | 0
Code Ownership in Open-Source AI Software Security | Code | 0
FER-C: Benchmarking Out-of-Distribution Soft Calibration for Facial Expression Recognition | | 0
How to Train Neural Field Representations: A Comprehensive Study and Benchmark | Code | 1
Enabling Accelerators for Graph Computing | | 0
A Novel Hybrid Ordinal Learning Model with Health Care Application | | 0
ChemTime: Rapid and Early Classification for Multivariate Time Series Classification of Chemical Sensors | | 0
Binary Code Summarization: Benchmarking ChatGPT/GPT-4 and Other Large Language Models | Code | 1
SPEAL: Skeletal Prior Embedded Attention Learning for Cross-Source Point Cloud Registration | | 0
Efficiently Quantifying Individual Agent Importance in Cooperative MARL | | 0
EventAid: Benchmarking Event-aided Image/Video Enhancement Algorithms with Real-captured Hybrid Dataset | | 0
Watchog: A Light-weight Contrastive Learning based Framework for Column Annotation | | 0
Benchmarking Deep Learning Classifiers for SAR Automatic Target Recognition | | 0
Benchmarking Pretrained Vision Embeddings for Near- and Duplicate Detection in Medical Images | | 0
Meta-survey on outlier and anomaly detection | Code | 0
How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation | Code | 1
EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning | Code | 1
Implementing hosting capacity analysis in distribution networks: Practical considerations, advancements and future directions | | 0
Cataract-1K: Cataract Surgery Dataset for Scene Segmentation, Phase Recognition, and Irregularity Detection | | 0
EQ-Bench: An Emotional Intelligence Benchmark for Large Language Models | Code | 2
Benchmarking Distribution Shift in Tabular Data with TableShift | Code | 1
AM-RADIO: Agglomerative Vision Foundation Model -- Reduce All Domains Into One | Code | 3
Graph-based Prediction and Planning Policy Network (GP3Net) for scalable self-driving in dynamic environments using Deep Reinforcement Learning | | 0
Forecasting Lithium-Ion Battery Longevity with Limited Data Availability: Benchmarking Different Machine Learning Algorithms | | 0
Benchmarking of Query Strategies: Towards Future Deep Active Learning | Code | 0
STREAMLINE: An Automated Machine Learning Pipeline for Biomedicine Applied to Examine the Utility of Photography-Based Phenotypes for OSA Prediction Across International Sleep Centers | Code | 1
An Experimental Study: Assessing the Combined Framework of WavLM and BEST-RQ for Text-to-Speech Synthesis | | 0
Benchmarking and Analysis of Unsupervised Object Segmentation from Real-world Single Images | Code | 1
Perspectives on the State and Future of Deep Learning -- 2023 | | 0
Multiview Aerial Visual Recognition (MAVREC): Can Multi-view Improve Aerial Visual Perception? | | 0
Pearl: A Production-ready Reinforcement Learning Agent | Code | 4
Benchmarking Continual Learning from Cognitive Perspectives | | 0
Can language agents be alternatives to PPO? A Preliminary Empirical Study On OpenAI Gym | Code | 1
KhabarChin: Automatic Detection of Important News in the Persian Language | Code | 0
Dyport: Dynamic Importance-based Hypothesis Generation Benchmarking Technique | Code | 0
Liquid State Genetic Programming | | 0
Semi-implicit Continuous Newton Method for Power Flow Analysis | | 0
SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World | | 0
BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models | Code | 1
BEDD: The MineRL BASALT Evaluation and Demonstrations Dataset for Training and Benchmarking Agents that Solve Fuzzy Tasks | Code | 1
Let the LLMs Talk: Simulating Human-to-Human Conversational QA via Zero-Shot LLM-to-LLM Interactions | Code | 1
Contrastive Learning-Based Spectral Knowledge Distillation for Multi-Modality and Missing Modality Scenarios in Semantic Segmentation | | 0
BenchMARL: Benchmarking Multi-Agent Reinforcement Learning | | 0
An Empirical Study of Automated Mislabel Detection in Real World Vision Datasets | | 0
Evetac: An Event-based Optical Tactile Sensor for Robotic Manipulation | | 0
Analyzing the Impact of Fake News on the Anticipated Outcome of the 2024 Election Ahead of Time | | 0
Identifying patterns and recommendations of and for sustainable open data initiatives: a benchmarking-driven analysis of open government data initiatives among European countries | | 0
Page 54 of 111

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified