SOTAVerified

Benchmarking

Papers

Showing 2276-2300 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Environment-aware UAV Communications: CKM Construction and Predictive Beamforming | | 0 |
| How to Benchmark Vision Foundation Models for Semantic Segmentation? | Code | 1 |
| LongEmbed: Extending Embedding Models for Long Context Retrieval | Code | 2 |
| Neural Network Approach for Non-Markovian Dissipative Dynamics of Many-Body Open Quantum Systems | | 0 |
| Mapping Violence: Developing an Extensive Framework to Build a Bangla Sectarian Expression Dataset from Social Media Interactions | | 0 |
| VBR: A Vision Benchmark in Rome | Code | 2 |
| Benchmarking changepoint detection algorithms on cardiac time series | | 0 |
| White Men Lead, Black Women Help? Benchmarking and Mitigating Language Agency Social Biases in LLMs | | 0 |
| Data Collection of Real-Life Knowledge Work in Context: The RLKWiC Dataset | | 0 |
| Iterated Invariant Extended Kalman Filter (IterIEKF) | | 0 |
| Neuromorphic Vision-based Motion Segmentation with Graph Transformer Neural Network | | 0 |
| Revealing data leakage in protein interaction benchmarks | Code | 2 |
| Second Edition FRCSyn Challenge at CVPR 2024: Face Recognition Challenge in the Era of Synthetic Data | Code | 1 |
| LLM Evaluators Recognize and Favor Their Own Generations | | 0 |
| Feature selection in linear SVMs via a hard cardinality constraint: a scalable SDP decomposition approach | | 0 |
| A Universal Protocol to Benchmark Camera Calibration for Sports | | 0 |
| A Recipe for CAC: Mosaic-based Generalized Loss for Improved Class-Agnostic Counting | Code | 0 |
| nnU-Net Revisited: A Call for Rigorous Validation in 3D Medical Image Segmentation | Code | 1 |
| A Large-Scale Evaluation of Speech Foundation Models | | 0 |
| MMInA: Benchmarking Multihop Multimodal Internet Agents | | 0 |
| MMCode: Benchmarking Multimodal Large Language Models for Code Generation with Visually Rich Programming Problems | Code | 1 |
| Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations | Code | 1 |
| A Review and Efficient Implementation of Scene Graph Generation Metrics | Code | 1 |
| AMPCliff: quantitative definition and benchmarking of activity cliffs in antimicrobial peptides | Code | 0 |
| RoofDiffusion: Constructing Roofs from Severely Corrupted Point Data via Diffusion | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |