SOTAVerified

Benchmarking

Papers

Showing 3051-3100 of 5548 papers

Title | Status | Hype
LLMeBench: A Flexible Framework for Accelerating LLMs Benchmarking | Code | 1
Benchmarking LLM powered Chatbots: Methods and Metrics | - | 0
Application-Oriented Benchmarking of Quantum Generative Learning Using QUARK | Code | 1
RECipe: Does a Multi-Modal Recipe Knowledge Graph Fit a Multi-Purpose Recommendation System? | - | 0
XFlow: Benchmarking Flow Behaviors over Graphs | Code | 1
Microvasculature Segmentation in Human BioMolecular Atlas Program (HuBMAP) | - | 0
Precise Benchmarking of Explainable AI Attribution Methods | Code | 0
ChatGPT for GTFS: Benchmarking LLMs on GTFS Understanding and Retrieval | Code | 0
RobustMQ: Benchmarking Robustness of Quantized Models | - | 0
A Survey of Spanish Clinical Language Models | - | 0
Benchmarking Adaptative Variational Quantum Algorithms on QUBO Instances | - | 0
qgym: A Gym for Training and Benchmarking RL-Based Quantum Compilation | Code | 1
Differential Privacy for Adaptive Weight Aggregation in Federated Tumor Segmentation | - | 0
Benchmarking Ultra-High-Definition Image Reflection Removal | Code | 0
Capsa: A Unified Framework for Quantifying Risk in Deep Neural Networks | - | 0
CLAMS: A Cluster Ambiguity Measure for Estimating Perceptual Variability in Visual Clustering | - | 0
VG-SSL: Benchmarking Self-supervised Representation Learning Approaches for Visual Geo-localization | Code | 1
Deep Learning and Computer Vision for Glaucoma Detection: A Review | - | 0
Benchmarking and Analyzing Robust Point Cloud Recognition: Bag of Tricks for Defending Adversarial Examples | Code | 1
TMPNN: High-Order Polynomial Regression Based on Taylor Map Factorization | Code | 0
SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension | Code | 2
Rethinking Uncertainly Missing and Ambiguous Visual Modality in Multi-Modal Entity Alignment | Code | 1
Benchmarking Offline Reinforcement Learning on Real-Robot Hardware | Code | 1
Benchmarking Jetson Edge Devices with an End-to-end Video-based Anomaly Detection System | Code | 0
IML-ViT: Benchmarking Image Manipulation Localization by Vision Transformer | Code | 2
Benchmarking Performance of Deep Learning Model for Material Segmentation on Two HPC Systems | - | 0
Quantitative Metrics for Benchmarking Human-Aware Robot Navigation | Code | 0
YOLOBench: Benchmarking Efficient Object Detectors on Embedded Systems | Code | 0
Fluorescent Neuronal Cells v2: Multi-Task, Multi-Format Annotations for Deep Learning in Microscopy | - | 0
Foundational Models Defining a New Era in Vision: A Survey and Outlook | Code | 2
Towards Long-Term predictions of Turbulence using Neural Operators | - | 0
When Multi-Task Learning Meets Partial Supervision: A Computer Vision Review | Code | 0
UPREVE: An End-to-End Causal Discovery Benchmarking System | - | 0
Implementing and Benchmarking the Locally Competitive Algorithm on the Loihi 2 Neuromorphic Processor | - | 0
Benchmarking and Analyzing Generative Data for Visual Recognition | - | 0
Towards an AI Accountability Policy | - | 0
The Impact of Genomic Variation on Function (IGVF) Consortium | - | 0
Remote Bio-Sensing: Open Source Benchmark Framework for Fair Evaluation of rPPG | Code | 2
PLANTAIN: Diffusion-inspired Pose Score Minimization for Fast and Accurate Molecular Docking | Code | 1
Selecting the motion ground truth for loose-fitting wearables: benchmarking optical MoCap methods | Code | 0
JoinGym: An Efficient Query Optimization Environment for Reinforcement Learning | Code | 1
Decoding the Enigma: Benchmarking Humans and AIs on the Many Facets of Working Memory | Code | 1
The Extractive-Abstractive Axis: Measuring Content "Borrowing" in Generative Language Models | - | 0
SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models | Code | 1
Benchmarking Potential Based Rewards for Learning Humanoid Locomotion | Code | 2
On the Real-Time Semantic Segmentation of Aphid Clusters in the Wild | - | 0
Efficient Prediction of Peptide Self-assembly through Sequential and Graphical Encoding | Code | 1
Examining the Effects of Degree Distribution and Homophily in Graph Learning Models | Code | 1
Towards Heterogeneous Long-tailed Learning: Benchmarking, Metrics, and Toolbox | Code | 1
Approaches for benchmarking single-cell gene regulatory network inference methods | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified