SOTAVerified

Benchmarking

Papers

Showing 3051-3075 of 5548 papers

Title | Status | Hype
LLMeBench: A Flexible Framework for Accelerating LLMs Benchmarking | Code | 1
Benchmarking LLM powered Chatbots: Methods and Metrics | | 0
Application-Oriented Benchmarking of Quantum Generative Learning Using QUARK | Code | 1
RECipe: Does a Multi-Modal Recipe Knowledge Graph Fit a Multi-Purpose Recommendation System? | | 0
XFlow: Benchmarking Flow Behaviors over Graphs | Code | 1
Microvasculature Segmentation in Human BioMolecular Atlas Program (HuBMAP) | | 0
Precise Benchmarking of Explainable AI Attribution Methods | Code | 0
ChatGPT for GTFS: Benchmarking LLMs on GTFS Understanding and Retrieval | Code | 0
RobustMQ: Benchmarking Robustness of Quantized Models | | 0
A Survey of Spanish Clinical Language Models | | 0
Benchmarking Adaptative Variational Quantum Algorithms on QUBO Instances | | 0
qgym: A Gym for Training and Benchmarking RL-Based Quantum Compilation | Code | 1
Differential Privacy for Adaptive Weight Aggregation in Federated Tumor Segmentation | | 0
Benchmarking Ultra-High-Definition Image Reflection Removal | Code | 0
Capsa: A Unified Framework for Quantifying Risk in Deep Neural Networks | | 0
CLAMS: A Cluster Ambiguity Measure for Estimating Perceptual Variability in Visual Clustering | | 0
VG-SSL: Benchmarking Self-supervised Representation Learning Approaches for Visual Geo-localization | Code | 1
Deep Learning and Computer Vision for Glaucoma Detection: A Review | | 0
Benchmarking and Analyzing Robust Point Cloud Recognition: Bag of Tricks for Defending Adversarial Examples | Code | 1
TMPNN: High-Order Polynomial Regression Based on Taylor Map Factorization | Code | 0
SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension | Code | 2
Rethinking Uncertainly Missing and Ambiguous Visual Modality in Multi-Modal Entity Alignment | Code | 1
Benchmarking Offline Reinforcement Learning on Real-Robot Hardware | Code | 1
Benchmarking Jetson Edge Devices with an End-to-end Video-based Anomaly Detection System | Code | 0
IML-ViT: Benchmarking Image Manipulation Localization by Vision Transformer | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified