SOTAVerified

Benchmarking

Papers

Showing 3326–3350 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Decoding Intelligence: A Framework for Certifying Knowledge Comprehension in LLMs | | 0 |
| Benchmarking Observational Studies with Experimental Data under Right-Censoring | | 0 |
| Benchmarking the Robustness of Panoptic Segmentation for Automated Driving | | 0 |
| GenCeption: Evaluate Multimodal LLMs with Unlabeled Unimodal Data | Code | 0 |
| PQA: Zero-shot Protein Question Answering for Free-form Scientific Enquiry with Large Language Models | Code | 0 |
| A Unified Framework and Dataset for Assessing Societal Bias in Vision-Language Models | | 0 |
| CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models | | 0 |
| MM-Soc: Benchmarking Multimodal Large Language Models in Social Media Platforms | Code | 0 |
| KetGPT -- Dataset Augmentation of Quantum Circuits using Transformers | | 0 |
| Synthetic location trajectory generation using categorical diffusion models | Code | 0 |
| FeB4RAG: Evaluating Federated Search in the Context of Retrieval Augmented Generation | | 0 |
| AnaloBench: Benchmarking the Identification of Abstract and Long-context Analogies | Code | 0 |
| Learning Disentangled Audio Representations through Controlled Synthesis | | 0 |
| VATr++: Choose Your Words Wisely for Handwritten Text Generation | | 0 |
| The Butterfly Effect of Model Editing: Few Edits Can Trigger Large Language Models Collapse | Code | 0 |
| Recommendations for Baselines and Benchmarking Approximate Gaussian Processes | | 0 |
| Multi-Fidelity Methods for Optimization: A Survey | | 0 |
| Large-scale Benchmarking of Metaphor-based Optimization Heuristics | | 0 |
| SAWEC: Sensing-Assisted Wireless Edge Computing | Code | 0 |
| Benchmarking federated strategies in Peer-to-Peer Federated learning for biomedical data | | 0 |
| From Variability to Stability: Advancing RecSys Benchmarking Practices | Code | 0 |
| Evaluation of simulation methods for tumor subclonal reconstruction | | 0 |
| Design and Realization of a Benchmarking Testbed for Evaluating Autonomous Platooning Algorithms | | 0 |
| Benchmarking multi-component signal processing methods in the time-frequency plane | Code | 0 |
| Privacy-Preserving Language Model Inference with Instance Obfuscation | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |