SOTAVerified

Benchmarking

Papers

Showing 2701-2725 of 5548 papers

Title | Status | Hype
TUBench: Benchmarking Large Vision-Language Models on Trustworthiness with Unanswerable Questions | Code | 0
Implicit to Explicit Entropy Regularization: Benchmarking ViT Fine-tuning under Noisy Labels | - | 0
How Do Large Language Models Understand Graph Patterns? A Benchmark for Graph Pattern Comprehension | - | 0
PersoBench: Benchmarking Personalized Response Generation in Large Language Models | Code | 0
ActPlan-1K: Benchmarking the Procedural Planning Ability of Visual Language Models in Household Activities | - | 0
Towards a Benchmark for Large Language Models for Business Process Management Tasks | Code | 0
Benchmarking the Fidelity and Utility of Synthetic Relational Data | - | 0
Lightning UQ Box: A Comprehensive Framework for Uncertainty Quantification in Deep Learning | - | 0
Ward: Provable RAG Dataset Inference via LLM Watermarks | - | 0
Understanding Large Language Models in Your Pockets: Performance Study on COTS Mobile Devices | - | 0
IoT-LLM: Enhancing Real-World IoT Task Reasoning with Large Language Models | - | 0
MANTRA: The Manifold Triangulations Assemblage | Code | 0
Large Language Model for Multi-Domain Translation: Benchmarking and Domain CoT Fine-tuning | - | 0
Repurposing Foundation Model for Generalizable Medical Time Series Classification | - | 0
Deep learning for action spotting in association football videos | - | 0
ConServe: Harvesting GPUs for Low-Latency and High-Throughput Large Language Model Serving | - | 0
CALF: Benchmarking Evaluation of LFQA Using Chinese Examinations | - | 0
The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs | - | 0
Emo3D: Metric and Benchmarking Dataset for 3D Facial Expression Generation from Emotion Description | - | 0
A Real Benchmark Swell Noise Dataset for Performing Seismic Data Denoising via Deep Learning | - | 0
Deep Unlearn: Benchmarking Machine Unlearning | - | 0
CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset | - | 0
FMBench: Benchmarking Fairness in Multimodal Large Language Models on Medical Tasks | - | 0
Benchmarking Large Language Models for Conversational Question Answering in Multi-instructional Documents | - | 0
Match Stereo Videos via Bidirectional Alignment | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified