SOTAVerified

Benchmarking

Papers

Showing 2191-2200 of 5548 papers

Title | Status | Hype
Benchmarking Large Language Models on CFLUE -- A Chinese Financial Language Understanding Evaluation Dataset | Code | 3
A Robust Autoencoder Ensemble-Based Approach for Anomaly Detection in Text |  | 0
Simulation-Based Benchmarking of Reinforcement Learning Agents for Personalized Retail Promotions | Code | 0
An Integrated Framework for Multi-Granular Explanation of Video Summarization | Code | 0
DocuMint: Docstring Generation for Python using Small Language Models | Code | 1
PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models | Code | 2
SciFIBench: Benchmarking Large Multimodal Models for Scientific Figure Interpretation | Code | 1
SpeechVerse: A Large-scale Generalizable Audio Language Model |  | 0
UCCIX: Irish-eXcellence Large Language Model |  | 0
Divergent Creativity in Humans and Large Language Models | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified