SOTAVerified

Benchmarking

Papers

Showing 4426–4450 of 5548 papers

Title | Status | Hype
TF1-EN-3M: Three Million Synthetic Moral Fables for Training Small, Open Language Models | Code | 0
BenchENAS: A Benchmarking Platform for Evolutionary Neural Architecture Search | Code | 0
Knowing-how & Knowing-that: A New Task for Machine Comprehension of User Manuals | Code | 0
TFW2V: An Enhanced Document Similarity Method for the Morphologically Rich Finnish Language | Code | 0
Can Tree Based Approaches Surpass Deep Learning in Anomaly Detection? A Benchmarking Study | Code | 0
LANTERN: A Machine Learning Framework for Lipid Nanoparticle Transfection Efficiency Prediction | Code | 0
Laparoscopic Image Desmoking Using the U-Net with New Loss Function and Integrated Differentiable Wiener Filter | Code | 0
LaRA: Benchmarking Retrieval-Augmented Generation and Long-Context LLMs - No Silver Bullet for LC or RAG Routing | Code | 0
Can LLMs Grasp Implicit Cultural Values? Benchmarking LLMs' Metacognitive Cultural Intelligence with CQ-Bench | Code | 0
Recurrent Quantum Neural Networks | Code | 0
KhabarChin: Automatic Detection of Important News in the Persian Language | Code | 0
Can geometric combinatorics improve RNA branching predictions? | Code | 0
BenchENAS: A Benchmarking Platform for Evolutionary Neural Architecture Search | Code | 0
Can a single neuron learn predictive uncertainty? | Code | 0
Keep Security! Benchmarking Security Policy Preservation in Large Language Model Contexts Against Indirect Attacks in Question Answering | Code | 0
Large Language Models for Outpatient Referral: Problem Definition, Benchmarking and Challenges | Code | 0
Reference Matters: Benchmarking Factual Error Correction for Dialogue Summarization with Fine-grained Evaluation Framework | Code | 0
KArSL: Arabic Sign Language Database | Code | 0
Can AI Validate Science? Benchmarking LLMs for Accurate Scientific Claim Evidence Reasoning | Code | 0
JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models | Code | 0
TGB-Seq Benchmark: Challenging Temporal GNNs with Complex Sequential Dynamics | Code | 0
Refining Joint Text and Source Code Embeddings for Retrieval Task with Parameter-Efficient Fine-Tuning | Code | 0
KamNet: An Integrated Spatiotemporal Deep Neural Network for Rare Event Search in KamLAND-Zen | Code | 0
Joint Multi-Scale Tone Mapping and Denoising for HDR Image Enhancement | Code | 0
Ref-Long: Benchmarking the Long-context Referencing Capability of Long-context Language Models | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified