SOTAVerified

Benchmarking

Papers

Showing 2776-2800 of 5548 papers

Title | Status | Hype
Hard-Label Cryptanalytic Extraction of Neural Network Models | Code | 0
PARAPHRASUS : A Comprehensive Benchmark for Evaluating Paraphrase Detection Models | Code | 0
Improve Machine Learning carbon footprint using Parquet dataset format and Mixed Precision training for regression models -- Part II | Code | 0
WER We Stand: Benchmarking Urdu ASR Models | - | 0
The Sounds of Home: A Speech-Removed Residential Audio Dataset for Sound Event Detection | Code | 0
THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models | Code | 0
SAGED: A Holistic Bias-Benchmarking Pipeline for Language Models with Customisable Fairness Calibration | Code | 0
Quantum Kernel Learning for Small Dataset Modeling in Semiconductor Fabrication: Application to Ohmic Contact | - | 0
Benchmarking VLMs' Reasoning About Persuasive Atypical Images | - | 0
Benchmarking Large Language Model Uncertainty for Prompt Optimization | Code | 0
Benchmarking LLMs in Political Content Text-Annotation: Proof-of-Concept with Toxicity and Incivility Data | - | 0
LLM-Powered Grapheme-to-Phoneme Conversion: Benchmark and Case Study | - | 0
Text-To-Speech Synthesis In The Wild | - | 0
Byzantine-Robust and Communication-Efficient Distributed Learning via Compressed Momentum Filtering | - | 0
The CLC-UKET Dataset: Benchmarking Case Outcome Prediction for the UK Employment Tribunal | - | 0
The JPEG Pleno Learning-based Point Cloud Coding Standard: Serving Man and Machine | - | 0
Linear energy storage and flexibility model with ramp rate, ramping, deadline and capacity constraints | Code | 0
Online vs Offline: A Comparative Study of First-Party and Third-Party Evaluations of Social Chatbots | - | 0
Enhancing Q&A Text Retrieval with Ranking Models: Benchmarking, fine-tuning and deploying Rerankers for RAG | - | 0
Efficient Sparse Coding with the Adaptive Locally Competitive Algorithm for Speech Classification | - | 0
Introducing CausalBench: A Flexible Benchmark Framework for Causal Analysis and Machine Learning | - | 0
Improve Machine Learning carbon footprint using Nvidia GPU and Mixed Precision training for classification models -- Part I | Code | 0
Benchmarking 2D Egocentric Hand Pose Datasets | - | 0
Understanding Foundation Models: Are We Back in 1924? | - | 0
Unsupervised Novelty Detection Methods Benchmarking with Wavelet Decomposition | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified