Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 2776–2800 of 5548 papers

Title	Date	Tasks	Status
Hard-Label Cryptanalytic Extraction of Neural Network Models	Sep 18, 2024	Benchmarking	CodeCode Available
PARAPHRASUS : A Comprehensive Benchmark for Evaluating Paraphrase Detection Models	Sep 18, 2024	BenchmarkingModel Selection	CodeCode Available
Improve Machine Learning carbon footprint using Parquet dataset format and Mixed Precision training for regression models -- Part II	Sep 17, 2024	BenchmarkingDescriptive	CodeCode Available
WER We Stand: Benchmarking Urdu ASR Models	Sep 17, 2024	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified
The Sounds of Home: A Speech-Removed Residential Audio Dataset for Sound Event Detection	Sep 17, 2024	BenchmarkingEvent Detection	CodeCode Available
THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models	Sep 17, 2024	BenchmarkingBinary Classification	CodeCode Available
SAGED: A Holistic Bias-Benchmarking Pipeline for Language Models with Customisable Fairness Calibration	Sep 17, 2024	Benchmarkingcounterfactual	CodeCode Available
Quantum Kernel Learning for Small Dataset Modeling in Semiconductor Fabrication: Application to Ohmic Contact	Sep 17, 2024	BenchmarkingQuantum Machine Learning	—Unverified
Benchmarking VLMs' Reasoning About Persuasive Atypical Images	Sep 16, 2024	BenchmarkingObject Recognition	—Unverified
Benchmarking Large Language Model Uncertainty for Prompt Optimization	Sep 16, 2024	BenchmarkingDiversity	CodeCode Available
Benchmarking LLMs in Political Content Text-Annotation: Proof-of-Concept with Toxicity and Incivility Data	Sep 15, 2024	Benchmarkingtext annotation	—Unverified
LLM-Powered Grapheme-to-Phoneme Conversion: Benchmark and Case Study	Sep 13, 2024	BenchmarkingGrapheme-to-Phoneme Conversion	—Unverified
Text-To-Speech Synthesis In The Wild	Sep 13, 2024	BenchmarkingSpeaker Recognition	—Unverified
Byzantine-Robust and Communication-Efficient Distributed Learning via Compressed Momentum Filtering	Sep 13, 2024	BenchmarkingBinary Classification	—Unverified
The CLC-UKET Dataset: Benchmarking Case Outcome Prediction for the UK Employment Tribunal	Sep 12, 2024	BenchmarkingLanguage Modeling	—Unverified
The JPEG Pleno Learning-based Point Cloud Coding Standard: Serving Man and Machine	Sep 12, 2024	Autonomous DrivingBenchmarking	—Unverified
Linear energy storage and flexibility model with ramp rate, ramping, deadline and capacity constraints	Sep 12, 2024	Benchmarking	CodeCode Available
Online vs Offline: A Comparative Study of First-Party and Third-Party Evaluations of Social Chatbots	Sep 12, 2024	BenchmarkingChatbot	—Unverified
Enhancing Q&A Text Retrieval with Ranking Models: Benchmarking, fine-tuning and deploying Rerankers for RAG	Sep 12, 2024	BenchmarkingQuestion Answering	—Unverified
Efficient Sparse Coding with the Adaptive Locally Competitive Algorithm for Speech Classification	Sep 12, 2024	BenchmarkingClassification	—Unverified
Introducing CausalBench: A Flexible Benchmark Framework for Causal Analysis and Machine Learning	Sep 12, 2024	BenchmarkingFairness	—Unverified
Improve Machine Learning carbon footprint using Nvidia GPU and Mixed Precision training for classification models -- Part I	Sep 12, 2024	BenchmarkingCPU	CodeCode Available
Benchmarking 2D Egocentric Hand Pose Datasets	Sep 11, 2024	Activity RecognitionBenchmarking	—Unverified
Understanding Foundation Models: Are We Back in 1924?	Sep 11, 2024	Benchmarking	—Unverified
Unsupervised Novelty Detection Methods Benchmarking with Wavelet Decomposition	Sep 11, 2024	BenchmarkingNovelty Detection	CodeCode Available

Show:10 25 50

← PrevPage 112 of 222Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified