Benchmarking

Papers


Showing 1651–1700 of 5548 papers

Title	Date	Tasks	Status	Hype
Benchmarking VLMs' Reasoning About Persuasive Atypical Images	Sep 16, 2024	Benchmarking, Object Recognition	Unverified	0
Benchmarking Large Language Model Uncertainty for Prompt Optimization	Sep 16, 2024	Benchmarking, Diversity	Code Available	0
Benchmarking LLMs in Political Content Text-Annotation: Proof-of-Concept with Toxicity and Incivility Data	Sep 15, 2024	Benchmarking, text annotation	Unverified	0
Byzantine-Robust and Communication-Efficient Distributed Learning via Compressed Momentum Filtering	Sep 13, 2024	Benchmarking, Binary Classification	Unverified	0
LLM-Powered Grapheme-to-Phoneme Conversion: Benchmark and Case Study	Sep 13, 2024	Benchmarking, Grapheme-to-Phoneme Conversion	Unverified	0
Text-To-Speech Synthesis In The Wild	Sep 13, 2024	Benchmarking, Speaker Recognition	Unverified	0
ODAQ: Open Dataset of Audio Quality - Benchmark on GitHub	Sep 13, 2024	Audio Quality Assessment, Benchmarking	Code Available	1
Introducing CausalBench: A Flexible Benchmark Framework for Causal Analysis and Machine Learning	Sep 12, 2024	Benchmarking, Fairness	Unverified	0
Linear energy storage and flexibility model with ramp rate, ramping, deadline and capacity constraints	Sep 12, 2024	Benchmarking	Code Available	0
Efficient Sparse Coding with the Adaptive Locally Competitive Algorithm for Speech Classification	Sep 12, 2024	Benchmarking, Classification	Unverified	0
The CLC-UKET Dataset: Benchmarking Case Outcome Prediction for the UK Employment Tribunal	Sep 12, 2024	Benchmarking, Language Modeling	Unverified	0
The JPEG Pleno Learning-based Point Cloud Coding Standard: Serving Man and Machine	Sep 12, 2024	Autonomous Driving, Benchmarking	Unverified	0
Improve Machine Learning carbon footprint using Nvidia GPU and Mixed Precision training for classification models -- Part I	Sep 12, 2024	Benchmarking, CPU	Code Available	0
Enhancing Q&A Text Retrieval with Ranking Models: Benchmarking, fine-tuning and deploying Rerankers for RAG	Sep 12, 2024	Benchmarking, Question Answering	Unverified	0
Online vs Offline: A Comparative Study of First-Party and Third-Party Evaluations of Social Chatbots	Sep 12, 2024	Benchmarking, Chatbot	Unverified	0
Benchmarking and Validation of Sub-mW 30GHz VG-LNAs in 22nm FDSOI CMOS for 5G/6G Phased-Array Receivers	Sep 11, 2024	Benchmarking	Unverified	0
Understanding Foundation Models: Are We Back in 1924?	Sep 11, 2024	Benchmarking	Unverified	0
Unsupervised Novelty Detection Methods Benchmarking with Wavelet Decomposition	Sep 11, 2024	Benchmarking, Novelty Detection	Code Available	0
Benchmarking 2D Egocentric Hand Pose Datasets	Sep 11, 2024	Activity Recognition, Benchmarking	Unverified	0
MIP-GAF: A MLLM-annotated Benchmark for Most Important Person Localization and Group Context Understanding	Sep 10, 2024	Benchmarking, Language Modeling	Code Available	0
Ransomware Detection Using Machine Learning in the Linux Kernel	Sep 10, 2024	Benchmarking	Unverified	0
Benchmarking Sub-Genre Classification For Mainstage Dance Music	Sep 10, 2024	Benchmarking, Classification	Unverified	0
Mahalanobis k-NN: A Statistical Lens for Robust Point-Cloud Registrations	Sep 10, 2024	Benchmarking, Point Cloud Registration	Code Available	0
VoiceWukong: Benchmarking Deepfake Voice Detection	Sep 10, 2024	Benchmarking, Face Swapping	Unverified	0
Selecting Differential Splicing Methods: Practical Considerations	Sep 9, 2024	Benchmarking	Unverified	0
RBoard: A Unified Platform for Reproducible and Reusable Recommender System Benchmarks	Sep 9, 2024	Benchmarking, Click-Through Rate Prediction	Unverified	0
NeIn: Telling What You Don't Want	Sep 9, 2024	Benchmarking, Negation	Unverified	0
Benchmarking and Building Zero-Shot Hindi Retrieval Model with Hindi-BEIR and NLLB-E5	Sep 9, 2024	Benchmarking, Information Retrieval	Unverified	0
Assessing SPARQL capabilities of Large Language Models	Sep 9, 2024	Benchmarking, Knowledge Graphs	Code Available	2
DetoxBench: Benchmarking Large Language Models for Multitask Fraud & Abuse Detection	Sep 9, 2024	Abuse Detection, Abusive Language	Unverified	0
CKnowEdit: A New Chinese Knowledge Editing Dataset for Linguistics, Facts, and Logic Error Correction in LLMs	Sep 9, 2024	Benchmarking, knowledge editing	Unverified	0
A Framework for Evaluating PM2.5 Forecasts from the Perspective of Individual Decision Making	Sep 9, 2024	Benchmarking, Decision Making	Code Available	0
Insights from Benchmarking Frontier Language Models on Web App Code Generation	Sep 8, 2024	Benchmarking, Code Generation	Code Available	1
Benchmarking Estimators for Natural Experiments: A Novel Dataset and a Doubly Robust Algorithm	Sep 6, 2024	Benchmarking, regression	Unverified	0
Absolute Ranking: An Essential Normalization for Benchmarking Optimization Algorithms	Sep 6, 2024	Bayesian Inference, Benchmarking	Unverified	0
PlantSeg: A Large-Scale In-the-wild Dataset for Plant Disease Segmentation	Sep 6, 2024	Benchmarking, image-classification	Code Available	2
Quantum Kernel Methods under Scrutiny: A Benchmarking Study	Sep 6, 2024	Benchmarking, Quantum Machine Learning	Unverified	0
Question-Answering Dense Video Events	Sep 6, 2024	Benchmarking, Question Answering	Code Available	0
Shuffle Vision Transformer: Lightweight, Fast and Efficient Recognition of Driver Facial Expression	Sep 5, 2024	Benchmarking, Computational Efficiency	Unverified	0
Prediction Accuracy & Reliability: Classification and Object Localization under Distribution Shift	Sep 5, 2024	Autonomous Driving, Benchmarking	Unverified	0
LLM Detectors Still Fall Short of Real World: Case of LLM-Generated Short News-Like Posts	Sep 5, 2024	Benchmarking	Code Available	0
InfraLib: Enabling Reinforcement Learning and Decision-Making for Large-Scale Infrastructure Management	Sep 5, 2024	Benchmarking, Computational Efficiency	Unverified	0
RTLRewriter: Methodologies for Large Models aided RTL Code Optimization	Sep 4, 2024	Benchmarking	Code Available	1
PUB: Plot Understanding Benchmark and Dataset for Evaluating Large Language Models on Synthetic Visual Data Interpretation	Sep 4, 2024	Benchmarking	Unverified	0
NUMOSIM: A Synthetic Mobility Dataset with Anomaly Detection Benchmarks	Sep 4, 2024	Anomaly Detection, Benchmarking	Unverified	0
Benchmarking Spurious Bias in Few-Shot Image Classifiers	Sep 4, 2024	Attribute, Benchmarking	Code Available	0
Multi-Source Knowledge Pruning for Retrieval-Augmented Generation: A Benchmark and Empirical Study	Sep 3, 2024	Benchmarking, Hallucination	Code Available	0
LongGenBench: Benchmarking Long-Form Generation in Long Context LLMs	Sep 3, 2024	16k, Benchmarking	Code Available	1
EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision	Sep 3, 2024	Benchmarking, Mixed Reality	Unverified	0
Benchmarking Cognitive Domains for LLMs: Insights from Taiwanese Hakka Culture	Sep 3, 2024	Benchmarking, RAG	Unverified	0


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified