Benchmarking

Papers

Showing 2801–2850 of 5548 papers

Title	Date	Tasks	Status
Benchmarking and Validation of Sub-mW 30GHz VG-LNAs in 22nm FDSOI CMOS for 5G/6G Phased-Array Receivers	Sep 11, 2024	Benchmarking	Unverified
Mahalanobis k-NN: A Statistical Lens for Robust Point-Cloud Registrations	Sep 10, 2024	Benchmarking, Point Cloud Registration	Code Available
VoiceWukong: Benchmarking Deepfake Voice Detection	Sep 10, 2024	Benchmarking, Face Swapping	Unverified
Benchmarking Sub-Genre Classification For Mainstage Dance Music	Sep 10, 2024	Benchmarking, Classification	Unverified
Ransomware Detection Using Machine Learning in the Linux Kernel	Sep 10, 2024	Benchmarking	Unverified
MIP-GAF: A MLLM-annotated Benchmark for Most Important Person Localization and Group Context Understanding	Sep 10, 2024	Benchmarking, Language Modeling	Code Available
CKnowEdit: A New Chinese Knowledge Editing Dataset for Linguistics, Facts, and Logic Error Correction in LLMs	Sep 9, 2024	Benchmarking, knowledge editing	Unverified
Selecting Differential Splicing Methods: Practical Considerations	Sep 9, 2024	Benchmarking	Unverified
Benchmarking and Building Zero-Shot Hindi Retrieval Model with Hindi-BEIR and NLLB-E5	Sep 9, 2024	Benchmarking, Information Retrieval	Unverified
RBoard: A Unified Platform for Reproducible and Reusable Recommender System Benchmarks	Sep 9, 2024	Benchmarking, Click-Through Rate Prediction	Unverified
NeIn: Telling What You Don't Want	Sep 9, 2024	Benchmarking, Negation	Unverified
DetoxBench: Benchmarking Large Language Models for Multitask Fraud & Abuse Detection	Sep 9, 2024	Abuse Detection, Abusive Language	Unverified
A Framework for Evaluating PM2.5 Forecasts from the Perspective of Individual Decision Making	Sep 9, 2024	Benchmarking, Decision Making	Code Available
Quantum Kernel Methods under Scrutiny: A Benchmarking Study	Sep 6, 2024	Benchmarking, Quantum Machine Learning	Unverified
Absolute Ranking: An Essential Normalization for Benchmarking Optimization Algorithms	Sep 6, 2024	Bayesian Inference, Benchmarking	Unverified
Benchmarking Estimators for Natural Experiments: A Novel Dataset and a Doubly Robust Algorithm	Sep 6, 2024	Benchmarking, regression	Unverified
Question-Answering Dense Video Events	Sep 6, 2024	Benchmarking, Question Answering	Code Available
Shuffle Vision Transformer: Lightweight, Fast and Efficient Recognition of Driver Facial Expression	Sep 5, 2024	Benchmarking, Computational Efficiency	Unverified
LLM Detectors Still Fall Short of Real World: Case of LLM-Generated Short News-Like Posts	Sep 5, 2024	Benchmarking	Code Available
InfraLib: Enabling Reinforcement Learning and Decision-Making for Large-Scale Infrastructure Management	Sep 5, 2024	Benchmarking, Computational Efficiency	Unverified
Prediction Accuracy & Reliability: Classification and Object Localization under Distribution Shift	Sep 5, 2024	Autonomous Driving, Benchmarking	Unverified
Benchmarking Spurious Bias in Few-Shot Image Classifiers	Sep 4, 2024	Attribute, Benchmarking	Code Available
PUB: Plot Understanding Benchmark and Dataset for Evaluating Large Language Models on Synthetic Visual Data Interpretation	Sep 4, 2024	Benchmarking	Unverified
NUMOSIM: A Synthetic Mobility Dataset with Anomaly Detection Benchmarks	Sep 4, 2024	Anomaly Detection, Benchmarking	Unverified
EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision	Sep 3, 2024	Benchmarking, Mixed Reality	Unverified
Multi-Source Knowledge Pruning for Retrieval-Augmented Generation: A Benchmark and Empirical Study	Sep 3, 2024	Benchmarking, Hallucination	Code Available
Benchmarking Cognitive Domains for LLMs: Insights from Taiwanese Hakka Culture	Sep 3, 2024	Benchmarking, RAG	Unverified
From Grounding to Planning: Benchmarking Bottlenecks in Web Agents	Sep 3, 2024	Benchmarking	Unverified
Revisiting Safe Exploration in Safe Reinforcement learning	Sep 2, 2024	Benchmarking, reinforcement-learning	Unverified
Landscape-Aware Automated Algorithm Configuration using Multi-output Mixed Regression and Classification	Sep 2, 2024	Benchmarking	Unverified
A practical generalization metric for deep networks benchmarking	Sep 2, 2024	Benchmarking, Diversity	Unverified
Benchmarking LLM Code Generation for Audio Programming with Visual Dataflow Languages	Sep 1, 2024	Benchmarking, Code Generation	Unverified
Accelerating the discovery of steady-states of planetary interior dynamics with machine learning	Aug 30, 2024	Benchmarking	Unverified
SYNTHEVAL: Hybrid Behavioral Testing of NLP Models with Synthetic CheckLists	Aug 30, 2024	Benchmarking, Sentiment Analysis	Code Available
Understanding the User: An Intent-Based Ranking Dataset	Aug 30, 2024	Benchmarking, Information Retrieval	Unverified
Benchmarking Japanese Speech Recognition on ASR-LLM Setups with Multi-Pass Augmented Generative Error Correction	Aug 29, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Illuminating the Diversity-Fitness Trade-Off in Black-Box Optimization	Aug 29, 2024	Benchmarking, Diversity	Code Available
Benchmarking foundation models as feature extractors for weakly-supervised computational pathology	Aug 28, 2024	Benchmarking, Diversity	Unverified
Atari-GPT: Benchmarking Multimodal Large Language Models as Low-Level Policies in Atari Games	Aug 28, 2024	Atari Games, Benchmarking	Unverified
VHAKG: A Multi-modal Knowledge Graph Based on Synchronized Multi-view Videos of Daily Activities	Aug 27, 2024	Benchmarking, Knowledge Graphs	Code Available
Applications in CityLearn Gym Environment for Multi-Objective Control Benchmarking in Grid-Interactive Buildings and Districts	Aug 27, 2024	Benchmarking, Model Predictive Control	Unverified
Cross-subject Brain Functional Connectivity Analysis for Multi-task Cognitive State Evaluation	Aug 27, 2024	Benchmarking, Decision Making	Unverified
Zero-Shot Visual Reasoning by Vision-Language Models: Benchmarking and Analysis	Aug 27, 2024	Benchmarking, Large Language Model	Unverified
Benchmarking Reinforcement Learning Methods for Dexterous Robotic Manipulation with a Three-Fingered Gripper	Aug 27, 2024	Benchmarking, Reinforcement Learning (RL)	Unverified
BOX3D: Lightweight Camera-LiDAR Fusion for 3D Object Detection and Localization	Aug 27, 2024	3D Object Detection, Benchmarking	Unverified
FastTextSpotter: A High-Efficiency Transformer for Multilingual Scene Text Spotting	Aug 27, 2024	Benchmarking, Decoder	Code Available
K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences	Aug 26, 2024	Benchmarking	Unverified
Evaluating Large Language Models on Spatial Tasks: A Multi-Task Benchmarking Study	Aug 26, 2024	8k, Benchmarking	Unverified
Comparative Analysis: Violence Recognition from Videos using Transfer Learning	Aug 26, 2024	Action Recognition, Benchmarking	Code Available
DHP Benchmark: Are LLMs Good NLG Evaluators?	Aug 25, 2024	Benchmarking, nlg evaluation	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified