Benchmarking

Papers

Title	Date	Tasks	Status
Ransomware Detection Using Machine Learning in the Linux Kernel	Sep 10, 2024	Benchmarking	Unverified
RayFronts: Open-Set Semantic Ray Frontiers for Online Scene Understanding and Exploration	Apr 9, 2025	3D Semantic Segmentation, Benchmarking	Unverified
RBoard: A Unified Platform for Reproducible and Reusable Recommender System Benchmarks	Sep 9, 2024	Benchmarking, Click-Through Rate Prediction	Unverified
RCC-GAN: Regularized Compound Conditional GAN for Large-Scale Tabular Data Synthesis	May 24, 2022	Benchmarking, Generative Adversarial Network	Unverified
A tale of two toolkits, report the first: benchmarking time series classification algorithms for correctness and efficiency	Sep 12, 2019	Benchmarking, General Classification	Unverified
A Comparative study of Hyper-Parameter Optimization Tools	Jan 17, 2022	Bayesian Optimization, Benchmarking	Unverified
RDBench: ML Benchmark for Relational Databases	Oct 25, 2023	Benchmarking	Unverified
Uniform Discretized Integrated Gradients: An effective attribution based method for explaining large language models	Dec 5, 2024	Benchmarking, Feature Importance	Unverified
RD-Suite: A Benchmark for Ranking Distillation	Jun 7, 2023	Benchmarking	Unverified
Reactor Mk.1 performances: MMLU, HumanEval and BBH test results	Jun 15, 2024	Benchmarking, HumanEval	Unverified
A Systematic Survey of Text Summarization: From Statistical Methods to Large Language Models	Jun 17, 2024	Benchmarking, Survey	Unverified
RealCause: Realistic Causal Inference Benchmarking	Nov 30, 2020	Benchmarking, Causal Inference	Unverified
A Systematic Evaluation of Domain Adaptation Algorithms On Time Series Data	Sep 29, 2021	Benchmarking, Domain Adaptation	Unverified
A Systematic Analysis of Hybrid Linear Attention	Jul 8, 2025	Benchmarking, Language Modeling	Unverified
Realistic Evaluation of Test-Time Adaptation Algorithms: Unsupervised Hyperparameter Selection	Jul 19, 2024	Benchmarking, Model Selection	Unverified
Realistic Hair Simulation Using Image Blending	Apr 19, 2019	Benchmarking, Data Augmentation	Unverified
Realistic Video Summarization through VISIOCITY: A New Benchmark and Evaluation Framework	Jul 29, 2020	Benchmarking, Video Summarization	Unverified
Unifying Few- and Zero-Shot Egocentric Action Recognition	May 27, 2020	Action Recognition, Benchmarking	Unverified
Real Time Egocentric Object Segmentation: THU-READ Labeling and Benchmarking Results	Jun 9, 2021	Benchmarking, Mixed Reality	Unverified
Real-time Kinematic Ground Truth for the Oxford RobotCar Dataset	Feb 24, 2020	Benchmarking	Unverified
Self-Aligning Depth-regularized Radiance Fields for Asynchronous RGB-D Sequences	Nov 14, 2022	Autonomous Driving, Benchmarking	Unverified
Real-time Webcam Heart-Rate and Variability Estimation with Clean Ground Truth for Evaluation	Dec 31, 2020	Benchmarking, Heart Rate Variability	Unverified
One-Shot Real-to-Sim via End-to-End Differentiable Simulation and Rendering	Nov 29, 2024	Benchmarking, Object	Unverified
Real-World Blur Dataset for Learning and Benchmarking Deblurring Algorithms	Aug 1, 2020	Benchmarking, Deblurring	Unverified
Real-World fNIRS-Based Brain-Computer Interfaces: Benchmarking Deep Learning and Classical Models in Interactive Gaming	May 15, 2025	Benchmarking, Data Augmentation	Unverified
Rearrangement: A Challenge for Embodied AI	Nov 3, 2020	Benchmarking	Unverified
Reasoning as a Resource: Optimizing Fast and Slow Thinking in Code Generation Models	Jun 11, 2025	Benchmarking, Code Generation	Unverified
Re-assessing ImageNet: How aligned is its single-label assumption with its multi-label nature?	Dec 24, 2024	Benchmarking	Unverified
A Comparative Analysis on Ethical Benchmarking in Large Language Models	Oct 11, 2024	Benchmarking, Decision Making	Unverified
UniIR: Training and Benchmarking Universal Multimodal Information Retrievers	Nov 28, 2023	Benchmarking, Information Retrieval	Unverified
A Survey on Vision Autoregressive Model	Nov 13, 2024	3D Generation, Benchmarking	Unverified
A Survey on Temporal Sentence Grounding in Videos	Sep 16, 2021	Action Localization, Benchmarking	Unverified
A Survey on Semi-Supervised Learning for Delayed Partially Labelled Data Streams	Jun 16, 2021	Active Learning, Benchmarking	Unverified
RECipe: Does a Multi-Modal Recipe Knowledge Graph Fit a Multi-Purpose Recommendation System?	Aug 8, 2023	Benchmarking, Collaborative Filtering	Unverified
Recommendations for Baselines and Benchmarking Approximate Gaussian Processes	Feb 15, 2024	Benchmarking, Gaussian Processes	Unverified
Reconstructing antibody repertoires from error-prone immunosequencing datasets	Apr 24, 2017	Benchmarking	Unverified
A Survey on Preserving Fairness Guarantees in Changing Environments	Nov 14, 2022	Benchmarking, Decision Making	Unverified
A Survey on Model Compression for Large Language Models	Aug 15, 2023	Benchmarking, Knowledge Distillation	Unverified
Uni-Render: A Unified Accelerator for Real-Time Rendering Across Diverse Neural Renderers	Mar 31, 2025	Benchmarking, Neural Rendering	Unverified
A Survey on Masked Facial Detection Methods and Datasets for Fighting Against COVID-19	Jan 13, 2022	Benchmarking, Lesion Segmentation	Unverified
Reduced, Reused and Recycled: The Life of a Dataset in Machine Learning Research	Dec 3, 2021	Benchmarking, BIG-bench Machine Learning	Unverified
A Survey on LLM-based News Recommender Systems	Feb 13, 2025	Benchmarking, Fairness	Unverified
Unitail: Detecting, Reading, and Matching in Retail Scene	Apr 1, 2022	Benchmarking, Dense Object Detection	Unverified
A survey on efficient vision transformers: algorithms, techniques, and performance benchmarking	Sep 5, 2023	Benchmarking, Knowledge Distillation	Unverified
A Survey of Spanish Clinical Language Models	Aug 4, 2023	Benchmarking, Survey	Unverified
Refer to Anything with Vision-Language Prompts	Jun 5, 2025	Benchmarking, Generalized Referring Expression Segmentation	Unverified
Retrieval Models Aren't Tool-Savvy: Benchmarking Tool Retrieval for Large Language Models	Mar 3, 2025	Benchmarking, Information Retrieval	Unverified
Unleashing OpenTitan's Potential: a Silicon-Ready Embedded Secure Element for Root of Trust and Cryptographic Offloading	Jun 17, 2024	Autonomous Vehicles, Benchmarking	Unverified
A Survey of Small Language Models	Oct 25, 2024	Benchmarking, Model Compression	Unverified
Regularization of ML models for Earth systems by using longer model timesteps	Mar 23, 2025	Benchmarking	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified