Benchmarking

Papers


Showing 4801–4850 of 5548 papers

Title	Date	Tasks	Status
VidLBEval: Benchmarking and Mitigating Language Bias in Video-Involved LVLMs	Feb 23, 2025	Benchmarking	Unverified
SPot: A tool for identifying operating segments in financial tables	May 17, 2020	Benchmarking	Unverified
Spotting tell-tale visual artifacts in face swapping videos: strengths and pitfalls of CNN detectors	Jun 19, 2025	Benchmarking, Face Swapping	Unverified
Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark	Jun 4, 2018	Benchmarking, BIG-bench Machine Learning	Unverified
SQLBarber: A System Leveraging Large Language Models to Generate Customized and Realistic SQL Workloads	Jul 8, 2025	Benchmarking	Unverified
Analysis and Benchmarking of Extending Blind Face Image Restoration to Videos	Oct 15, 2024	Benchmarking, Blind Face Restoration	Unverified
Unifying Large Language Model and Deep Reinforcement Learning for Human-in-Loop Interactive Socially-aware Navigation	Mar 22, 2024	Benchmarking, Deep Reinforcement Learning	Unverified
SS3DM: Benchmarking Street-View Surface Reconstruction with a Synthetic 3D Mesh Dataset	Oct 29, 2024	3D Reconstruction, Autonomous Driving	Unverified
Analysing Features Learned Using Unsupervised Models on Program Embeddings	Jan 1, 2021	Benchmarking, Binary Classification	Unverified
Stability Constrained OPF in Microgrids: A Chance Constrained Optimization Framework with Non-Gaussian Uncertainty	Feb 4, 2023	Benchmarking	Unverified
Stabilized Self-training with Negative Sampling on Few-labeled Graph Data	Sep 29, 2021	Benchmarking, Node Classification	Unverified
Analysing Errors of Open Information Extraction Systems	Jul 24, 2017	Benchmarking, Open Information Extraction	Unverified
An AI based talent acquisition and benchmarking for job	Aug 12, 2020	Benchmarking, Cultural Vocal Bursts Intensity Prediction	Unverified
Stable Virtual Camera: Generative View Synthesis with Diffusion Models	Mar 18, 2025	Benchmarking	Unverified
An Advanced Ensemble Deep Learning Framework for Stock Price Prediction Using VAE, Transformer, and LSTM Model	Mar 28, 2025	Algorithmic Trading, Benchmarking	Unverified
Staining normalization in histopathology: Method benchmarking using multicenter dataset	Jun 23, 2025	Benchmarking	Unverified
Standardisation of Convex Ultrasound Data Through Geometric Analysis and Augmentation	Feb 13, 2025	Benchmarking	Unverified
Standardised workflow for mass spectrometry-based single-cell proteomics data processing and analysis using the scp package	Oct 20, 2023	Benchmarking	Unverified
CrisisBench: Benchmarking Crisis-related Social Media Datasets for Humanitarian Information Processing	Apr 14, 2020	Benchmarking, General Classification	Unverified
Word Complexity Estimation for Japanese Lexical Simplification	May 1, 2020	Benchmarking, Lexical Simplification	Unverified
A Boosting Approach to Constructing an Ensemble Stack	Nov 28, 2022	Benchmarking, Ensemble Learning	Unverified
Views Are My Own, but Also Yours: Benchmarking Theory of Mind Using Common Ground	Mar 4, 2024	Benchmarking	Unverified
An Accelerated Correlation Filter Tracker	Dec 5, 2019	Benchmarking, Object Tracking	Unverified
Village-Net Clustering: A Rapid approach to Non-linear Unsupervised Clustering of High-Dimensional Data	Jan 16, 2025	Benchmarking, Clustering	Unverified
State and Memory is All You Need for Robust and Reliable AI Agents	Jun 30, 2025	All, Benchmarking	Unverified
State-of-the-art AI-based Learning Approaches for Deepfake Generation and Detection, Analyzing Opportunities, Threading through Pros, Cons, and Future Prospects	Jan 2, 2025	Benchmarking, Face Swapping	Unverified
CroCoDL: Cross-device Collaborative Dataset for Localization	Jan 1, 2025	Benchmarking, Pose Estimation	Unverified
CrossCheckGPT: Universal Hallucination Ranking for Multimodal Foundation Models	May 22, 2024	Benchmarking, Hallucination	Unverified
CrossCodeBench: Benchmarking Cross-Task Generalization of Source Code Models	Feb 8, 2023	Benchmarking, Few-Shot Learning	Unverified
Cross-functional transferability in universal machine learning interatomic potentials	Apr 7, 2025	Benchmarking, Transfer Learning	Unverified
State-of-the-Art in Human Scanpath Prediction	Feb 24, 2021	Benchmarking, Prediction	Unverified
Statistical Multicriteria Benchmarking via the GSD-Front	Jun 6, 2024	Benchmarking	Unverified
Abnormality-Driven Representation Learning for Radiology Imaging	Nov 25, 2024	Benchmarking, Contrastive Learning	Unverified
crossMoDA Challenge: Evolution of Cross-Modality Domain Adaptation Techniques for Vestibular Schwannoma and Cochlea Segmentation from 2021 to 2023	Jun 13, 2025	Benchmarking, Domain Adaptation	Unverified
CRF-based Single-stage Acoustic Modeling with CTC Topology	Apr 16, 2019	Benchmarking, Speech Recognition	Unverified
Cross-Model Image Annotation Platform with Active Learning	Aug 6, 2020	Active Learning, Benchmarking	Unverified
Cross-replication Reliability -- An Empirical Approach to Interpreting Inter-rater Reliability	Jun 11, 2021	Benchmarking	Unverified
Cross-replication Reliability - An Empirical Approach to Interpreting Inter-rater Reliability	Aug 1, 2021	Benchmarking	Unverified
Cross-subject Brain Functional Connectivity Analysis for Multi-task Cognitive State Evaluation	Aug 27, 2024	Benchmarking, Decision Making	Unverified
Cross-Subject Deep Transfer Models for Evoked Potentials in Brain-Computer Interface	Jan 29, 2023	Benchmarking, Brain Computer Interface	Unverified
Creating a Data Collection for Evaluating Rich Speech Retrieval	May 1, 2012	Benchmarking, Retrieval	Unverified
CrowdDriven: A New Challenging Dataset for Outdoor Visual Localization	Sep 9, 2021	Benchmarking, Self-Driving Cars	Unverified
CRS Arena: Crowdsourced Benchmarking of Conversational Recommender Systems	Dec 13, 2024	Benchmarking, Recommendation Systems	Unverified
Statistical Scenario Modelling and Lookalike Distributions for Multi-Variate AI Risk	Feb 20, 2025	Benchmarking	Unverified
Covariance Matrix Adaptation Evolution Strategy Assisted by Principal Component Analysis	May 8, 2021	Benchmarking, Dimensionality Reduction	Unverified
Coupling volume-excluding compartment-based models of diffusion at different scales: Voronoi and pseudo-compartment approaches	May 24, 2016	Benchmarking, Blocking	Unverified
CSPO: Cross-Market Synergistic Stock Price Movement Forecasting with Pseudo-volatility Optimization	Mar 26, 2025	Benchmarking	Unverified
CSR-Bench: Benchmarking LLM Agents in Deployment of Computer Science Research Repositories	Feb 10, 2025	Benchmarking	Unverified
StEduCov: An Explored and Benchmarked Dataset on Stance Detection in Tweets towards Online Education during COVID-19 Pandemic	Aug 22, 2022	Benchmarking, Stance Detection	Unverified
Steerable Pyramid Weighted Loss: Multi-Scale Adaptive Weighting for Semantic Segmentation	Mar 9, 2025	Autonomous Driving, Benchmarking	Unverified


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified