Benchmarking

Papers

Showing 3951–3975 of 5548 papers

Title	Date	Tasks	Status
Simulation of Large Scale Neural Networks for Evaluation Applications	May 20, 2018	Benchmarking	Unverified
SinaTools: Open Source Toolkit for Arabic Natural Language Processing	Nov 3, 2024	Benchmarking, Lemmatization	Unverified
SINDy vs Hard Nonlinearities and Hidden Dynamics: a Benchmarking Study	Mar 1, 2024	Benchmarking	Unverified
Single-Cell Omics Arena: A Benchmark Study for Large Language Models on Cell Type Annotation Using Single-Cell Data	Dec 3, 2024	Benchmarking	Unverified
Single Stage Prediction with Embedded Topic Modeling of Online Reviews for Mobile App Management	Feb 19, 2018	Benchmarking, Management	Unverified
Site2Vec: a reference frame invariant algorithm for vector embedding of protein-ligand binding sites	Mar 18, 2020	Benchmarking, Drug Discovery	Unverified
Six-CD: Benchmarking Concept Removals for Text-to-image Diffusion Models	Jan 1, 2025	Benchmarking	Unverified
Skeleton-Guided-Translation: A Benchmarking Framework for Code Repository Translation with Fine-Grained Quality Evaluation	Jan 27, 2025	Benchmarking, C++ code	Unverified
Sketch2Code: Evaluating Vision-Language Models for Interactive Web Design Prototyping	Oct 21, 2024	Benchmarking	Unverified
Sketch 'n Solve: An Efficient Python Package for Large-Scale Least Squares Using Randomized Numerical Linear Algebra	Sep 22, 2024	Benchmarking	Unverified
Sketchtopia: A Dataset and Foundational Agents for Benchmarking Asynchronous Multimodal Communication with Iconic Feedback	Jan 1, 2025	Benchmarking	Unverified
Skills and Liquidity Barriers to Youth Employment: Medium-term Evidence from a Cash Benchmarking Experiment in Rwanda	Sep 18, 2022	Benchmarking	Unverified
SkyRover: A Modular Simulator for Cross-Domain Pathfinding	Feb 13, 2025	Benchmarking	Unverified
SlangDIT: Benchmarking LLMs in Interpretative Slang Translation	May 20, 2025	Benchmarking, Sentence	Unverified
SMiCRM: A Benchmark Dataset of Mechanistic Molecular Images	Jul 25, 2024	Benchmarking	Unverified
Smiling Women Pitching Down: Auditing Representational and Presentational Gender Biases in Image Generative AI	May 17, 2023	Benchmarking	Unverified
SMP Challenge: An Overview and Analysis of Social Media Prediction Challenge	May 17, 2024	Benchmarking, Social Media Popularity Prediction	Unverified
SMPLy Benchmarking 3D Human Pose Estimation in the Wild	Dec 4, 2020	3D Human Pose Estimation, Benchmarking	Unverified
SoccerNet-Tracking: Multiple Object Tracking Dataset and Benchmark in Soccer Videos	Apr 14, 2022	Benchmarking, Multiple Object Tracking	Unverified
SocialAI: Benchmarking Socio-Cognitive Abilities in Deep Reinforcement Learning Agents	Jul 2, 2021	Benchmarking, Deep Reinforcement Learning	Unverified
Social Bias Probing: Fairness Benchmarking for Language Models	Nov 15, 2023	Benchmarking, Fairness	Unverified
Sockpuppet Detection in Wikipedia: A Corpus of Real-World Deceptive Writing for Linking Identities	Oct 24, 2013	Benchmarking	Unverified
Socratic-PRMBench: Benchmarking Process Reward Models with Systematic Reasoning Patterns	May 29, 2025	Benchmarking	Unverified
So-Fake: Benchmarking and Explaining Social Media Image Forgery Detection	May 24, 2025	Benchmarking, Image Forgery Detection	Unverified
Soft-Hard Attention U-Net Model and Benchmark Dataset for Multiscale Image Shadow Removal	Aug 7, 2024	Benchmarking, Hard Attention	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified