Benchmarking

Papers


Showing 4576–4600 of 5548 papers

Title	Date	Tasks	Status
Revisiting a Pain in the Neck: Semantic Phrase Processing Benchmark for Language Models	May 5, 2024	Benchmarking	Code Available
AI-enabled Sound Pattern Recognition on Asthma Medication Adherence: Evaluation with the RDA Benchmark Suite	May 30, 2022	Benchmarking, BIG-bench Machine Learning	Code Available
BioVFM-21M: Benchmarking and Scaling Self-Supervised Vision Foundation Models for Biomedical Image Analysis	May 14, 2025	Benchmarking, Computational Efficiency	Code Available
Illuminating the Diversity-Fitness Trade-Off in Black-Box Optimization	Aug 29, 2024	Benchmarking, Diversity	Code Available
Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment	Jun 1, 2023	Benchmarking, Hate Speech Detection	Code Available
Local manifold learning and its link to domain-based physics knowledge	Jul 1, 2022	Benchmarking, Dimensionality Reduction	Code Available
LOCO-EPI: Leave-one-chromosome-out (LOCO) as a benchmarking paradigm for deep learning based prediction of enhancer-promoter interactions	Apr 1, 2025	Benchmarking	Code Available
IJCB 2022 Mobile Behavioral Biometrics Competition (MobileB2C)	Oct 6, 2022	Benchmarking	Code Available
Why Stop at One Error? Benchmarking LLMs as Data Science Code Debuggers for Multi-Hop and Multi-Bug Errors	Mar 28, 2025	Benchmarking, Code Generation	Code Available
BioSentVec: creating sentence embeddings for biomedical texts	Oct 22, 2018	Articles, Benchmarking	Code Available
LogicCat: A Chain-of-Thought Text-to-SQL Benchmark for Multi-Domain Reasoning Challenges	May 24, 2025	Benchmarking, Mathematical Reasoning	Code Available
IHCV: Discovery of Hidden Time-Dependent Control Variables in Non-Linear Dynamical Systems	Apr 5, 2023	Benchmarking	Code Available
Identifying the Smallest Adversarial Load Perturbations that Render DC-OPF Infeasible	Jul 10, 2025	Adversarial Attack, Benchmarking	Code Available
LogoNet: a fine-grained network for instance-level logo sketch retrieval	Apr 5, 2023	2k, Benchmarking	Code Available
Identifying Money Laundering Subgraphs on the Blockchain	Oct 10, 2024	Benchmarking	Code Available
Identifying and Benchmarking Natural Out-of-Context Prediction Problems	Oct 25, 2021	Benchmarking	Code Available
Multitask learning and benchmarking with clinical time series data	Jun 17, 2019	Benchmarking, BIG-bench Machine Learning	Code Available
IdeaBench: Benchmarking Large Language Models for Research Idea Generation	Oct 31, 2024	Benchmarking, scientific discovery	Code Available
IceBench: A Benchmark for Deep Learning based Sea Ice Type Classification	Mar 22, 2025	Benchmarking, Classification	Code Available
BioFors: A Large Biomedical Image Forensics Dataset	Aug 30, 2021	Benchmarking, Image Forensics	Code Available
Benchmarking Attribution Methods with Relative Feature Importance	Jul 23, 2019	Benchmarking, Feature Importance	Code Available
HypoTermQA: Hypothetical Terms Dataset for Benchmarking Hallucination Tendency of LLMs	Feb 25, 2024	Benchmarking, Chatbot	Code Available
Hyperspectral Image Dataset for Benchmarking on Salient Object Detection	Jun 29, 2018	Benchmarking, Object	Code Available
Long-Term Visitation Value for Deep Exploration in Sparse Reward Reinforcement Learning	Jan 1, 2020	Benchmarking, reinforcement-learning	Code Available
Look Across Elapse: Disentangled Representation Learning and Photorealistic Cross-Age Face Synthesis for Age-Invariant Face Recognition	Sep 2, 2018	Age-Invariant Face Recognition, Benchmarking	Code Available



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified