Benchmarking

Papers

Showing 4551–4600 of 5548 papers

Title	Date	Tasks	Status
Improvements & Evaluations on the MLCommons CloudMask Benchmark	Mar 7, 2024	Benchmarking	Code Available
The current state of single-cell proteomics data analysis	Oct 3, 2022	Benchmarking	Code Available
Retrieval or Global Context Understanding? On Many-Shot In-Context Learning for Long-Context Evaluation	Nov 11, 2024	16k, Benchmarking	Code Available
BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation	Jan 27, 2021	Benchmarking, Text Generation	Code Available
Improve Machine Learning carbon footprint using Parquet dataset format and Mixed Precision training for regression models -- Part II	Sep 17, 2024	Benchmarking, Descriptive	Code Available
BN-AuthProf: Benchmarking Machine Learning for Bangla Author Profiling on Social Media Texts	Dec 3, 2024	Age And Gender Classification, Age and Gender Estimation	Code Available
LLM Benchmarking with LLaMA2: Evaluating Code Development Performance Across Multiple Programming Languages	Mar 24, 2025	Benchmarking	Code Available
Improve Machine Learning carbon footprint using Nvidia GPU and Mixed Precision training for classification models -- Part I	Sep 12, 2024	Benchmarking, CPU	Code Available
LLM Detectors Still Fall Short of Real World: Case of LLM-Generated Short News-Like Posts	Sep 5, 2024	Benchmarking	Code Available
Improved Target-specific Stance Detection on Social Media Platforms by Delving into Conversation Threads	Nov 6, 2022	Benchmarking, Opinion Mining	Code Available
BLESS: Benchmarking Large Language Models on Sentence Simplification	Oct 24, 2023	Benchmarking, Diversity	Code Available
Improved Multilingual Language Model Pretraining for Social Media Text via Translation Pair Prediction	Oct 20, 2021	Benchmarking, Language Modeling	Code Available
Importance of Disjoint Sampling in Conventional and Transformer Models for Hyperspectral Image Classification	Apr 23, 2024	Benchmarking, Hyperspectral Image Classification	Code Available
BanglaNLP at BLP-2023 Task 2: Benchmarking different Transformer Models for Sentiment Analysis of Bangla Social Media Posts	Oct 13, 2023	Benchmarking, Sentiment Analysis	Code Available
LLM Performance for Code Generation on Noisy Tasks	May 29, 2025	Benchmarking, Code Generation	Code Available
ImpliRet: Benchmarking the Implicit Fact Retrieval Challenge	Jun 17, 2025	Benchmarking, Retrieval	Code Available
A Dataset for Web-Scale Knowledge Base Population	Jun 3, 2018	Benchmarking, Knowledge Base Population	Code Available
The Devil is in the Prompts: De-Identification Traces Enhance Memorization Risks in Synthetic Chest X-Ray Generation	Feb 11, 2025	Benchmarking, De-identification	Code Available
Impact of ImageNet Model Selection on Domain Adaptation	Feb 6, 2020	Benchmarking, Domain Adaptation	Code Available
Immunofluorescence Capillary Imaging Segmentation: Cases Study	Jul 14, 2022	Benchmarking, Image Segmentation	Code Available
Analyzing the Feature Extractor Networks for Face Image Synthesis	Jun 4, 2024	Benchmarking, Image Generation	Code Available
ImmersePro: End-to-End Stereo Video Synthesis Via Implicit Disparity Learning	Sep 30, 2024	Benchmarking, Disparity Estimation	Code Available
LLpowershap: Logistic Loss-based Automated Shapley Values Feature Selection Method	Jan 23, 2024	Benchmarking, Fairness	Code Available
Revisiting and Benchmarking Graph Autoencoders: A Contrastive Learning Perspective	Oct 14, 2024	Benchmarking, Contrastive Learning	Code Available
Illusory VQA: Benchmarking and Enhancing Multimodal Models on Visual Illusions	Dec 11, 2024	Benchmarking, Question Answering	Code Available
Revisiting a Pain in the Neck: Semantic Phrase Processing Benchmark for Language Models	May 5, 2024	Benchmarking	Code Available
AI-enabled Sound Pattern Recognition on Asthma Medication Adherence: Evaluation with the RDA Benchmark Suite	May 30, 2022	Benchmarking, BIG-bench Machine Learning	Code Available
BioVFM-21M: Benchmarking and Scaling Self-Supervised Vision Foundation Models for Biomedical Image Analysis	May 14, 2025	Benchmarking, Computational Efficiency	Code Available
Illuminating the Diversity-Fitness Trade-Off in Black-Box Optimization	Aug 29, 2024	Benchmarking, Diversity	Code Available
Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment	Jun 1, 2023	Benchmarking, Hate Speech Detection	Code Available
Local manifold learning and its link to domain-based physics knowledge	Jul 1, 2022	Benchmarking, Dimensionality Reduction	Code Available
LOCO-EPI: Leave-one-chromosome-out (LOCO) as a benchmarking paradigm for deep learning based prediction of enhancer-promoter interactions	Apr 1, 2025	Benchmarking	Code Available
IJCB 2022 Mobile Behavioral Biometrics Competition (MobileB2C)	Oct 6, 2022	Benchmarking	Code Available
Why Stop at One Error? Benchmarking LLMs as Data Science Code Debuggers for Multi-Hop and Multi-Bug Errors	Mar 28, 2025	Benchmarking, Code Generation	Code Available
BioSentVec: creating sentence embeddings for biomedical texts	Oct 22, 2018	Articles, Benchmarking	Code Available
LogicCat: A Chain-of-Thought Text-to-SQL Benchmark for Multi-Domain Reasoning Challenges	May 24, 2025	Benchmarking, Mathematical Reasoning	Code Available
IHCV: Discovery of Hidden Time-Dependent Control Variables in Non-Linear Dynamical Systems	Apr 5, 2023	Benchmarking	Code Available
Identifying the Smallest Adversarial Load Perturbations that Render DC-OPF Infeasible	Jul 10, 2025	Adversarial Attack, Benchmarking	Code Available
LogoNet: a fine-grained network for instance-level logo sketch retrieval	Apr 5, 2023	2k, Benchmarking	Code Available
Identifying Money Laundering Subgraphs on the Blockchain	Oct 10, 2024	Benchmarking	Code Available
Identifying and Benchmarking Natural Out-of-Context Prediction Problems	Oct 25, 2021	Benchmarking	Code Available
Multitask learning and benchmarking with clinical time series data	Jun 17, 2019	Benchmarking, BIG-bench Machine Learning	Code Available
IdeaBench: Benchmarking Large Language Models for Research Idea Generation	Oct 31, 2024	Benchmarking, scientific discovery	Code Available
IceBench: A Benchmark for Deep Learning based Sea Ice Type Classification	Mar 22, 2025	Benchmarking, Classification	Code Available
BioFors: A Large Biomedical Image Forensics Dataset	Aug 30, 2021	Benchmarking, Image Forensics	Code Available
Benchmarking Attribution Methods with Relative Feature Importance	Jul 23, 2019	Benchmarking, Feature Importance	Code Available
HypoTermQA: Hypothetical Terms Dataset for Benchmarking Hallucination Tendency of LLMs	Feb 25, 2024	Benchmarking, Chatbot	Code Available
Hyperspectral Image Dataset for Benchmarking on Salient Object Detection	Jun 29, 2018	Benchmarking, Object	Code Available
Long-Term Visitation Value for Deep Exploration in Sparse Reward Reinforcement Learning	Jan 1, 2020	Benchmarking, reinforcement-learning	Code Available
Look Across Elapse: Disentangled Representation Learning and Photorealistic Cross-Age Face Synthesis for Age-Invariant Face Recognition	Sep 2, 2018	Age-Invariant Face Recognition, Benchmarking	Code Available

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified