| Improvements & Evaluations on the MLCommons CloudMask Benchmark | Mar 7, 2024 | Benchmarking | Code Available | 0 |
| The current state of single-cell proteomics data analysis | Oct 3, 2022 | Benchmarking | Code Available | 0 |
| Retrieval or Global Context Understanding? On Many-Shot In-Context Learning for Long-Context Evaluation | Nov 11, 2024 | 16k, Benchmarking | Code Available | 0 |
| BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation | Jan 27, 2021 | Benchmarking, Text Generation | Code Available | 0 |
| Improve Machine Learning carbon footprint using Parquet dataset format and Mixed Precision training for regression models -- Part II | Sep 17, 2024 | Benchmarking, Descriptive | Code Available | 0 |
| BN-AuthProf: Benchmarking Machine Learning for Bangla Author Profiling on Social Media Texts | Dec 3, 2024 | Age And Gender Classification, Age and Gender Estimation | Code Available | 0 |
| LLM Benchmarking with LLaMA2: Evaluating Code Development Performance Across Multiple Programming Languages | Mar 24, 2025 | Benchmarking | Code Available | 0 |
| Improve Machine Learning carbon footprint using Nvidia GPU and Mixed Precision training for classification models -- Part I | Sep 12, 2024 | Benchmarking, CPU | Code Available | 0 |
| LLM Detectors Still Fall Short of Real World: Case of LLM-Generated Short News-Like Posts | Sep 5, 2024 | Benchmarking | Code Available | 0 |
| Improved Target-specific Stance Detection on Social Media Platforms by Delving into Conversation Threads | Nov 6, 2022 | Benchmarking, Opinion Mining | Code Available | 0 |
| BLESS: Benchmarking Large Language Models on Sentence Simplification | Oct 24, 2023 | Benchmarking, Diversity | Code Available | 0 |
| Improved Multilingual Language Model Pretraining for Social Media Text via Translation Pair Prediction | Oct 20, 2021 | Benchmarking, Language Modeling | Code Available | 0 |
| Importance of Disjoint Sampling in Conventional and Transformer Models for Hyperspectral Image Classification | Apr 23, 2024 | Benchmarking, Hyperspectral Image Classification | Code Available | 0 |
| BanglaNLP at BLP-2023 Task 2: Benchmarking different Transformer Models for Sentiment Analysis of Bangla Social Media Posts | Oct 13, 2023 | Benchmarking, Sentiment Analysis | Code Available | 0 |
| LLM Performance for Code Generation on Noisy Tasks | May 29, 2025 | Benchmarking, Code Generation | Code Available | 0 |
| ImpliRet: Benchmarking the Implicit Fact Retrieval Challenge | Jun 17, 2025 | Benchmarking, Retrieval | Code Available | 0 |
| A Dataset for Web-Scale Knowledge Base Population | Jun 3, 2018 | Benchmarking, Knowledge Base Population | Code Available | 0 |
| The Devil is in the Prompts: De-Identification Traces Enhance Memorization Risks in Synthetic Chest X-Ray Generation | Feb 11, 2025 | Benchmarking, De-identification | Code Available | 0 |
| Impact of ImageNet Model Selection on Domain Adaptation | Feb 6, 2020 | Benchmarking, Domain Adaptation | Code Available | 0 |
| Immunofluorescence Capillary Imaging Segmentation: Cases Study | Jul 14, 2022 | Benchmarking, Image Segmentation | Code Available | 0 |
| Analyzing the Feature Extractor Networks for Face Image Synthesis | Jun 4, 2024 | Benchmarking, Image Generation | Code Available | 0 |
| ImmersePro: End-to-End Stereo Video Synthesis Via Implicit Disparity Learning | Sep 30, 2024 | Benchmarking, Disparity Estimation | Code Available | 0 |
| LLpowershap: Logistic Loss-based Automated Shapley Values Feature Selection Method | Jan 23, 2024 | Benchmarking, Fairness | Code Available | 0 |
| Revisiting and Benchmarking Graph Autoencoders: A Contrastive Learning Perspective | Oct 14, 2024 | Benchmarking, Contrastive Learning | Code Available | 0 |
| Illusory VQA: Benchmarking and Enhancing Multimodal Models on Visual Illusions | Dec 11, 2024 | Benchmarking, Question Answering | Code Available | 0 |
| Revisiting a Pain in the Neck: Semantic Phrase Processing Benchmark for Language Models | May 5, 2024 | Benchmarking | Code Available | 0 |
| AI-enabled Sound Pattern Recognition on Asthma Medication Adherence: Evaluation with the RDA Benchmark Suite | May 30, 2022 | Benchmarking, BIG-bench Machine Learning | Code Available | 0 |
| BioVFM-21M: Benchmarking and Scaling Self-Supervised Vision Foundation Models for Biomedical Image Analysis | May 14, 2025 | Benchmarking, Computational Efficiency | Code Available | 0 |
| Illuminating the Diversity-Fitness Trade-Off in Black-Box Optimization | Aug 29, 2024 | Benchmarking, Diversity | Code Available | 0 |
| Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment | Jun 1, 2023 | Benchmarking, Hate Speech Detection | Code Available | 0 |
| Local manifold learning and its link to domain-based physics knowledge | Jul 1, 2022 | Benchmarking, Dimensionality Reduction | Code Available | 0 |
| LOCO-EPI: Leave-one-chromosome-out (LOCO) as a benchmarking paradigm for deep learning based prediction of enhancer-promoter interactions | Apr 1, 2025 | Benchmarking | Code Available | 0 |
| IJCB 2022 Mobile Behavioral Biometrics Competition (MobileB2C) | Oct 6, 2022 | Benchmarking | Code Available | 0 |
| Why Stop at One Error? Benchmarking LLMs as Data Science Code Debuggers for Multi-Hop and Multi-Bug Errors | Mar 28, 2025 | Benchmarking, Code Generation | Code Available | 0 |
| BioSentVec: creating sentence embeddings for biomedical texts | Oct 22, 2018 | Articles, Benchmarking | Code Available | 0 |
| LogicCat: A Chain-of-Thought Text-to-SQL Benchmark for Multi-Domain Reasoning Challenges | May 24, 2025 | Benchmarking, Mathematical Reasoning | Code Available | 0 |
| IHCV: Discovery of Hidden Time-Dependent Control Variables in Non-Linear Dynamical Systems | Apr 5, 2023 | Benchmarking | Code Available | 0 |
| Identifying the Smallest Adversarial Load Perturbations that Render DC-OPF Infeasible | Jul 10, 2025 | Adversarial Attack, Benchmarking | Code Available | 0 |
| LogoNet: a fine-grained network for instance-level logo sketch retrieval | Apr 5, 2023 | 2k, Benchmarking | Code Available | 0 |
| Identifying Money Laundering Subgraphs on the Blockchain | Oct 10, 2024 | Benchmarking | Code Available | 0 |
| Identifying and Benchmarking Natural Out-of-Context Prediction Problems | Oct 25, 2021 | Benchmarking | Code Available | 0 |
| Multitask learning and benchmarking with clinical time series data | Jun 17, 2019 | Benchmarking, BIG-bench Machine Learning | Code Available | 0 |
| IdeaBench: Benchmarking Large Language Models for Research Idea Generation | Oct 31, 2024 | Benchmarking, scientific discovery | Code Available | 0 |
| IceBench: A Benchmark for Deep Learning based Sea Ice Type Classification | Mar 22, 2025 | Benchmarking, Classification | Code Available | 0 |
| BioFors: A Large Biomedical Image Forensics Dataset | Aug 30, 2021 | Benchmarking, Image Forensics | Code Available | 0 |
| Benchmarking Attribution Methods with Relative Feature Importance | Jul 23, 2019 | Benchmarking, Feature Importance | Code Available | 0 |
| HypoTermQA: Hypothetical Terms Dataset for Benchmarking Hallucination Tendency of LLMs | Feb 25, 2024 | Benchmarking, Chatbot | Code Available | 0 |
| Hyperspectral Image Dataset for Benchmarking on Salient Object Detection | Jun 29, 2018 | Benchmarking, Object | Code Available | 0 |
| Long-Term Visitation Value for Deep Exploration in Sparse Reward Reinforcement Learning | Jan 1, 2020 | Benchmarking, reinforcement-learning | Code Available | 0 |
| Look Across Elapse: Disentangled Representation Learning and Photorealistic Cross-Age Face Synthesis for Age-Invariant Face Recognition | Sep 2, 2018 | Age-Invariant Face Recognition, Benchmarking | Code Available | 0 |