| Does Table Source Matter? Benchmarking and Improving Multimodal Scientific Table Understanding and Reasoning | Jan 22, 2025 | Benchmarking | CodeCode Available | 0 | 5 |
| EvoLearner: Learning Description Logics with Evolutionary Algorithms | Nov 8, 2021 | BenchmarkingEvolutionary Algorithms | CodeCode Available | 0 | 5 |
| Graph Convolutional Networks Meet with High Dimensionality Reduction | Nov 7, 2019 | BenchmarkingDimensionality Reduction | CodeCode Available | 0 | 5 |
| Benchmarking Large Language Models on Communicative Medical Coaching: a Novel System and Dataset | Feb 8, 2024 | Benchmarking | CodeCode Available | 0 | 5 |
| gym-gazebo2, a toolkit for reinforcement learning using ROS 2 and Gazebo | Mar 14, 2019 | BenchmarkingOpenAI Gym | CodeCode Available | 0 | 5 |
| Strong and Simple Baselines for Multimodal Utterance Embeddings | May 14, 2019 | Benchmarking | CodeCode Available | 0 | 5 |
| GOAL: Towards Benchmarking Few-Shot Sports Game Summarization | Jul 18, 2022 | Benchmarking | CodeCode Available | 0 | 5 |
| Are Large Language Models True Healthcare Jacks-of-All-Trades? Benchmarking Across Health Professions Beyond Physician Exams | Jun 17, 2024 | AllBenchmarking | CodeCode Available | 0 | 5 |
| GNNMerge: Merging of GNN Models Without Accessing Training Data | Mar 5, 2025 | BenchmarkingComputational Efficiency | CodeCode Available | 0 | 5 |
| DLAMA: A Framework for Curating Culturally Diverse Facts for Probing the Knowledge of Pretrained Language Models | Jun 8, 2023 | BenchmarkingFairness | CodeCode Available | 0 | 5 |
| LoopDB: A Loop Closure Dataset for Large Scale Simultaneous Localization and Mapping | Jun 7, 2025 | BenchmarkingSimultaneous Localization and Mapping | CodeCode Available | 0 | 5 |
| Benchmarking Large Language Models for Math Reasoning Tasks | Aug 20, 2024 | BenchmarkingIn-Context Learning | CodeCode Available | 0 | 5 |
| Benchmarking Large Language Models for Image Classification of Marine Mammals | Oct 22, 2024 | Benchmarkingimage-classification | CodeCode Available | 0 | 5 |
| Divergent Creativity in Humans and Large Language Models | May 13, 2024 | Benchmarking | CodeCode Available | 0 | 5 |
| Global Prediction of COVID-19 Variant Emergence Using Dynamics-Informed Graph Neural Networks | Jan 7, 2024 | BenchmarkingGraph Neural Network | CodeCode Available | 0 | 5 |
| Distributional Depth-Based Estimation of Object Articulation Models | Aug 12, 2021 | BenchmarkingObject | CodeCode Available | 0 | 5 |
| Distributing Deep Learning Hyperparameter Tuning for 3D Medical Image Segmentation | Oct 29, 2021 | BenchmarkingBrain Tumor Segmentation | CodeCode Available | 0 | 5 |
| A Framework for Generating Informative Benchmark Instances | May 29, 2022 | Benchmarking | CodeCode Available | 0 | 5 |
| Expecting The Unexpected: Towards Broad Out-Of-Distribution Detection | Aug 22, 2023 | BenchmarkingOut-of-Distribution Detection | CodeCode Available | 0 | 5 |
| Experimental Analysis of Large-scale Learnable Vector Storage Compression | Nov 27, 2023 | Benchmarking | CodeCode Available | 0 | 5 |
| Benchmarking Parameter Control Methods in Differential Evolution for Mixed-Integer Black-Box Optimization | Apr 4, 2024 | Benchmarking | CodeCode Available | 0 | 5 |
| AI-generated Image Quality Assessment in Visual Communication | Dec 20, 2024 | BenchmarkingImage Quality Assessment | CodeCode Available | 0 | 5 |
| Geological Inference from Textual Data using Word Embeddings | Apr 10, 2025 | BenchmarkingWord Embeddings | CodeCode Available | 0 | 5 |
| GiantHunter: Accurate detection of giant virus in metagenomic data using reinforcement-learning and Monte Carlo tree search | Jan 26, 2025 | BenchmarkingDiversity | CodeCode Available | 0 | 5 |
| AstroVision: Towards Autonomous Feature Detection and Description for Missions to Small Bodies Using Deep Learning | Aug 3, 2022 | Benchmarking | CodeCode Available | 0 | 5 |
| Machine learning classification of non-Markovian noise disturbing quantum dynamics | Jan 8, 2021 | BenchmarkingBIG-bench Machine Learning | CodeCode Available | 0 | 5 |
| Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data | Jan 31, 2024 | BenchmarkingChange Detection | CodeCode Available | 0 | 5 |
| A Classification Benchmark for Artificial Intelligence Detection of Laryngeal Cancer from Patient Voice | Dec 20, 2024 | BenchmarkingDiagnostic | CodeCode Available | 0 | 5 |
| Distributed Non-Convex Optimization with Sublinear Speedup under Intermittent Client Availability | Feb 18, 2020 | BenchmarkingFederated Learning | CodeCode Available | 0 | 5 |
| Flexible Generation of Preference Data for Recommendation Analysis | Jul 23, 2024 | BenchmarkingRecommendation Systems | CodeCode Available | 0 | 5 |
| Dissecting Sample Hardness: A Fine-Grained Analysis of Hardness Characterization Methods for Data-Centric AI | Mar 7, 2024 | Benchmarking | CodeCode Available | 0 | 5 |
| Dissecting Dissonance: Benchmarking Large Multimodal Models Against Self-Contradictory Instructions | Aug 2, 2024 | Benchmarkingmultimodal interaction | CodeCode Available | 0 | 5 |
| Benchmarking Large Language Models for Molecule Prediction Tasks | Mar 8, 2024 | BenchmarkingPrediction | CodeCode Available | 0 | 5 |
| DispBench: Benchmarking Disparity Estimation to Synthetic Corruptions | May 8, 2025 | Autonomous NavigationBenchmarking | CodeCode Available | 0 | 5 |
| Are Large Language Models Good at Utility Judgments? | Mar 28, 2024 | Answer GenerationBenchmarking | CodeCode Available | 0 | 5 |
| Benchmarking performance of object detection under image distortions in an uncontrolled environment | Oct 28, 2022 | BenchmarkingObject | CodeCode Available | 0 | 5 |
| DispaRisk: Auditing Fairness Through Usable Information | May 20, 2024 | BenchmarkingBias Detection | CodeCode Available | 0 | 5 |
| A Framework for Evaluating PM2.5 Forecasts from the Perspective of Individual Decision Making | Sep 9, 2024 | BenchmarkingDecision Making | CodeCode Available | 0 | 5 |
| Exploring Context Generalizability in Citywide Crowd Mobility Prediction: An Analytic Framework and Benchmark | Jun 30, 2021 | BenchmarkingPrediction | CodeCode Available | 0 | 5 |
| Benchmarking Perturbation-based Saliency Maps for Explaining Atari Agents | Jan 18, 2021 | Atari GamesBenchmarking | CodeCode Available | 0 | 5 |
| Generative Models for Fast Simulation of Cherenkov Detectors at the Electron-Ion Collider | Apr 26, 2025 | BenchmarkingGPU | CodeCode Available | 0 | 5 |
| GPT4Graph: Can Large Language Models Understand Graph Structured Data ? An Empirical Evaluation and Benchmarking | May 24, 2023 | BenchmarkingGraph Mining | CodeCode Available | 0 | 5 |
| Exploring Model-based Planning with Policy Networks | Jun 20, 2019 | Benchmarkingmodel | CodeCode Available | 0 | 5 |
| GenderBench: Evaluation Suite for Gender Biases in LLMs | May 17, 2025 | Benchmarking | CodeCode Available | 0 | 5 |
| GenCeption: Evaluate Multimodal LLMs with Unlabeled Unimodal Data | Feb 22, 2024 | Benchmarking | CodeCode Available | 0 | 5 |
| Benchmarking Language-agnostic Intent Classification for Virtual Assistant Platforms | Jul 1, 2022 | BenchmarkingClassification | CodeCode Available | 0 | 5 |
| GECOBench: A Gender-Controlled Text Dataset and Benchmark for Quantifying Biases in Explanations | Jun 17, 2024 | BenchmarkingDataset Generation | CodeCode Available | 0 | 5 |
| A Recipe for CAC: Mosaic-based Generalized Loss for Improved Class-Agnostic Counting | Apr 15, 2024 | Benchmarking | CodeCode Available | 0 | 5 |
| Benchmarking Label Noise in Instance Segmentation: Spatial Noise Matters | Jun 16, 2024 | BenchmarkingInstance Segmentation | CodeCode Available | 0 | 5 |
| Fully Automatic Segmentation of Gross Target Volume and Organs-at-Risk for Radiotherapy Planning of Nasopharyngeal Carcinoma | Oct 4, 2023 | BenchmarkingSegmentation | CodeCode Available | 0 | 5 |