| Does BERT Learn as Humans Perceive? Understanding Linguistic Styles through Lexica | Sep 6, 2021 | Benchmarking | CodeCode Available | 1 | 5 |
| Benchmarking structure-based three-dimensional molecular generative models using GenBench3D: ligand conformation quality matters | Jul 5, 2024 | Benchmarkingvalid | CodeCode Available | 1 | 5 |
| A Systematic Benchmarking Analysis of Transfer Learning for Medical Image Analysis | Aug 12, 2021 | BenchmarkingMedical Image Analysis | CodeCode Available | 1 | 5 |
| BeHonest: Benchmarking Honesty in Large Language Models | Jun 19, 2024 | BenchmarkingMisinformation | CodeCode Available | 1 | 5 |
| Digital Typhoon: Long-term Satellite Image Dataset for the Spatio-Temporal Modeling of Tropical Cyclones | Nov 5, 2023 | Benchmarking | CodeCode Available | 1 | 5 |
| Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations | Apr 15, 2024 | BenchmarkingBias Detection | CodeCode Available | 1 | 5 |
| A Comprehensive Overview of Large Language Models | Jul 12, 2023 | Benchmarking | CodeCode Available | 1 | 5 |
| DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity | Aug 11, 2023 | BenchmarkingDiversity | CodeCode Available | 1 | 5 |
| Bench4KE: Benchmarking Automated Competency Question Generation | May 30, 2025 | BenchmarkingQuestion Generation | CodeCode Available | 1 | 5 |
| Benchmarking the Abilities of Large Language Models for RDF Knowledge Graph Creation and Comprehension: How Well Do LLMs Speak Turtle? | Sep 29, 2023 | BenchmarkingKnowledge Graph Completion | CodeCode Available | 1 | 5 |
| A multi-schematic classifier-independent oversampling approach for imbalanced datasets | Jul 15, 2021 | Benchmarking | CodeCode Available | 1 | 5 |
| Benchmarking the Combinatorial Generalizability of Complex Query Answering on Knowledge Graphs | Sep 18, 2021 | BenchmarkingComplex Query Answering | CodeCode Available | 1 | 5 |
| AirSim Drone Racing Lab | Mar 12, 2020 | BenchmarkingOptical Flow Estimation | CodeCode Available | 1 | 5 |
| Bencher: Simple and Reproducible Benchmarking for Black-Box Optimization | May 27, 2025 | Benchmarking | CodeCode Available | 1 | 5 |
| A SWAT-based Reinforcement Learning Framework for Crop Management | Feb 10, 2023 | BenchmarkingDecision Making | CodeCode Available | 1 | 5 |
| BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models | Dec 5, 2023 | BenchmarkingVisual Question Answering | CodeCode Available | 1 | 5 |
| Benchmarking MRI Reconstruction Neural Networks on Large Public Datasets | Mar 6, 2020 | BenchmarkingImage Reconstruction | CodeCode Available | 1 | 5 |
| Benchmarking Large Multimodal Models against Common Corruptions | Jan 22, 2024 | BenchmarkingImage to text | CodeCode Available | 1 | 5 |
| Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning | Dec 11, 2024 | AttributeBenchmarking | CodeCode Available | 1 | 5 |
| Benchmarking the Robustness of Spatial-Temporal Models Against Corruptions | Oct 13, 2021 | BenchmarkingComputational Efficiency | CodeCode Available | 1 | 5 |
| Disentangled Feature Representation for Few-shot Image Classification | Sep 26, 2021 | BenchmarkingClassification | CodeCode Available | 1 | 5 |
| ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis | Mar 9, 2021 | BenchmarkingClassification | CodeCode Available | 1 | 5 |
| Does your model understand genes? A benchmark of gene properties for biological and text models | Dec 5, 2024 | BenchmarkingMulti-class Classification | CodeCode Available | 1 | 5 |
| Event-Free Moving Object Segmentation from Moving Ego Vehicle | Apr 28, 2023 | Autonomous DrivingBenchmarking | CodeCode Available | 1 | 5 |
| ENRICH: Multi-purposE dataset for beNchmaRking In Computer vision and pHotogrammetry | Apr 1, 2023 | 3D Reconstruction3D Scene Reconstruction | CodeCode Available | 1 | 5 |