| Does Table Source Matter? Benchmarking and Improving Multimodal Scientific Table Understanding and Reasoning | Jan 22, 2025 | Benchmarking | Code Available | 0 |
| Tougher Text, Smarter Models: Raising the Bar for Adversarial Defence Benchmarks | Jan 5, 2025 | Adversarial Robustness, Benchmarking | Code Available | 0 |
| Benchmarking LLM-based Relevance Judgment Methods | Apr 17, 2025 | Benchmarking, Open-Domain Question Answering | Code Available | 0 |
| Toward 3D Object Reconstruction from Stereo Images | Oct 18, 2019 | 3D Object Reconstruction, Benchmarking | Code Available | 0 |
| DLAMA: A Framework for Curating Culturally Diverse Facts for Probing the Knowledge of Pretrained Language Models | Jun 8, 2023 | Benchmarking, Fairness | Code Available | 0 |
| Skelite: Compact Neural Networks for Efficient Iterative Skeletonization | Mar 10, 2025 | Benchmarking, Computational Efficiency | Code Available | 0 |
| Divergent Creativity in Humans and Large Language Models | May 13, 2024 | Benchmarking | Code Available | 0 |
| A Kernel-Based Approach for Accurate Steady-State Detection in Performance Time Series | Jun 4, 2025 | Benchmarking, Irregular Time Series | Code Available | 0 |
| A Closer Look at Temporal Sentence Grounding in Videos: Dataset and Metric | Jan 22, 2021 | Benchmarking, Sentence | Code Available | 0 |
| Are Personalized Stochastic Parrots More Dangerous? Evaluating Persona Biases in Dialogue Systems | Oct 8, 2023 | Benchmarking | Code Available | 0 |
| User-Guided Deep Anime Line Art Colorization with Conditional Adversarial Networks | Aug 9, 2018 | Benchmarking, Colorization | Code Available | 0 |
| Towards a Benchmark for Large Language Models for Business Process Management Tasks | Oct 4, 2024 | Benchmarking, Management | Code Available | 0 |
| Weighting-Based Treatment Effect Estimation via Distribution Learning | Dec 26, 2020 | Benchmarking | Code Available | 0 |
| Slot Filling for Extracting Reskilling and Upskilling Options from the Web | Jul 11, 2022 | Benchmarking, Entity Linking | Code Available | 0 |
| On Pitfalls of RemOve-And-Retrain: Data Processing Inequality Perspective | Apr 26, 2023 | Benchmarking, Feature Importance | Code Available | 0 |
| Distributional Depth-Based Estimation of Object Articulation Models | Aug 12, 2021 | Benchmarking, Object | Code Available | 0 |
| Benchmarking Linguistic Diversity of Large Language Models | Dec 13, 2024 | Benchmarking, Diversity | Code Available | 0 |
| On Recurrent Neural Networks for Sequence-based Processing in Communications | May 24, 2019 | Benchmarking, Decoder | Code Available | 0 |
| Benchmarking Learning Efficiency in Deep Reservoir Computing | Sep 29, 2022 | Benchmarking | Code Available | 0 |
| Benchmarking Large Vision-Language Models on Fine-Grained Image Tasks: A Comprehensive Evaluation | Apr 21, 2025 | Benchmarking | Code Available | 0 |
| Towards a Comprehensive Benchmark for Pathological Lymph Node Metastasis in Breast Cancer Sections | Nov 16, 2024 | Benchmarking, Diagnostic | Code Available | 0 |
| Benchmarking Large Language Model Uncertainty for Prompt Optimization | Sep 16, 2024 | Benchmarking, Diversity | Code Available | 0 |
| Diversity Over Size: On the Effect of Sample and Topic Sizes for Topic-Dependent Argument Mining Datasets | May 23, 2022 | Argument Mining, Benchmarking | Code Available | 0 |
| On the Evaluation Consistency of Attribution-based Explanations | Jul 28, 2024 | Benchmarking | Code Available | 0 |
| On the Evaluation of Conditional GANs | Jul 11, 2019 | Benchmarking, Diversity | Code Available | 0 |