| GNUMAP: A Parameter-Free Approach to Unsupervised Dimensionality Reduction via Graph Neural Networks | Jul 30, 2024 | Benchmarking, Contrastive Learning | —Unverified | 0 |
| Benchmarking Histopathology Foundation Models for Ovarian Cancer Bevacizumab Treatment Response Prediction from Whole Slide Images | Jul 30, 2024 | Benchmarking, Multiple Instance Learning | —Unverified | 0 |
| Anomalous State Sequence Modeling to Enhance Safety in Reinforcement Learning | Jul 29, 2024 | Anomaly Detection, Benchmarking | —Unverified | 0 |
| Beyond Metrics: A Critical Analysis of the Variability in Large Language Model Evaluation Frameworks | Jul 29, 2024 | Benchmarking, Language Model Evaluation | —Unverified | 0 |
| On the Evaluation Consistency of Attribution-based Explanations | Jul 28, 2024 | Benchmarking | Code Available | 0 |
| Official-NV: An LLM-Generated News Video Dataset for Multimodal Fake News Detection | Jul 28, 2024 | Benchmarking, Fake News Detection | —Unverified | 0 |
| Benchmarking Dependence Measures to Prevent Shortcut Learning in Medical Imaging | Jul 26, 2024 | Benchmarking | Code Available | 0 |
| Towards a Multidimensional Evaluation Framework for Empathetic Conversational Systems | Jul 26, 2024 | Benchmarking | —Unverified | 0 |
| GermanPartiesQA: Benchmarking Commercial Large Language Models for Political Bias and Sycophancy | Jul 25, 2024 | Benchmarking | —Unverified | 0 |
| SMiCRM: A Benchmark Dataset of Mechanistic Molecular Images | Jul 25, 2024 | Benchmarking | —Unverified | 0 |
| Quality Assured: Rethinking Annotation Strategies in Imaging AI | Jul 24, 2024 | Benchmarking | —Unverified | 0 |
| Building a Domain-specific Guardrail Model in Production | Jul 24, 2024 | Benchmarking, Language Modelling | —Unverified | 0 |
| Flexible Generation of Preference Data for Recommendation Analysis | Jul 23, 2024 | Benchmarking, Recommendation Systems | Code Available | 0 |
| Can time series forecasting be automated? A benchmark and analysis | Jul 23, 2024 | Benchmarking, Decision Making | —Unverified | 0 |
| Aggregated Attributions for Explanatory Analysis of 3D Segmentation Models | Jul 23, 2024 | Benchmarking, Segmentation | Code Available | 0 |
| Hi-EF: Benchmarking Emotion Forecasting in Human-interaction | Jul 23, 2024 | Benchmarking | Code Available | 0 |
| BONES: a Benchmark fOr Neural Estimation of Shapley values | Jul 23, 2024 | Benchmarking | Code Available | 0 |
| StylusAI: Stylistic Adaptation for Robust German Handwritten Text Generation | Jul 22, 2024 | Benchmarking, Text Generation | —Unverified | 0 |
| Customized Retrieval Augmented Generation and Benchmarking for EDA Tool Documentation QA | Jul 22, 2024 | Benchmarking, Contrastive Learning | Code Available | 0 |
| Benchmarks as Microscopes: A Call for Model Metrology | Jul 22, 2024 | Benchmarking, model | —Unverified | 0 |
| Unlocking the Potential: Benchmarking Large Language Models in Water Engineering and Research | Jul 22, 2024 | Benchmarking | —Unverified | 0 |
| Cascaded two-stage feature clustering and selection via separability and consistency in fuzzy decision systems | Jul 22, 2024 | Benchmarking, Clustering | —Unverified | 0 |
| InLUT3D: Challenging real indoor dataset for point cloud analysis | Jul 22, 2024 | Benchmarking, Scene Understanding | —Unverified | 0 |
| Open-CD: A Comprehensive Toolbox for Change Detection | Jul 22, 2024 | Benchmarking, Change Detection | —Unverified | 0 |
| Non-Reference Quality Assessment for Medical Imaging: Application to Synthetic Brain MRIs | Jul 20, 2024 | Benchmarking, Domain Adaptation | —Unverified | 0 |