| Implicit Causality-biases in humans and LLMs as a tool for benchmarking LLM discourse capabilities | Jan 22, 2025 | Benchmarking, Referring Expression | —Unverified | 0 | 0 |
| Benchmarking the Accuracy and Robustness of Feedback Alignment Algorithms | Aug 30, 2021 | Benchmarking | —Unverified | 0 | 0 |
| Implicit to Explicit Entropy Regularization: Benchmarking ViT Fine-tuning under Noisy Labels | Oct 5, 2024 | Benchmarking | —Unverified | 0 | 0 |
| The Moral Mind(s) of Large Language Models | Nov 19, 2024 | Benchmarking, Decision Making | —Unverified | 0 | 0 |
| Benchmarking Test-Time Unsupervised Deep Neural Network Adaptation on Edge Devices | Mar 21, 2022 | Benchmarking, GPU | —Unverified | 0 | 0 |
| Ward: Provable RAG Dataset Inference via LLM Watermarks | Oct 4, 2024 | Benchmarking, RAG | —Unverified | 0 | 0 |
| The Multi-speaker Multi-style Voice Cloning Challenge 2021 | Apr 5, 2021 | Benchmarking, Voice Cloning | —Unverified | 0 | 0 |
| PAWS-VMK: A Unified Approach To Semi-Supervised Learning And Out-of-Distribution Detection | Nov 28, 2023 | Benchmarking, Image Classification | —Unverified | 0 | 0 |
| Improved statistical benchmarking of digital pathology models using pairwise frames evaluation | Jun 7, 2023 | Benchmarking, Classification | —Unverified | 0 | 0 |
| The Neural Painter: Multi-Turn Image Generation | Jun 16, 2018 | Benchmarking, Conditional Image Generation | —Unverified | 0 | 0 |