| A practical generalization metric for deep networks benchmarking | Sep 2, 2024 | BenchmarkingDiversity | —Unverified | 0 |
| Benchmarking LLM Code Generation for Audio Programming with Visual Dataflow Languages | Sep 1, 2024 | BenchmarkingCode Generation | —Unverified | 0 |
| Accelerating the discovery of steady-states of planetary interior dynamics with machine learning | Aug 30, 2024 | Benchmarking | —Unverified | 0 |
| SYNTHEVAL: Hybrid Behavioral Testing of NLP Models with Synthetic CheckLists | Aug 30, 2024 | BenchmarkingSentiment Analysis | CodeCode Available | 0 |
| Understanding the User: An Intent-Based Ranking Dataset | Aug 30, 2024 | BenchmarkingInformation Retrieval | —Unverified | 0 |
| Benchmarking Japanese Speech Recognition on ASR-LLM Setups with Multi-Pass Augmented Generative Error Correction | Aug 29, 2024 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | —Unverified | 0 |
| Illuminating the Diversity-Fitness Trade-Off in Black-Box Optimization | Aug 29, 2024 | BenchmarkingDiversity | CodeCode Available | 0 |
| Benchmarking foundation models as feature extractors for weakly-supervised computational pathology | Aug 28, 2024 | BenchmarkingDiversity | —Unverified | 0 |
| Atari-GPT: Benchmarking Multimodal Large Language Models as Low-Level Policies in Atari Games | Aug 28, 2024 | Atari GamesBenchmarking | —Unverified | 0 |
| VHAKG: A Multi-modal Knowledge Graph Based on Synchronized Multi-view Videos of Daily Activities | Aug 27, 2024 | BenchmarkingKnowledge Graphs | CodeCode Available | 0 |