| NovoBench: Benchmarking Deep Learning-based De Novo Peptide Sequencing Methods in Proteomics | Jun 16, 2024 | Benchmarkingde novo peptide sequencing | —Unverified | 0 |
| GANmut: Generating and Modifying Facial Expressions | Jun 16, 2024 | BenchmarkingDiversity | —Unverified | 0 |
| Reactor Mk.1 performances: MMLU, HumanEval and BBH test results | Jun 15, 2024 | BenchmarkingHumanEval | —Unverified | 0 |
| Benchmarking Children's ASR with Supervised and Self-supervised Speech Foundation Models | Jun 15, 2024 | BenchmarkingData Augmentation | CodeCode Available | 0 |
| Beyond Slow Signs in High-fidelity Model Extraction | Jun 14, 2024 | Benchmarkingmodel | CodeCode Available | 0 |
| ClimRetrieve: A Benchmarking Dataset for Information Retrieval from Corporate Climate Disclosures | Jun 14, 2024 | Answer GenerationBenchmarking | CodeCode Available | 0 |
| SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic Grading | Jun 14, 2024 | BenchmarkingMathematical Proofs | CodeCode Available | 0 |
| On the Evaluation of Speech Foundation Models for Spoken Language Understanding | Jun 14, 2024 | BenchmarkingPrediction | —Unverified | 0 |
| Benchmarking Generative Models on Computational Thinking Tests in Elementary Visual Programming | Jun 14, 2024 | BenchmarkingGeneral Knowledge | —Unverified | 0 |
| Improving the Validity and Practical Usefulness of AI/ML Evaluations Using an Estimands Framework | Jun 14, 2024 | Benchmarking | —Unverified | 0 |