| NovoBench: Benchmarking Deep Learning-based De Novo Peptide Sequencing Methods in Proteomics | Jun 16, 2024 | Benchmarking, de novo peptide sequencing | —Unverified | 0 |
| GANmut: Generating and Modifying Facial Expressions | Jun 16, 2024 | Benchmarking, Diversity | —Unverified | 0 |
| Reactor Mk.1 performances: MMLU, HumanEval and BBH test results | Jun 15, 2024 | Benchmarking, HumanEval | —Unverified | 0 |
| Benchmarking Children's ASR with Supervised and Self-supervised Speech Foundation Models | Jun 15, 2024 | Benchmarking, Data Augmentation | Code Available | 0 |
| Beyond Slow Signs in High-fidelity Model Extraction | Jun 14, 2024 | Benchmarking, model | Code Available | 0 |
| ClimRetrieve: A Benchmarking Dataset for Information Retrieval from Corporate Climate Disclosures | Jun 14, 2024 | Answer Generation, Benchmarking | Code Available | 0 |
| SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic Grading | Jun 14, 2024 | Benchmarking, Mathematical Proofs | Code Available | 0 |
| On the Evaluation of Speech Foundation Models for Spoken Language Understanding | Jun 14, 2024 | Benchmarking, Prediction | —Unverified | 0 |
| Benchmarking Generative Models on Computational Thinking Tests in Elementary Visual Programming | Jun 14, 2024 | Benchmarking, General Knowledge | —Unverified | 0 |
| Improving the Validity and Practical Usefulness of AI/ML Evaluations Using an Estimands Framework | Jun 14, 2024 | Benchmarking | —Unverified | 0 |
| DefAn: Definitive Answer Dataset for LLMs Hallucination Evaluation | Jun 13, 2024 | Benchmarking, Hallucination | Code Available | 0 |
| CubeSat-Enabled Free-Space Optics: Joint Data Communication and Fine Beam Tracking | Jun 13, 2024 | Benchmarking | —Unverified | 0 |
| ResearchArena: Benchmarking LLMs' Ability to Collect and Organize Information as Research Agents | Jun 13, 2024 | Benchmarking, Survey | —Unverified | 0 |
| ECBD: Evidence-Centered Benchmark Design for NLP | Jun 13, 2024 | Benchmarking | Code Available | 0 |
| LLAVIDAL: A Large LAnguage VIsion Model for Daily Activities of Living | Jun 13, 2024 | Benchmarking, Human-Object Interaction Detection | —Unverified | 0 |
| Decoding the Diversity: A Review of the Indic AI Research Landscape | Jun 13, 2024 | Benchmarking, Diversity | —Unverified | 0 |
| Are we making progress in unlearning? Findings from the first NeurIPS unlearning competition | Jun 13, 2024 | Benchmarking | —Unverified | 0 |
| A Review of 315 Benchmark and Test Functions for Machine Learning Optimization Algorithms and Metaheuristics with Mathematical and Visual Descriptions | Jun 13, 2024 | Benchmarking | —Unverified | 0 |
| MobileAgentBench: An Efficient and User-Friendly Benchmark for Mobile LLM Agents | Jun 12, 2024 | Benchmarking, Language Modeling | —Unverified | 0 |
| How well it works: Benchmarking performance of GPT models on medical natural language processing tasks | Jun 12, 2024 | Benchmarking | —Unverified | 0 |
| It's all about PR -- Smart Benchmarking AI Accelerators using Performance Representatives | Jun 12, 2024 | All, Benchmarking | —Unverified | 0 |
| Reinforcement Learning to Disentangle Multiqubit Quantum States from Partial Observations | Jun 12, 2024 | Benchmarking, Deep Reinforcement Learning | Code Available | 0 |
| ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets | Jun 12, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases | Jun 12, 2024 | Benchmarking, Model Compression | —Unverified | 0 |
| A PRISMA Driven Systematic Review of Publicly Available Datasets for Benchmark and Model Developments for Industrial Defect Detection | Jun 11, 2024 | Benchmarking, Defect Detection | —Unverified | 0 |