| Symmetry-Informed Geometric Representation for Molecules, Proteins, and Crystalline Materials | Jun 15, 2023 | Benchmarking, Computational chemistry | Code Available | 1 |
| PaReprop: Fast Parallelized Reversible Backpropagation | Jun 15, 2023 | Benchmarking | Code Available | 1 |
| KoLA: Carefully Benchmarking World Knowledge of Large Language Models | Jun 15, 2023 | Benchmarking, Hallucination | Code Available | 1 |
| Towards Benchmarking and Improving the Temporal Reasoning Capability of Large Language Models | Jun 15, 2023 | Benchmarking, Question Answering | Code Available | 1 |
| AQuA: A Benchmarking Tool for Label Quality Assessment | Jun 15, 2023 | Benchmarking, Label Error Detection | Code Available | 1 |
| NeuroGraph: Benchmarks for Graph Machine Learning in Brain Connectomics | Jun 9, 2023 | Benchmarking, Dataset Generation | Code Available | 1 |
| Yet Another ICU Benchmark: A Flexible Multi-Center Framework for Clinical ML | Jun 8, 2023 | Benchmarking, Kidney Function | Code Available | 1 |
| On the Detectability of ChatGPT Content: Benchmarking, Methodology, and Evaluation through the Lens of Academic Writing | Jun 7, 2023 | Benchmarking, Prompt Engineering | Code Available | 1 |
| RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems | Jun 5, 2023 | Benchmarking, C++ code | Code Available | 1 |
| Benchmarking Large Language Models on CMExam -- A Comprehensive Chinese Medical Exam Dataset | Jun 5, 2023 | Benchmarking, Multiple-choice | Code Available | 1 |
| Str2Str: A Score-based Framework for Zero-shot Protein Conformation Sampling | Jun 5, 2023 | Benchmarking, Denoising | Code Available | 1 |
| TransDocAnalyser: A Framework for Offline Semi-structured Handwritten Document Analysis in the Legal Domain | Jun 3, 2023 | Benchmarking, Decoder | Code Available | 1 |
| Spatially Resolved Gene Expression Prediction from H&E Histology Images via Bi-modal Contrastive Learning | Jun 2, 2023 | Benchmarking, Contrastive Learning | Code Available | 1 |
| BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models | Jun 2, 2023 | Benchmarking, Language Acquisition | Code Available | 1 |
| Multilingual Conceptual Coverage in Text-to-Image Models | Jun 2, 2023 | Benchmarking | Code Available | 1 |
| Improving and Benchmarking Offline Reinforcement Learning Algorithms | Jun 1, 2023 | Attribute, Benchmarking | Code Available | 1 |
| End-to-end Knowledge Retrieval with Multi-modal Queries | Jun 1, 2023 | Benchmarking, Cross-Modal Retrieval | Code Available | 1 |
| Accurate and Efficient Structural Ensemble Generation of Macrocyclic Peptides using Internal Coordinate Diffusion | May 30, 2023 | Benchmarking, Diversity | Code Available | 1 |
| IDToolkit: A Toolkit for Benchmarking and Developing Inverse Design Algorithms in Nanophotonics | May 30, 2023 | Benchmarking | Code Available | 1 |
| SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models | May 30, 2023 | Benchmarking, Code Generation | Code Available | 1 |
| Decoding the Underlying Meaning of Multimodal Hateful Memes | May 28, 2023 | Benchmarking, Hateful Meme Classification | Code Available | 1 |
| Zero is Not Hero Yet: Benchmarking Zero-Shot Performance of LLMs for Financial Tasks | May 26, 2023 | Benchmarking | Code Available | 1 |
| KeyPosS: Plug-and-Play Facial Landmark Detection through GPS-Inspired True-Range Multilateration | May 25, 2023 | Benchmarking, Face Recognition | Code Available | 1 |
| ReadMe++: Benchmarking Multilingual Language Models for Multi-Domain Readability Assessment | May 23, 2023 | Benchmarking, Cross-Lingual Transfer | Code Available | 1 |
| Exploring Large Language Models for Classical Philology | May 23, 2023 | Benchmarking, Decoder | Code Available | 1 |