| LabelBench: A Comprehensive Framework for Benchmarking Adaptive Label-Efficient Learning | Jun 16, 2023 | Active Learning, Benchmarking | Code Available | 1 |
| Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond | Jun 16, 2023 | Benchmarking, Evidence Selection | Code Available | 1 |
| MLonMCU: TinyML Benchmarking with Fast Retargeting | Jun 15, 2023 | Benchmarking | Code Available | 1 |
| Symmetry-Informed Geometric Representation for Molecules, Proteins, and Crystalline Materials | Jun 15, 2023 | Benchmarking, Computational chemistry | Code Available | 1 |
| FFB: A Fair Fairness Benchmark for In-Processing Group Fairness Methods | Jun 15, 2023 | Benchmarking, Fairness | Code Available | 1 |
| PaReprop: Fast Parallelized Reversible Backpropagation | Jun 15, 2023 | Benchmarking | Code Available | 1 |
| Towards Benchmarking and Improving the Temporal Reasoning Capability of Large Language Models | Jun 15, 2023 | Benchmarking, Question Answering | Code Available | 1 |
| Towards Motion Forecasting with Real-World Perception Inputs: Are End-to-End Approaches Competitive? | Jun 15, 2023 | Autonomous Driving, Autonomous Vehicles | Code Available | 1 |
| KoLA: Carefully Benchmarking World Knowledge of Large Language Models | Jun 15, 2023 | Benchmarking, Hallucination | Code Available | 1 |
| AQuA: A Benchmarking Tool for Label Quality Assessment | Jun 15, 2023 | Benchmarking, Label Error Detection | Code Available | 1 |