| DiPlomat: A Dialogue Dataset for Situated Pragmatic Reasoning | Jun 15, 2023 | Benchmarking, Conversational Question Answering | Unverified | 0 |
| PINNacle: A Comprehensive Benchmark of Physics-Informed Neural Networks for Solving PDEs | Jun 15, 2023 | Benchmarking | Code Available | 2 |
| Re-Benchmarking Pool-Based Active Learning for Binary Classification | Jun 15, 2023 | Active Learning, Benchmarking | Code Available | 0 |
| MLonMCU: TinyML Benchmarking with Fast Retargeting | Jun 15, 2023 | Benchmarking | Code Available | 1 |
| Towards Motion Forecasting with Real-World Perception Inputs: Are End-to-End Approaches Competitive? | Jun 15, 2023 | Autonomous Driving, Autonomous Vehicles | Code Available | 1 |
| KoLA: Carefully Benchmarking World Knowledge of Large Language Models | Jun 15, 2023 | Benchmarking, Hallucination | Code Available | 1 |
| One Law, Many Languages: Benchmarking Multilingual Legal Reasoning for Judicial Support | Jun 15, 2023 | Benchmarking, Information Retrieval | Code Available | 0 |
| BED: Bi-Encoder-Based Detectors for Out-of-Distribution Detection | Jun 15, 2023 | Benchmarking, Out-of-Distribution Detection | Code Available | 0 |
| Dissecting Multimodality in VideoQA Transformer Models by Impairing Modality Fusion | Jun 15, 2023 | Benchmarking, counterfactual | Unverified | 0 |
| Towards Benchmarking and Improving the Temporal Reasoning Capability of Large Language Models | Jun 15, 2023 | Benchmarking, Question Answering | Code Available | 1 |