| Dataset and Benchmark: Novel Sensors for Autonomous Vehicle Perception | Jan 24, 2024 | Benchmarking | Code Available | 1 |
| SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval | Jan 24, 2024 | Benchmarking, Image Captioning | Code Available | 1 |
| Large Malaysian Language Model Based on Mistral for Enhanced Local Language Understanding | Jan 24, 2024 | Benchmarking, Language Modeling | Unverified | 0 |
| Benchmarking the Fairness of Image Upsampling Methods | Jan 24, 2024 | Benchmarking, Diversity | Code Available | 0 |
| AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents | Jan 24, 2024 | Benchmarking | Code Available | 3 |
| LLpowershap: Logistic Loss-based Automated Shapley Values Feature Selection Method | Jan 23, 2024 | Benchmarking, Fairness | Code Available | 0 |
| Benchmarking LLMs via Uncertainty Quantification | Jan 23, 2024 | Benchmarking, Uncertainty Quantification | Code Available | 3 |
| What the Weight?! A Unified Framework for Zero-Shot Knowledge Composition | Jan 23, 2024 | Benchmarking | Code Available | 0 |
| Deep Neural Network Benchmarks for Selective Classification | Jan 23, 2024 | Benchmarking, Classification | Code Available | 0 |
| Subgroup analysis methods for time-to-event outcomes in heterogeneous randomized controlled trials | Jan 22, 2024 | Benchmarking, Synthetic Data Generation | Code Available | 0 |