| AlignBench: Benchmarking Chinese Alignment of Large Language Models | Nov 30, 2023 | Benchmarking | Code Available | 2 |
| TaskBench: Benchmarking Large Language Models for Task Automation | Nov 30, 2023 | Benchmarking, Parameter Prediction | Code Available | 6 |
| ROBBIE: Robust Bias Evaluation of Large Generative Language Models | Nov 29, 2023 | Benchmarking, Fairness | Unverified | 0 |
| TransOpt: Transformer-based Representation Learning for Optimization Problem Classification | Nov 29, 2023 | Benchmarking, Classification | Unverified | 0 |
| Mixed-Precision Quantization for Federated Learning on Resource-Constrained Heterogeneous Devices | Nov 29, 2023 | Benchmarking, Federated Learning | Unverified | 0 |
| Biomedical knowledge graph-optimized prompt generation for large language models | Nov 29, 2023 | Benchmarking, Knowledge Graphs | Code Available | 2 |
| SAIBench: A Structural Interpretation of AI for Science Through Benchmarks | Nov 29, 2023 | Benchmarking, Computational Efficiency | Unverified | 0 |
| Enhancing Post-Hoc Explanation Benchmark Reliability for Image Classification | Nov 29, 2023 | Benchmarking, Decision Making | Unverified | 0 |
| Should we be going MAD? A Look at Multi-Agent Debate Strategies for LLMs | Nov 29, 2023 | Benchmarking | Code Available | 1 |
| SEED-Bench-2: Benchmarking Multimodal Large Language Models | Nov 28, 2023 | Benchmarking, Image Generation | Code Available | 2 |