| Protein Structure Tokenization: Benchmarking and New Recipe | Feb 28, 2025 | Benchmarking, Language Modeling | Code Available | 1 | 5 |
| Benchmarking Cognitive Biases in Large Language Models as Evaluators | Sep 29, 2023 | Benchmarking, In-Context Learning | Code Available | 1 | 5 |
| Benchmarking Commonsense Knowledge Base Population with an Effective Evaluation Dataset | Sep 16, 2021 | Benchmarking, Knowledge Base Population | Code Available | 1 | 5 |
| Prompt Tuned Embedding Classification for Multi-Label Industry Sector Allocation | Sep 21, 2023 | Benchmarking, Classification | Code Available | 1 | 5 |
| Benchmarking Counterfactual Image Generation | Mar 29, 2024 | Benchmarking, Conditional Image Generation | Code Available | 1 | 5 |
| Towards Motion Forecasting with Real-World Perception Inputs: Are End-to-End Approaches Competitive? | Jun 15, 2023 | Autonomous Driving, Autonomous Vehicles | Code Available | 1 | 5 |
| Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs | Feb 21, 2025 | Benchmarking | Code Available | 1 | 5 |
| Benchmarking Chinese Text Recognition: Datasets, Baselines, and an Empirical Study | Dec 30, 2021 | Attribute, Benchmarking | Code Available | 1 | 5 |
| CBench: Towards Better Evaluation of Question Answering Over Knowledge Graphs | Apr 5, 2021 | Benchmarking, Knowledge Graphs | Code Available | 1 | 5 |
| Benchmarking Classical and Learning-Based Multibeam Point Cloud Registration | May 10, 2024 | Benchmarking, Point Cloud Registration | Code Available | 1 | 5 |