| PISTOL: Dataset Compilation Pipeline for Structural Unlearning of LLMs | Jun 24, 2024 | Benchmarking, Machine Unlearning | Unverified | 0 |
| CATBench: A Compiler Autotuning Benchmarking Suite for Black-box Optimization | Jun 24, 2024 | Bayesian Optimization, Benchmarking | Unverified | 0 |
| GraphEval2000: Benchmarking and Improving Large Language Models on Graph Datasets | Jun 23, 2024 | Benchmarking | Unverified | 0 |
| Position: Benchmarking is Limited in Reinforcement Learning Research | Jun 23, 2024 | Benchmarking, Position | Unverified | 0 |
| CaT-BENCH: Benchmarking Language Model Understanding of Causal and Temporal Dependencies in Plans | Jun 22, 2024 | Benchmarking, Decision Making | Unverified | 0 |
| MetaGreen: Meta-Learning Inspired Transformer Selection for Green Semantic Communication | Jun 22, 2024 | Benchmarking, Meta-Learning | Code Available | 0 |
| Sports Intelligence: Assessing the Sports Understanding Capabilities of Language Models through Question Answering from Text to Video | Jun 21, 2024 | Benchmarking, Few-Shot Learning | Unverified | 0 |
| Benchmarking Retinal Blood Vessel Segmentation Models for Cross-Dataset and Cross-Disease Generalization | Jun 21, 2024 | Benchmarking, Segmentation | Code Available | 0 |
| FlowBench: Revisiting and Benchmarking Workflow-Guided Planning for LLM-based Agents | Jun 21, 2024 | Benchmarking | Unverified | 0 |
| Deciphering the Definition of Adversarial Robustness for post-hoc OOD Detectors | Jun 21, 2024 | Adversarial Defense, Adversarial Robustness | Unverified | 0 |