| Towards Open Respiratory Acoustic Foundation Models: Pretraining and Benchmarking | Jun 23, 2024 | Benchmarking | Code Available | 2 |
| HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis | Jun 23, 2024 | Benchmarking, Representation Learning | Code Available | 3 |
| GraphEval2000: Benchmarking and Improving Large Language Models on Graph Datasets | Jun 23, 2024 | Benchmarking | —Unverified | 0 |
| Position: Benchmarking is Limited in Reinforcement Learning Research | Jun 23, 2024 | Benchmarking, Position | —Unverified | 0 |
| MetaGreen: Meta-Learning Inspired Transformer Selection for Green Semantic Communication | Jun 22, 2024 | Benchmarking, Meta-Learning | Code Available | 0 |
| CaT-BENCH: Benchmarking Language Model Understanding of Causal and Temporal Dependencies in Plans | Jun 22, 2024 | Benchmarking, Decision Making | —Unverified | 0 |
| BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions | Jun 22, 2024 | Benchmarking, Code Generation | Code Available | 4 |
| Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph | Jun 21, 2024 | Benchmarking, Text Generation | Code Available | 2 |
| NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking | Jun 21, 2024 | Autonomous Driving, Benchmarking | Code Available | 7 |
| FlowBench: Revisiting and Benchmarking Workflow-Guided Planning for LLM-based Agents | Jun 21, 2024 | Benchmarking | —Unverified | 0 |