| Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making | Oct 9, 2024 | BenchmarkingDecision Making | CodeCode Available | 3 |
| M3Bench: Benchmarking Whole-body Motion Generation for Mobile Manipulation in 3D Scenes | Oct 9, 2024 | BenchmarkingMotion Generation | —Unverified | 0 |
| Active Evaluation Acquisition for Efficient LLM Benchmarking | Oct 8, 2024 | Benchmarking | —Unverified | 0 |
| Manual Verbalizer Enrichment for Few-Shot Text Classification | Oct 8, 2024 | BenchmarkingClassification | —Unverified | 0 |
| Entering Real Social World! Benchmarking the Social Intelligence of Large Language Models from a First-person Perspective | Oct 8, 2024 | AttributeBenchmarking | CodeCode Available | 1 |
| QGym: Scalable Simulation and Benchmarking of Queuing Network Controllers | Oct 8, 2024 | Benchmarking | CodeCode Available | 0 |
| FedGraph: A Research Library and Benchmark for Federated Graph Learning | Oct 8, 2024 | BenchmarkingFederated Learning | CodeCode Available | 2 |
| Benchmarking of a new data splitting method on volcanic eruption data | Oct 8, 2024 | Benchmarking | —Unverified | 0 |
| Translation Canvas: An Explainable Interface to Pinpoint and Analyze Translation Systems | Oct 7, 2024 | BenchmarkingMachine Translation | —Unverified | 0 |
| Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild | Oct 7, 2024 | BenchmarkingMixture-of-Experts | CodeCode Available | 1 |