| STEREO: Towards Adversarially Robust Concept Erasing from Text-to-Image Generation Models | Aug 29, 2024 | Benchmarking, Image Generation | Code Available | 1 |
| How Well Do LLMs Handle Cantonese? Benchmarking Cantonese Capabilities of Large Language Models | Aug 29, 2024 | Benchmarking, General Knowledge | Code Available | 1 |
| Illuminating the Diversity-Fitness Trade-Off in Black-Box Optimization | Aug 29, 2024 | Benchmarking, Diversity | Code Available | 0 |
| Benchmarking Japanese Speech Recognition on ASR-LLM Setups with Multi-Pass Augmented Generative Error Correction | Aug 29, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Interactive Agents: Simulating Counselor-Client Psychological Counseling via Role-Playing LLM-to-LLM Interactions | Aug 28, 2024 | Benchmarking | Code Available | 2 |
| Benchmarking foundation models as feature extractors for weakly-supervised computational pathology | Aug 28, 2024 | Benchmarking, Diversity | Unverified | 0 |
| LogicGame: Benchmarking Rule-Based Reasoning Abilities of Large Language Models | Aug 28, 2024 | Benchmarking, Logical Reasoning | Code Available | 1 |
| Atari-GPT: Benchmarking Multimodal Large Language Models as Low-Level Policies in Atari Games | Aug 28, 2024 | Atari Games, Benchmarking | Unverified | 0 |
| Zero-Shot Visual Reasoning by Vision-Language Models: Benchmarking and Analysis | Aug 27, 2024 | Benchmarking, Large Language Model | Unverified | 0 |
| Applications in CityLearn Gym Environment for Multi-Objective Control Benchmarking in Grid-Interactive Buildings and Districts | Aug 27, 2024 | Benchmarking, Model Predictive Control | Unverified | 0 |