| Benchmarking Large Language Models on CFLUE -- A Chinese Financial Language Understanding Evaluation Dataset | May 17, 2024 | 16k, Benchmarking | Code Available | 3 |
| A Robust Autoencoder Ensemble-Based Approach for Anomaly Detection in Text | May 16, 2024 | Anomaly Detection, Benchmarking | Unverified | 0 |
| Simulation-Based Benchmarking of Reinforcement Learning Agents for Personalized Retail Promotions | May 16, 2024 | Benchmarking, Reinforcement Learning (RL) | Code Available | 0 |
| An Integrated Framework for Multi-Granular Explanation of Video Summarization | May 16, 2024 | Benchmarking, Panoptic Segmentation | Code Available | 0 |
| DocuMint: Docstring Generation for Python using Small Language Models | May 16, 2024 | Benchmarking, Code Generation | Code Available | 1 |
| PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models | May 15, 2024 | Benchmarking | Code Available | 2 |
| SciFIBench: Benchmarking Large Multimodal Models for Scientific Figure Interpretation | May 14, 2024 | Benchmarking, Multiple-choice | Code Available | 1 |
| SpeechVerse: A Large-scale Generalizable Audio Language Model | May 14, 2024 | Automatic Speech Recognition, Benchmarking | Unverified | 0 |
| UCCIX: Irish-eXcellence Large Language Model | May 13, 2024 | Benchmarking, Language Modeling | Unverified | 0 |
| Divergent Creativity in Humans and Large Language Models | May 13, 2024 | Benchmarking | Code Available | 0 |