| Is LLM-as-a-Judge Robust? Investigating Universal Adversarial Attacks on Zero-shot LLM Assessment | Feb 21, 2024 | Adversarial Robustness, Benchmarking | Code Available | 1 |
| CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models | Feb 21, 2024 | Benchmarking | Unverified | 0 |
| A Unified Framework and Dataset for Assessing Societal Bias in Vision-Language Models | Feb 21, 2024 | Benchmarking, Image to text | Unverified | 0 |
| KetGPT -- Dataset Augmentation of Quantum Circuits using Transformers | Feb 20, 2024 | Benchmarking | Unverified | 0 |
| CHILI: Chemically-Informed Large-scale Inorganic Nanomaterials Dataset for Advancing Graph Machine Learning | Feb 20, 2024 | Atomic number classification, Benchmarking | Code Available | 1 |
| Benchmarking Retrieval-Augmented Generation for Medicine | Feb 20, 2024 | Benchmarking, Information Retrieval | Code Available | 4 |
| CausalGym: Benchmarking causal interpretability methods on linguistic tasks | Feb 19, 2024 | Benchmarking, Interpretability Techniques for Deep Learning | Code Available | 2 |
| Synthetic location trajectory generation using categorical diffusion models | Feb 19, 2024 | Benchmarking, Decision Making | Code Available | 0 |
| Event-Based Motion Magnification | Feb 19, 2024 | Benchmarking, Motion Detection | Code Available | 2 |
| FeB4RAG: Evaluating Federated Search in the Context of Retrieval Augmented Generation | Feb 19, 2024 | Benchmarking, Chatbot | Unverified | 0 |