| RoLargeSum: A Large Dialect-Aware Romanian News Dataset for Summary, Headline, and Keyword Generation | Dec 15, 2024 | Articles, Benchmarking | Code Available | 0 |
| Sequence-Level Leakage Risk of Training Data in Large Language Models | Dec 15, 2024 | Benchmarking | Unverified | 0 |
| Benchmarking and Learning Multi-Dimensional Quality Evaluator for Text-to-3D Generation | Dec 15, 2024 | 3D Generation, Benchmarking | Unverified | 0 |
| NoisyEQA: Benchmarking Embodied Question Answering Against Noisy Queries | Dec 14, 2024 | Benchmarking, Embodied Question Answering | Unverified | 0 |
| NeuralPLexer3: Accurate Biomolecular Complex Structure Prediction with Flow Models | Dec 14, 2024 | Benchmarking, Drug Design | Code Available | 2 |
| EvalGIM: A Library for Evaluating Generative Image Models | Dec 13, 2024 | Benchmarking, Diversity | Code Available | 2 |
| CRS Arena: Crowdsourced Benchmarking of Conversational Recommender Systems | Dec 13, 2024 | Benchmarking, Recommendation Systems | Unverified | 0 |
| Benchmarking Table Comprehension In The Wild | Dec 13, 2024 | Benchmarking, Question Answering | Unverified | 0 |
| Benchmarking Linguistic Diversity of Large Language Models | Dec 13, 2024 | Benchmarking, Diversity | Code Available | 0 |
| Benchmarking large language models for materials synthesis: the case of atomic layer deposition | Dec 13, 2024 | Benchmarking, Hallucination | Unverified | 0 |