| Understanding the RoPE Extensions of Long-Context LLMs: An Attention Perspective | Jun 19, 2024 | Benchmarking, Continual Pretraining | Unverified | 0 |
| A large-scale multicenter breast cancer DCE-MRI benchmark dataset with expert segmentations | Jun 19, 2024 | Benchmarking | Code Available | 2 |
| Towards Robust Evaluation: A Comprehensive Taxonomy of Datasets and Metrics for Open Domain Question Answering in the Era of Large Language Models | Jun 19, 2024 | Benchmarking, Open-Domain Question Answering | Unverified | 0 |
| Enhancing Distractor Generation for Multiple-Choice Questions with Retrieval Augmented Pretraining and Knowledge Graph Integration | Jun 19, 2024 | Benchmarking, Distractor Generation | Unverified | 0 |
| GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation | Jun 19, 2024 | Benchmarking, Image Generation | Code Available | 3 |
| BeHonest: Benchmarking Honesty in Large Language Models | Jun 19, 2024 | Benchmarking, Misinformation | Code Available | 1 |
| Benchmarking Unsupervised Online IDS for Masquerade Attacks in CAN | Jun 19, 2024 | Benchmarking, Intrusion Detection | Code Available | 0 |
| M4Fog: A Global Multi-Regional, Multi-Modal, and Multi-Stage Dataset for Marine Fog Detection and Forecasting to Bridge Ocean and Atmosphere | Jun 19, 2024 | Benchmarking, Spatio-Temporal Forecasting | Code Available | 0 |
| Comparison of Open-Source and Proprietary LLMs for Machine Reading Comprehension: A Practical Analysis for Industrial Applications | Jun 19, 2024 | Benchmarking, Machine Reading Comprehension | Unverified | 0 |
| Exploring and Benchmarking the Planning Capabilities of Large Language Models | Jun 18, 2024 | Benchmarking, In-Context Learning | Unverified | 0 |