| SaL-Lightning Dataset: Search and Eye Gaze Behavior, Resource Interactions and Knowledge Gain during Web Search | Jan 7, 2022 | Information Retrieval, Multiple-choice | —Unverified | 0 | 0 |
| Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models | Oct 10, 2024 | Conformal Prediction, Language Modeling | —Unverified | 0 | 0 |
| SARI: Structured Audio Reasoning via Curriculum-Guided Reinforcement Learning | Apr 22, 2025 | Multiple-choice, reinforcement-learning | —Unverified | 0 | 0 |
| SaudiCulture: A Benchmark for Evaluating Large Language Models Cultural Competence within Saudi Arabia | Mar 21, 2025 | Multiple-choice | —Unverified | 0 | 0 |
| SB-Bench: Stereotype Bias Benchmark for Large Multimodal Models | Feb 12, 2025 | Fairness, Multiple-choice | —Unverified | 0 | 0 |
| SceMQA: A Scientific College Entrance Level Multimodal Question Answering Benchmark | Feb 6, 2024 | Multiple-choice, Question Answering | —Unverified | 0 | 0 |
| Scene Restoring for Narrative Machine Reading Comprehension | Nov 1, 2020 | Cloze Test, Machine Reading Comprehension | —Unverified | 0 | 0 |
| Scheduling Algorithms for Federated Learning with Minimal Energy Consumption | Sep 13, 2022 | Federated Learning, Multiple-choice | —Unverified | 0 | 0 |
| VITAL: A New Dataset for Benchmarking Pluralistic Alignment in Healthcare | Feb 19, 2025 | Benchmarking, Diversity | —Unverified | 0 | 0 |
| GeoSQA: A Benchmark for Scenario-based Question Answering in the Geography Domain at High School Level | Aug 20, 2019 | General Knowledge, Multiple-choice | —Unverified | 0 | 0 |