| Rethinking AI Cultural Alignment | Jan 13, 2025 | Multiple-choice | —Unverified | 0 | 0 |
| Rethinking Generative Large Language Model Evaluation for Semantic Comprehension | Mar 12, 2024 | Language Model Evaluation, Language Modeling | —Unverified | 0 | 0 |
| Reusing Swedish FrameNet for training semantic roles | May 1, 2014 | Multiple-choice | —Unverified | 0 | 0 |
| Reversal Blessing: Thinking Backward May Outpace Thinking Forward in Multi-choice Questions | Feb 25, 2025 | Inductive Bias, Logical Reasoning | —Unverified | 0 | 0 |
| RiddleSense: Reasoning about Riddle Questions Featuring Linguistic Creativity and Commonsense Knowledge | Jan 2, 2021 | counterfactual, Counterfactual Reasoning | —Unverified | 0 | 0 |
| RISCORE: Enhancing In-Context Riddle Solving in Language Models through Context-Reconstructed Example Augmentation | Sep 24, 2024 | Multiple-choice, Sentence | —Unverified | 0 | 0 |
| R-LLaVA: Improving Med-VQA Understanding through Visual Region of Interest | Oct 27, 2024 | Medical Visual Question Answering, Multiple-choice | —Unverified | 0 | 0 |
| Robo2VLM: Visual Question Answering from Large-Scale In-the-Wild Robot Manipulation Datasets | May 21, 2025 | Dataset Generation, Descriptive | —Unverified | 0 | 0 |
| Robust portfolio optimization model for electronic coupon allocation | May 21, 2024 | Multiple-choice, Portfolio Optimization | —Unverified | 0 | 0 |
| Visual Madlibs: Fill in the blank Image Generation and Question Answering | May 31, 2015 | Image Generation, Multiple-choice | —Unverified | 0 | 0 |
| SafePath: Conformal Prediction for Safe LLM-Based Autonomous Navigation | May 14, 2025 | Autonomous Driving, Autonomous Navigation | —Unverified | 0 | 0 |
| Adversarial Training for Machine Reading Comprehension with Virtual Embeddings | Jun 8, 2021 | Machine Reading Comprehension, Multiple-choice | —Unverified | 0 | 0 |
| SAGEval: The frontiers of Satisfactory Agent based NLG Evaluation for reference-free open-ended text | Nov 25, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Visual Question Answering as Reading Comprehension | Nov 29, 2018 | Common Sense Reasoning, General Knowledge | —Unverified | 0 | 0 |
| Adversarial Databases Improve Success in Retrieval-based Large Language Models | Jul 19, 2024 | Multiple-choice, RAG | —Unverified | 0 | 0 |
| SaL-Lightning Dataset: Search and Eye Gaze Behavior, Resource Interactions and Knowledge Gain during Web Search | Jan 7, 2022 | Information Retrieval, Multiple-choice | —Unverified | 0 | 0 |
| Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models | Oct 10, 2024 | Conformal Prediction, Language Modeling | —Unverified | 0 | 0 |
| SARI: Structured Audio Reasoning via Curriculum-Guided Reinforcement Learning | Apr 22, 2025 | Multiple-choice, reinforcement-learning | —Unverified | 0 | 0 |
| SaudiCulture: A Benchmark for Evaluating Large Language Models Cultural Competence within Saudi Arabia | Mar 21, 2025 | Multiple-choice | —Unverified | 0 | 0 |
| SB-Bench: Stereotype Bias Benchmark for Large Multimodal Models | Feb 12, 2025 | Fairness, Multiple-choice | —Unverified | 0 | 0 |
| SceMQA: A Scientific College Entrance Level Multimodal Question Answering Benchmark | Feb 6, 2024 | Multiple-choice, Question Answering | —Unverified | 0 | 0 |
| Scene Restoring for Narrative Machine Reading Comprehension | Nov 1, 2020 | Cloze Test, Machine Reading Comprehension | —Unverified | 0 | 0 |
| Scheduling Algorithms for Federated Learning with Minimal Energy Consumption | Sep 13, 2022 | Federated Learning, Multiple-choice | —Unverified | 0 | 0 |
| VITAL: A New Dataset for Benchmarking Pluralistic Alignment in Healthcare | Feb 19, 2025 | Benchmarking, Diversity | —Unverified | 0 | 0 |
| GeoSQA: A Benchmark for Scenario-based Question Answering in the Geography Domain at High School Level | Aug 20, 2019 | General Knowledge, Multiple-choice | —Unverified | 0 | 0 |