| SecBench: A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity | Dec 30, 2024 | Benchmarking, Code Generation | —Unverified | 0 | 0 |
| SECURA: Sigmoid-Enhanced CUR Decomposition with Uninterrupted Retention and Low-Rank Adaptation in Large Language Models | Feb 25, 2025 | Continual Learning, GSM8K | —Unverified | 0 | 0 |
| Advanced Financial Reasoning at Scale: A Comprehensive Evaluation of Large Language Models on CFA Level III | Jun 29, 2025 | Model Selection, Multiple-choice | —Unverified | 0 | 0 |
| Addressing Blind Guessing: Calibration of Selection Bias in Multiple-Choice Question Answering by Video Language Models | Oct 18, 2024 | Fairness, Multiple-choice | —Unverified | 0 | 0 |
| From Human Days to Machine Seconds: Automatically Answering and Generating Machine Learning Final Exams | Jun 11, 2022 | BIG-bench Machine Learning, Few-Shot Learning | —Unverified | 0 | 0 |
| A Data-Driven Study of Commonsense Knowledge using the ConceptNet Knowledge Base | Nov 28, 2020 | Clustering, Graph Representation Learning | —Unverified | 0 | 0 |
| Seeing the Forest and the Trees: Solving Visual Graph and Tree Based Data Structure Problems using Large Multimodal Models | Dec 15, 2024 | Multiple-choice | —Unverified | 0 | 0 |
| Selective Particle Attention: Visual Feature-Based Attention in Deep Reinforcement Learning | Aug 26, 2020 | Deep Reinforcement Learning, Multiple-choice | —Unverified | 0 | 0 |
| Self-Evaluation Improves Selective Generation in Large Language Models | Dec 14, 2023 | Multiple-choice, TruthfulQA | —Unverified | 0 | 0 |
| Adaptive Wizard for Removing Cross-Tier Misconfigurations in Active Directory | May 2, 2025 | Multiple-choice | —Unverified | 0 | 0 |
| Self-supervised pre-training and contrastive representation learning for multiple-choice video QA | Sep 17, 2020 | Auxiliary Learning, Contrastive Learning | —Unverified | 0 | 0 |
| Self-Teaching Machines to Read and Comprehend with Large-Scale Multi-Subject Question-Answering Data | Feb 1, 2021 | Machine Reading Comprehension, Multiple-choice | —Unverified | 0 | 0 |
| Semi-automatic Generation of Multiple-Choice Tests from Mentions of Semantic Relations | Jul 1, 2015 | Multiple-choice, Reading Comprehension | —Unverified | 0 | 0 |
| Separation of Powers: On Segregating Knowledge from Observation in LLM-enabled Knowledge-based Visual Question Answering | Jan 1, 2025 | Multiple-choice, Question Answering | —Unverified | 0 | 0 |
| Set-LLM: A Permutation-Invariant LLM | May 21, 2025 | Multiple-choice, Question Answering | —Unverified | 0 | 0 |
| Setting Standards in Turkish NLP: TR-MMLU for Large Language Model Evaluation | Dec 31, 2024 | Language Model Evaluation, Language Modeling | —Unverified | 0 | 0 |
| Single-Turn Debate Does Not Help Humans Answer Hard Reading-Comprehension Questions | Apr 11, 2022 | Multiple-choice, Reading Comprehension | —Unverified | 0 | 0 |
| Social Bias Benchmark for Generation: A Comparison of Generation and QA-Based Evaluations | Mar 10, 2025 | Form, Multiple-choice | —Unverified | 0 | 0 |
| Social IQa: Commonsense Reasoning about Social Interactions | Nov 1, 2019 | Multiple-choice, Question Answering | —Unverified | 0 | 0 |
| Solving Visual Madlibs with Multiple Cues | Aug 11, 2016 | Activity Prediction, Attribute | —Unverified | 0 | 0 |
| SOSBENCH: Benchmarking Safety Alignment on Scientific Knowledge | May 27, 2025 | Benchmarking, Multiple-choice | —Unverified | 0 | 0 |
| Sparse Attention Vectors: Generative Multimodal Model Features Are Discriminative Vision-Language Classifiers | Nov 28, 2024 | Image Captioning, image-classification | —Unverified | 0 | 0 |
| Spending Money Wisely: Online Electronic Coupon Allocation based on Real-Time User Intent Detection | Aug 23, 2020 | Intent Detection, Multiple-choice | —Unverified | 0 | 0 |
| VUDG: A Dataset for Video Understanding Domain Generalization | May 30, 2025 | Domain Generalization, Multiple-choice | —Unverified | 0 | 0 |
| SPRITE: A Response Model For Multiple Choice Testing | Jan 12, 2015 | model, Multiple-choice | —Unverified | 0 | 0 |