| SecBench: A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity | Dec 30, 2024 | Benchmarking, Code Generation | —Unverified | 0 | 0 |
| SECURA: Sigmoid-Enhanced CUR Decomposition with Uninterrupted Retention and Low-Rank Adaptation in Large Language Models | Feb 25, 2025 | Continual Learning, GSM8K | —Unverified | 0 | 0 |
| Advanced Financial Reasoning at Scale: A Comprehensive Evaluation of Large Language Models on CFA Level III | Jun 29, 2025 | Model Selection, Multiple-choice | —Unverified | 0 | 0 |
| Addressing Blind Guessing: Calibration of Selection Bias in Multiple-Choice Question Answering by Video Language Models | Oct 18, 2024 | Fairness, Multiple-choice | —Unverified | 0 | 0 |
| From Human Days to Machine Seconds: Automatically Answering and Generating Machine Learning Final Exams | Jun 11, 2022 | BIG-bench Machine Learning, Few-Shot Learning | —Unverified | 0 | 0 |
| A Data-Driven Study of Commonsense Knowledge using the ConceptNet Knowledge Base | Nov 28, 2020 | Clustering, Graph Representation Learning | —Unverified | 0 | 0 |
| Seeing the Forest and the Trees: Solving Visual Graph and Tree Based Data Structure Problems using Large Multimodal Models | Dec 15, 2024 | Multiple-choice | —Unverified | 0 | 0 |
| Selective Particle Attention: Visual Feature-Based Attention in Deep Reinforcement Learning | Aug 26, 2020 | Deep Reinforcement Learning, Multiple-choice | —Unverified | 0 | 0 |
| Self-Evaluation Improves Selective Generation in Large Language Models | Dec 14, 2023 | Multiple-choice, TruthfulQA | —Unverified | 0 | 0 |
| Adaptive Wizard for Removing Cross-Tier Misconfigurations in Active Directory | May 2, 2025 | Multiple-choice | —Unverified | 0 | 0 |