| HSI: Head-Specific Intervention Can Induce Misaligned AI Coordination in Large Language Models | Feb 9, 2025 | Answer Generation, Language Modeling | Code Available | 0 |
| Investigating the Shortcomings of LLMs in Step-by-Step Legal Reasoning | Feb 8, 2025 | Legal Reasoning, Multiple-choice | Code Available | 0 |
| ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning | Feb 7, 2025 | Multiple-choice, Question Answering | Code Available | 0 |
| LLMs to Support a Domain Specific Knowledge Assistant | Feb 6, 2025 | Chatbot, Multiple-choice | —Unverified | 0 |
| The Order Effect: Investigating Prompt Sensitivity to Input Order in LLMs | Feb 6, 2025 | Multiple-choice, Sensitivity | —Unverified | 0 |
| Evalita-LLM: Benchmarking Large Language Models on Italian | Feb 4, 2025 | Benchmarking, Multiple-choice | —Unverified | 0 |
| TUMTraffic-VideoQA: A Benchmark for Unified Spatio-Temporal Video Understanding in Traffic Scenes | Feb 4, 2025 | Autonomous Driving, Multiple-choice | Code Available | 1 |
| The Use of Artificial Intelligence Tools in Assessing Content Validity: A Comparative Study with Human Experts | Feb 3, 2025 | Multiple-choice, Reading Comprehension | —Unverified | 0 |
| CoddLLM: Empowering Large Language Models for Data Analytics | Feb 1, 2025 | Multiple-choice, Synthetic Data Generation | —Unverified | 0 |
| InnerThoughts: Disentangling Representations and Predictions in Large Language Models | Jan 29, 2025 | Multiple-choice, Position | —Unverified | 0 |