| Evaluating Consistencies in LLM responses through a Semantic Clustering of Question Answering | Oct 20, 2024 | Language Modelling, Large Language Model | Unverified | 0 | 0 |
| Integrating Diverse Knowledge Sources for Online One-shot Learning of Novel Tasks | Aug 19, 2022 | Language Modeling, Language Modelling | Unverified | 0 | 0 |
| Evaluating GPT-4 with Vision on Detection of Radiological Findings on Chest Radiographs | Mar 22, 2024 | Diagnostic, Language Modeling | Unverified | 0 | 0 |
| Evaluating Knowledge Graph Based Retrieval Augmented Generation Methods under Knowledge Incompleteness | Apr 7, 2025 | Knowledge Graphs, Language Modeling | Unverified | 0 | 0 |
| Evaluating Large Language Model Capabilities in Assessing Spatial Econometrics Research | Jun 4, 2025 | counterfactual, Econometrics | Unverified | 0 | 0 |
| Evaluating Large Language Model Capability in Vietnamese Fact-Checking Data Generation | Nov 8, 2024 | Fact Checking, Language Modeling | Unverified | 0 | 0 |
| Evaluating Large Language Model Creativity from a Literary Perspective | Nov 30, 2023 | Language Modeling, Language Modelling | Unverified | 0 | 0 |
| Evaluating LLaMA 3.2 for Software Vulnerability Detection | Mar 10, 2025 | Language Modeling, Language Modelling | Unverified | 0 | 0 |
| Evaluating LLM-based Agents for Multi-Turn Conversations: A Survey | Mar 28, 2025 | Large Language Model | Unverified | 0 | 0 |
| Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions | Jul 7, 2025 | Large Language Model, RAG | Unverified | 0 | 0 |
| Evaluating Nuanced Bias in Large Language Model Free Response Answers | Jul 11, 2024 | Benchmarking, Language Modeling | Unverified | 0 | 0 |
| Evaluating Self-Generated Documents for Enhancing Retrieval-Augmented Generation with Large Language Models | Oct 17, 2024 | Language Modelling, Large Language Model | Unverified | 0 | 0 |
| Evaluating Steering Techniques using Human Similarity Judgments | May 25, 2025 | Language Modeling, Language Modelling | Unverified | 0 | 0 |
| Evaluating Text Creativity across Diverse Domains: A Dataset and Large Language Model Evaluator | May 25, 2025 | Language Modeling, Language Modelling | Unverified | 0 | 0 |
| Evaluating the Effectiveness of Retrieval-Augmented Large Language Models in Scientific Document Reasoning | Nov 7, 2023 | Language Modeling, Language Modelling | Unverified | 0 | 0 |
| Evaluating the Effect of Retrieval Augmentation on Social Biases | Feb 24, 2025 | Large Language Model, Question Answering | Unverified | 0 | 0 |
| Evaluating the Efficacy of LLM-Based Reasoning for Multiobjective HPC Job Scheduling | May 29, 2025 | Computational Efficiency, Fairness | Unverified | 0 | 0 |
| Evaluating The Performance of Using Large Language Models to Automate Summarization of CT Simulation Orders in Radiation Oncology | Jan 27, 2025 | Large Language Model | Unverified | 0 | 0 |
| Measuring the Quality of Answers in Political Q&As with Large Language Models | Apr 12, 2024 | Language Modeling, Language Modelling | Unverified | 0 | 0 |
| Evaluating Voice Command Pipelines for Drone Control: From STT and LLM to Direct Classification and Siamese Networks | Jul 10, 2024 | Language Modeling, Language Modelling | Unverified | 0 | 0 |
| Evaluation of AI Chatbots for Patient-Specific EHR Questions | Jun 5, 2023 | Language Modeling, Language Modelling | Unverified | 0 | 0 |
| Evaluation of ChatGPT on Biomedical Tasks: A Zero-Shot Comparison with Fine-Tuned Generative Transformers | Jun 7, 2023 | Document Classification, Language Modeling | Unverified | 0 | 0 |
| Evaluation of large language model performance on the Biomedical Language Understanding and Reasoning Benchmark | May 17, 2024 | Document Classification, Language Modeling | Unverified | 0 | 0 |
| Evaluation of OpenAI o1: Opportunities and Challenges of AGI | Sep 27, 2024 | Emotion Recognition, Large Language Model | Unverified | 0 | 0 |
| Evaluation of the Automated Labeling Method for Taxonomic Nomenclature Through Prompt-Optimized Large Language Model | Mar 8, 2025 | Few-Shot Learning, Language Modeling | Unverified | 0 | 0 |