| A Human-AI Comparative Analysis of Prompt Sensitivity in LLM-Based Relevance Judgment | Apr 16, 2025 | Information Retrieval, RAG | Code Available | 0 |
| Can Github issues be solved with Tree Of Thoughts? | May 20, 2024 | Code Generation, GitHub issue resolution | Code Available | 0 |
| A Glitch in the Matrix? Locating and Detecting Language Model Grounding with Fakepedia | Dec 4, 2023 | counterfactual, Language Modeling | Code Available | 0 |
| Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents | Nov 23, 2024 | Question Answering, RAG | Code Available | 0 |
| DIRAS: Efficient LLM Annotation of Document Relevance in Retrieval Augmented Generation | Jun 20, 2024 | Information Retrieval, RAG | Code Available | 0 |
| RustEvo^2: An Evolving Benchmark for API Evolution in LLM-based Rust Code Generation | Mar 21, 2025 | Code Generation, Navigate | Code Available | 0 |
| Dialogue Benchmark Generation from Knowledge Graphs with Cost-Effective Retrieval-Augmented LLMs | Jan 17, 2025 | Dialogue Generation, Knowledge Graphs | Code Available | 0 |
| IRSC: A Zero-shot Evaluation Benchmark for Information Retrieval through Semantic Comprehension in Retrieval-Augmented Generation Scenarios | Sep 24, 2024 | Information Retrieval, RAG | Code Available | 0 |
| Investigating the performance of Retrieval-Augmented Generation and fine-tuning for the development of AI-driven knowledge-based systems | Mar 12, 2024 | Domain Adaptation, Hallucination | Code Available | 0 |
| Developing a Pragmatic Benchmark for Assessing Korean Legal Language Understanding in Large Language Models | Oct 11, 2024 | Legal Reasoning, RAG | Code Available | 0 |