| Towards Harnessing Large Language Models for Comprehension of Conversational Grounding | Jun 3, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| When Benchmarks are Targets: Revealing the Sensitivity of Large Language Model Leaderboards | Feb 1, 2024 | Answer Selection, Language Modeling | Code Available | 0 |
| Detecting Referring Expressions in Visually Grounded Dialogue with Autoregressive Language Models | Jun 26, 2025 | Language Modeling, Language Modelling | Code Available | 0 |
| Detecting Manipulated Contents Using Knowledge-Grounded Inference | Apr 29, 2025 | Claim Verification, Fact Checking | Code Available | 0 |
| Detecting Errors through Ensembling Prompts (DEEP): An End-to-End LLM Framework for Detecting Factual Errors | Jun 18, 2024 | Hallucination, Language Modeling | Code Available | 0 |
| Detecting AI-Generated Texts in Cross-Domains | Oct 17, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| Reshaping Free-Text Radiology Notes Into Structured Reports With Generative Transformers | Mar 27, 2024 | Generative Question Answering, Information Retrieval | Code Available | 0 |
| How Far Are LLMs from Believable AI? A Benchmark for Evaluating the Believability of Human Behavior Simulation | Dec 28, 2023 | AI Agent, Language Modelling | Code Available | 0 |
| Chaining thoughts and LLMs to learn DNA structural biophysics | Mar 2, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| StructEval: Deepen and Broaden Large Language Model Assessment via Structured Evaluation | Aug 6, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| CellTypeAgent: Trustworthy cell type annotation with Large Language Models | May 13, 2025 | Language Modeling, Language Modelling | Code Available | 0 |
| Resolving References in Visually-Grounded Dialogue via Text Generation | Sep 23, 2023 | Image Retrieval, Language Modeling | Code Available | 0 |
| DetectBench: Can Large Language Model Detect and Piece Together Implicit Evidence? | Jun 18, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| Towards Interpretable Hate Speech Detection using Large Language Model-extracted Rationales | Mar 19, 2024 | Hate Speech Detection, Language Modeling | Code Available | 0 |
| How Benchmark Prediction from Fewer Data Misses the Mark | Jun 9, 2025 | Large Language Model, Prediction | Code Available | 0 |
| Summarisation of German Judgments in conjunction with a Class-based Evaluation | May 9, 2025 | Decoder, Language Modeling | Code Available | 0 |
| SumRec: A Framework for Recommendation using Open-Domain Dialogue | Feb 7, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| HORAE: A Domain-Agnostic Language for Automated Service Regulation | Jun 6, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| Design Principle Transfer in Neural Architecture Search via Large Language Models | Aug 21, 2024 | Language Modelling, Large Language Model | Code Available | 0 |
| CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration | Nov 5, 2024 | Collaborative Inference, Large Language Model | Code Available | 0 |
| HLAT: High-quality Large Language Model Pre-trained on AWS Trainium | Apr 16, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| Historical Ink: 19th Century Latin American Spanish Newspaper Corpus with LLM OCR Correction | Jul 4, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| Help Me Identify: Is an LLM+VQA System All We Need to Identify Visual Concepts? | Oct 17, 2024 | All, Language Modeling | Code Available | 0 |
| Vision-Language and Large Language Model Performance in Gastroenterology: GPT, Claude, Llama, Phi, Mistral, Gemma, and Quantized Models | Aug 25, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| When is Off-Policy Evaluation (Reward Modeling) Useful in Contextual Bandits? A Data-Centric Perspective | Nov 23, 2023 | Large Language Model, Multi-Armed Bandits | Code Available | 0 |