| StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification | Nov 11, 2024 | Large Language Model, Multimodal Large Language Model | Code Available | 2 |
| Probabilistic Consensus through Ensemble Validation: A Framework for LLM Reliability | Nov 10, 2024 | Multiple-choice, Text Generation | Unverified | 0 |
| Humans and Large Language Models in Clinical Decision Support: A Study with Medical Calculators | Nov 8, 2024 | Decision Making, Multiple-choice | Unverified | 0 |
| Quantitative Assessment of Intersectional Empathetic Bias and Understanding | Nov 8, 2024 | Multiple-choice | Code Available | 0 |
| ProverbEval: Exploring LLM Evaluation Challenges for Low-resource Language Understanding | Nov 7, 2024 | Benchmarking, Multiple-choice | Unverified | 0 |
| HourVideo: 1-Hour Video-Language Understanding | Nov 7, 2024 | Benchmarking, counterfactual | Code Available | 2 |
| MEG: Medical Knowledge-Augmented Large Language Models for Question Answering | Nov 6, 2024 | Knowledge Graph Embeddings, Multiple-choice | Code Available | 1 |
| MILU: A Multi-task Indic Language Understanding Benchmark | Nov 4, 2024 | Multiple-choice, Question Answering | Code Available | 1 |
| FactTest: Factuality Testing in Large Language Models with Finite-Sample and Distribution-Free Guarantees | Nov 4, 2024 | Multiple-choice, Question Answering | Unverified | 0 |
| PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance | Nov 4, 2024 | Caption Generation, Multiple-choice | Code Available | 2 |