| All Entities are Not Created Equal: Examining the Long Tail for Fine-Grained Entity Typing | Oct 22, 2024 | All, Entity Typing | Unverified | 0 |
| Rulebreakers Challenge: Revealing a Blind Spot in Large Language Models' Reasoning with Formal Logic | Oct 21, 2024 | Formal Logic, World Knowledge | Unverified | 0 |
| Roadmap towards Superhuman Speech Understanding using Large Language Models | Oct 17, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Understanding the Role of LLMs in Multimodal Evaluation Benchmarks | Oct 16, 2024 | Benchmarking, Large Language Model | Code Available | 0 |
| Comprehending Knowledge Graphs with Large Language Models for Recommender Systems | Oct 16, 2024 | Knowledge-Aware Recommendation, Knowledge Graphs | Unverified | 0 |
| KITTEN: A Knowledge-Intensive Evaluation of Image Generation on Visual Entities | Oct 15, 2024 | Image Generation, Retrieval | Unverified | 0 |
| LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content | Oct 14, 2024 | Visual Question Answering (VQA), World Knowledge | Code Available | 1 |
| DyVo: Dynamic Vocabularies for Learned Sparse Retrieval with Entities | Oct 10, 2024 | Document Ranking, Entity Embeddings | Code Available | 0 |
| TVBench: Redesigning Video-Language Evaluation | Oct 10, 2024 | Multiple-choice, Open-Ended Question Answering | Unverified | 0 |
| LLM Embeddings Improve Test-time Adaptation to Tabular Y\|X-Shifts | Oct 9, 2024 | Test-time Adaptation, World Knowledge | Code Available | 1 |