| KITTEN: A Knowledge-Intensive Evaluation of Image Generation on Visual Entities | Oct 15, 2024 | Image Generation, Retrieval | Unverified | 0 |
| LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content | Oct 14, 2024 | Visual Question Answering (VQA), World Knowledge | Code Available | 1 |
| DyVo: Dynamic Vocabularies for Learned Sparse Retrieval with Entities | Oct 10, 2024 | Document Ranking, Entity Embeddings | Code Available | 0 |
| TVBench: Redesigning Video-Language Evaluation | Oct 10, 2024 | Multiple-choice, Open-Ended Question Answering | Unverified | 0 |
| LLM Embeddings Improve Test-time Adaptation to Tabular Y\|X-Shifts | Oct 9, 2024 | Test-time Adaptation, World Knowledge | Code Available | 1 |
| Which Programming Language and What Features at Pre-training Stage Affect Downstream Logical Inference Performance? | Oct 9, 2024 | In-Context Learning, Logical Reasoning | Code Available | 0 |
| SEAL: SEmantic-Augmented Imitation Learning via Language Model | Oct 3, 2024 | Decision Making, Imitation Learning | Unverified | 0 |
| Intent Detection in the Age of LLMs | Oct 2, 2024 | Data Augmentation, In-Context Learning | Unverified | 0 |
| One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos | Sep 29, 2024 | All, Image Segmentation | Code Available | 2 |
| "Why" Has the Least Side Effect on Model Editing | Sep 27, 2024 | Experimental Design, knowledge editing | Unverified | 0 |
| CurricuLLM: Automatic Task Curricula Design for Learning Complex Robot Skills using Large Language Models | Sep 27, 2024 | Reinforcement Learning (RL), World Knowledge | Code Available | 1 |
| "Oh LLM, I'm Asking Thee, Please Give Me a Decision Tree": Zero-Shot Decision Tree Induction and Embedding with Large Language Models | Sep 27, 2024 | Interpretable Machine Learning, World Knowledge | Unverified | 0 |
| Pioneering Reliable Assessment in Text-to-Image Knowledge Editing: Leveraging a Fine-Grained Dataset and an Innovative Criterion | Sep 26, 2024 | Image Generation, In-Context Learning | Code Available | 0 |
| 60 Data Points are Sufficient to Fine-Tune LLMs for Question-Answering | Sep 24, 2024 | Question Answering, World Knowledge | Unverified | 0 |
| Style Outweighs Substance: Failure Modes of LLM Judges in Alignment Benchmarking | Sep 23, 2024 | Benchmarking, Diversity | Code Available | 0 |
| Can-Do! A Dataset and Neuro-Symbolic Grounded Framework for Embodied Planning with Large Multimodal Models | Sep 22, 2024 | World Knowledge | Unverified | 0 |
| The X Types -- Mapping the Semantics of the Twitter Sphere | Sep 22, 2024 | Type prediction, World Knowledge | Unverified | 0 |
| Relevance-driven Decision Making for Safer and More Efficient Human Robot Collaboration | Sep 21, 2024 | Collision Avoidance, Decision Making | Unverified | 0 |
| Time Awareness in Large Language Models: Benchmarking Fact Recall Across Time | Sep 20, 2024 | Benchmarking, World Knowledge | Unverified | 0 |
| HLLM: Enhancing Sequential Recommendations via Hierarchical Large Language Models for Item and User Modeling | Sep 19, 2024 | Large Language Model, Recommendation Systems | Code Available | 4 |
| Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement | Sep 17, 2024 | Active Learning, Diversity | Code Available | 1 |
| Visual Language Tracking with Multi-modal Interaction: A Robust Benchmark | Sep 13, 2024 | Sequential Decision Making, World Knowledge | Unverified | 0 |
| Synthetic continued pretraining | Sep 11, 2024 | Data Augmentation, Language Modelling | Code Available | 2 |
| Multimodal Large Language Model Driven Scenario Testing for Autonomous Vehicles | Sep 10, 2024 | Autonomous Vehicles, Language Modeling | Unverified | 0 |
| Can OOD Object Detectors Learn from Foundation Models? | Sep 8, 2024 | Object, object-detection | Code Available | 1 |