| Structured Object Language Modeling (SoLM): Native Structured Objects Generation Conforming to Complex Schemas with Self-Supervised Denoising | Nov 28, 2024 | Denoising, Language Modeling | Unverified | 0 |
| HyperSeg: Towards Universal Visual Segmentation with Large Language Model | Nov 26, 2024 | Language Modeling, Large Language Model | Code Available | 2 |
| Functionality understanding and segmentation in 3D scenes | Nov 25, 2024 | AI Agent, Language Modeling | Unverified | 0 |
| PIANIST: Learning Partially Observable World Models with LLMs for Multi-Agent Decision Making | Nov 24, 2024 | Decision Making, World Knowledge | Unverified | 0 |
| GOT4Rec: Graph of Thoughts for Sequential Recommendation | Nov 22, 2024 | General Knowledge, Sequential Recommendation | Unverified | 0 |
| LEADRE: Multi-Faceted Knowledge Enhanced LLM Empowered Display Advertisement Recommender System | Nov 21, 2024 | Learning-To-Rank, Prompt Engineering | Unverified | 0 |
| GLOVER: Generalizable Open-Vocabulary Affordance Reasoning for Task-Oriented Grasping | Nov 19, 2024 | Common Sense Reasoning, Human-Object Interaction Detection | Unverified | 0 |
| Past, Present, and Future of Sensor-Based Human Activity Recognition Using Wearables: A Surveying Tutorial on a Still Challenging Task | Nov 11, 2024 | Activity Recognition, Human Activity Recognition | Unverified | 0 |
| Gradient Localization Improves Lifelong Pretraining of Language Models | Nov 7, 2024 | Continual Learning, World Knowledge | Unverified | 0 |
| Vision Language Models are In-Context Value Learners | Nov 7, 2024 | In-Context Learning, World Knowledge | Unverified | 0 |
| LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation | Nov 7, 2024 | Contrastive Learning, Image Captioning | Code Available | 4 |
| Pre-trained Visual Dynamics Representations for Efficient Policy Learning | Nov 5, 2024 | Reinforcement Learning (RL), Video Prediction | Unverified | 0 |
| ChatTracker: Enhancing Visual Tracking Performance via Chatting with Multimodal Large Language Model | Nov 4, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| On the Exploration of LM-Based Soft Modular Robot Design | Nov 1, 2024 | World Knowledge | Unverified | 0 |
| Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation | Nov 1, 2024 | Epidemiology, Knowledge Distillation | Unverified | 0 |
| EMMA: End-to-End Multimodal Model for Autonomous Driving | Oct 30, 2024 | 3D Object Detection, Autonomous Driving | Unverified | 0 |
| GRADE: Quantifying Sample Diversity in Text-to-Image Models | Oct 29, 2024 | Attribute, Diversity | Unverified | 0 |
| ADAM: An Embodied Causal Agent in Open-World Environments | Oct 29, 2024 | Lifelong learning, Minecraft | Unverified | 0 |
| Learning and Unlearning of Fabricated Knowledge in Language Models | Oct 29, 2024 | Data Poisoning, Language Modeling | Unverified | 0 |
| ChatSearch: a Dataset and a Generative Retrieval Model for General Conversational Image Retrieval | Oct 24, 2024 | Image Retrieval, Retrieval | Code Available | 0 |
| All Entities are Not Created Equal: Examining the Long Tail for Fine-Grained Entity Typing | Oct 22, 2024 | All, Entity Typing | Unverified | 0 |
| Rulebreakers Challenge: Revealing a Blind Spot in Large Language Models' Reasoning with Formal Logic | Oct 21, 2024 | Formal Logic, World Knowledge | Unverified | 0 |
| Roadmap towards Superhuman Speech Understanding using Large Language Models | Oct 17, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Understanding the Role of LLMs in Multimodal Evaluation Benchmarks | Oct 16, 2024 | Benchmarking, Large Language Model | Code Available | 0 |
| Comprehending Knowledge Graphs with Large Language Models for Recommender Systems | Oct 16, 2024 | Knowledge-Aware Recommendation, Knowledge Graphs | Unverified | 0 |
| KITTEN: A Knowledge-Intensive Evaluation of Image Generation on Visual Entities | Oct 15, 2024 | Image Generation, Retrieval | Unverified | 0 |
| LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content | Oct 14, 2024 | Visual Question Answering (VQA), World Knowledge | Code Available | 1 |
| DyVo: Dynamic Vocabularies for Learned Sparse Retrieval with Entities | Oct 10, 2024 | Document Ranking, Entity Embeddings | Code Available | 0 |
| TVBench: Redesigning Video-Language Evaluation | Oct 10, 2024 | Multiple-choice, Open-Ended Question Answering | Unverified | 0 |
| LLM Embeddings Improve Test-time Adaptation to Tabular Y\|X-Shifts | Oct 9, 2024 | Test-time Adaptation, World Knowledge | Code Available | 1 |
| Which Programming Language and What Features at Pre-training Stage Affect Downstream Logical Inference Performance? | Oct 9, 2024 | In-Context Learning, Logical Reasoning | Code Available | 0 |
| SEAL: SEmantic-Augmented Imitation Learning via Language Model | Oct 3, 2024 | Decision Making, Imitation Learning | Unverified | 0 |
| Intent Detection in the Age of LLMs | Oct 2, 2024 | Data Augmentation, In-Context Learning | Unverified | 0 |
| One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos | Sep 29, 2024 | All, Image Segmentation | Code Available | 2 |
| "Why" Has the Least Side Effect on Model Editing | Sep 27, 2024 | Experimental Design, knowledge editing | Unverified | 0 |
| CurricuLLM: Automatic Task Curricula Design for Learning Complex Robot Skills using Large Language Models | Sep 27, 2024 | Reinforcement Learning (RL), World Knowledge | Code Available | 1 |
| "Oh LLM, I'm Asking Thee, Please Give Me a Decision Tree": Zero-Shot Decision Tree Induction and Embedding with Large Language Models | Sep 27, 2024 | Interpretable Machine Learning, World Knowledge | Unverified | 0 |
| Pioneering Reliable Assessment in Text-to-Image Knowledge Editing: Leveraging a Fine-Grained Dataset and an Innovative Criterion | Sep 26, 2024 | Image Generation, In-Context Learning | Code Available | 0 |
| 60 Data Points are Sufficient to Fine-Tune LLMs for Question-Answering | Sep 24, 2024 | Question Answering, World Knowledge | Unverified | 0 |
| Style Outweighs Substance: Failure Modes of LLM Judges in Alignment Benchmarking | Sep 23, 2024 | Benchmarking, Diversity | Code Available | 0 |
| Can-Do! A Dataset and Neuro-Symbolic Grounded Framework for Embodied Planning with Large Multimodal Models | Sep 22, 2024 | World Knowledge | Unverified | 0 |
| The X Types -- Mapping the Semantics of the Twitter Sphere | Sep 22, 2024 | Type prediction, World Knowledge | Unverified | 0 |
| Relevance-driven Decision Making for Safer and More Efficient Human Robot Collaboration | Sep 21, 2024 | Collision Avoidance, Decision Making | Unverified | 0 |
| Time Awareness in Large Language Models: Benchmarking Fact Recall Across Time | Sep 20, 2024 | Benchmarking, World Knowledge | Unverified | 0 |
| HLLM: Enhancing Sequential Recommendations via Hierarchical Large Language Models for Item and User Modeling | Sep 19, 2024 | Large Language Model, Recommendation Systems | Code Available | 4 |
| Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement | Sep 17, 2024 | Active Learning, Diversity | Code Available | 1 |
| Visual Language Tracking with Multi-modal Interaction: A Robust Benchmark | Sep 13, 2024 | Sequential Decision Making, World Knowledge | Unverified | 0 |
| Synthetic continued pretraining | Sep 11, 2024 | Data Augmentation, Language Modelling | Code Available | 2 |
| Multimodal Large Language Model Driven Scenario Testing for Autonomous Vehicles | Sep 10, 2024 | Autonomous Vehicles, Language Modeling | Unverified | 0 |
| Can OOD Object Detectors Learn from Foundation Models? | Sep 8, 2024 | Object, object-detection | Code Available | 1 |