| HRSeg: High-Resolution Visual Perception and Enhancement for Reasoning Segmentation | Jul 17, 2025 | Reasoning Segmentation, World Knowledge | —Unverified | 0 |
| Comparing Apples to Oranges: A Dataset & Analysis of LLM Humour Understanding from Traditional Puns to Topical Jokes | Jul 17, 2025 | Common Sense Reasoning, World Knowledge | —Unverified | 0 |
| KEN: Knowledge Augmentation and Emotion Guidance Network for Multimodal Fake News Detection | Jul 13, 2025 | Fake News Detection, Misinformation | —Unverified | 0 |
| Video Event Reasoning and Prediction by Fusing World Knowledge from LLMs with Vision Foundation Models | Jul 8, 2025 | Future prediction, Large Language Model | —Unverified | 0 |
| DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge | Jul 6, 2025 | Image Generation, Multimodal Reasoning | Code Available | 3 |
| A Semi-supervised Scalable Unified Framework for E-commerce Query Classification | Jun 26, 2025 | Classification, World Knowledge | —Unverified | 0 |
| From 2D to 3D Cognition: A Brief Survey of General World Models | Jun 25, 2025 | Autonomous Driving, Scene Generation | —Unverified | 0 |
| MIRAGE: A Benchmark for Multimodal Information-Seeking and Reasoning in Agricultural Expert-Guided Conversations | Jun 25, 2025 | World Knowledge | Code Available | 0 |
| Multi-Preference Lambda-weighted Listwise DPO for Dynamic Preference Alignment | Jun 24, 2025 | Informativeness, reinforcement-learning | Code Available | 0 |
| ImpliRet: Benchmarking the Implicit Fact Retrieval Challenge | Jun 17, 2025 | Benchmarking, Retrieval | Code Available | 0 |
| AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning | Jun 16, 2025 | Action Generation, Autonomous Driving | Code Available | 3 |
| ConTextTab: A Semantics-Aware Tabular In-Context Learner | Jun 12, 2025 | In-Context Learning, World Knowledge | Code Available | 2 |
| MMMG: A Massive, Multidisciplinary, Multi-Tier Generation Benchmark for Text-to-Image Reasoning | Jun 12, 2025 | Image Generation, Multimodal Reasoning | —Unverified | 0 |
| RoCA: Robust Cross-Domain End-to-End Autonomous Driving | Jun 11, 2025 | Autonomous Driving, Domain Adaptation | —Unverified | 0 |
| Serendipitous Recommendation with Multimodal LLM | Jun 9, 2025 | Recommendation Systems, World Knowledge | —Unverified | 0 |
| ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving | Jun 9, 2025 | Autonomous Driving, Imitation Learning | —Unverified | 0 |
| Vid2Sim: Generalizable, Video-based Reconstruction of Appearance, Geometry and Physics for Mesh-free Simulation | Jun 6, 2025 | Computational Efficiency, World Knowledge | —Unverified | 0 |
| Quantifying Cross-Modality Memorization in Vision-Language Models | Jun 5, 2025 | Machine Unlearning, Memorization | —Unverified | 0 |
| TIIF-Bench: How Does Your T2I Model Follow Your Instructions? | Jun 2, 2025 | Benchmarking, Instruction Following | —Unverified | 0 |
| From Words to Waves: Analyzing Concept Formation in Speech and Text-Based Foundation Models | Jun 1, 2025 | World Knowledge | —Unverified | 0 |
| Probing the Geometry of Truth: Consistency and Generalization of Truth Directions in LLMs Across Logical Transformations and Question Answering Tasks | Jun 1, 2025 | In-Context Learning, Negation | Code Available | 0 |
| Augment or Not? A Comparative Study of Pure and Augmented Large Language Model Recommenders | May 29, 2025 | Language Modeling, Language Modelling | Code Available | 0 |
| SC-LoRA: Balancing Efficient Fine-tuning and Knowledge Preservation via Subspace-Constrained LoRA | May 29, 2025 | Navigate, parameter-efficient fine-tuning | —Unverified | 0 |
| MOVi: Training-free Text-conditioned Multi-Object Video Generation | May 29, 2025 | Object, Video Generation | —Unverified | 0 |
| Hierarchical Tree Search-based User Lifelong Behavior Modeling on Large Language Model | May 26, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| Improving Medical Reasoning with Curriculum-Aware Reinforcement Learning | May 25, 2025 | Out-of-Distribution Generalization, reinforcement-learning | —Unverified | 0 |
| DriveX: Omni Scene Modeling for Learning Generalizable World Knowledge in Autonomous Driving | May 25, 2025 | Autonomous Driving, Image Generation | —Unverified | 0 |
| Alchemist: Turning Public Text-to-Image Data into Generative Gold | May 25, 2025 | World Knowledge | —Unverified | 0 |
| GRE Suite: Geo-localization Inference via Fine-Tuned Vision-Language Models and Enhanced Reasoning Chains | May 24, 2025 | geo-localization, Visual Reasoning | Code Available | 1 |
| Align Beyond Prompts: Evaluating World Knowledge Alignment in Text-to-Image Generation | May 24, 2025 | Image Generation, Text to Image Generation | Code Available | 0 |
| Do BERT-Like Bidirectional Models Still Perform Better on Text Classification in the Era of LLMs? | May 23, 2025 | text-classification, Text Classification | —Unverified | 0 |
| DeepRec: Towards a Deep Dive Into the Item Space with Large Language Model Based Recommendation | May 22, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| O^2-Searcher: A Searching-based Agent Model for Open-Domain Open-Ended Question Answering | May 22, 2025 | Answer Generation, Open-Ended Question Answering | Code Available | 1 |
| TimeCausality: Evaluating the Causal Ability in Time Dimension for Vision Language Models | May 21, 2025 | Human Aging, Question Answering | Code Available | 0 |
| Robo2VLM: Visual Question Answering from Large-Scale In-the-Wild Robot Manipulation Datasets | May 21, 2025 | Dataset Generation, Descriptive | —Unverified | 0 |
| UniErase: Unlearning Token as a Universal Erasure Primitive for Language Models | May 21, 2025 | Machine Unlearning, Model Editing | Code Available | 0 |
| Table Foundation Models: on knowledge pre-training for tabular learning | May 20, 2025 | World Knowledge | —Unverified | 0 |
| Bidirectional LMs are Better Knowledge Memorizers? A Benchmark for Real-world Knowledge Injection | May 18, 2025 | Memorization, World Knowledge | Code Available | 0 |
| Benchmarking Spatiotemporal Reasoning in LLMs and Reasoning Models: Capabilities and Challenges | May 16, 2025 | Benchmarking, State Estimation | Code Available | 0 |
| Who You Are Matters: Bridging Topics and Social Roles via LLM-Enhanced Logical Recommendation | May 16, 2025 | General Knowledge, Large Language Model | —Unverified | 0 |
| LODGE: Joint Hierarchical Task Planning and Learning of Domain Models with Grounded Execution | May 15, 2025 | Robot Manipulation, Task Planning | —Unverified | 0 |
| LLM4CD: Leveraging Large Language Models for Open-World Knowledge Augmented Cognitive Diagnosis | May 14, 2025 | cognitive diagnosis, World Knowledge | Code Available | 0 |
| Enhancing Cache-Augmented Generation (CAG) with Adaptive Contextual Compression for Scalable Knowledge Integration | May 13, 2025 | RAG, Retrieval | —Unverified | 0 |
| Advancing and Benchmarking Personalized Tool Invocation for LLMs | May 7, 2025 | Benchmarking, World Knowledge | Code Available | 0 |
| Evaluating Contrastive Feedback for Effective User Simulations | May 5, 2025 | Information Retrieval, Prompt Engineering | Code Available | 0 |
| WorldGenBench: A World-Knowledge-Integrated Benchmark for Reasoning-Driven Text-to-Image Generation | May 2, 2025 | Image Generation, Text to Image Generation | —Unverified | 0 |
| Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers | Apr 29, 2025 | Data Augmentation, Knowledge Graphs | —Unverified | 0 |
| Towards Automated Scoping of AI for Social Good Projects | Apr 28, 2025 | World Knowledge | —Unverified | 0 |
| Doxing via the Lens: Revealing Location-related Privacy Leakage on Multi-modal Large Reasoning Models | Apr 27, 2025 | Visual Reasoning, World Knowledge | —Unverified | 0 |
| WeatherGen: A Unified Diverse Weather Generator for LiDAR Point Clouds via Spider Mamba Diffusion | Apr 18, 2025 | Contrastive Learning, Denoising | Code Available | 1 |