| Early Signs of Steganographic Capabilities in Frontier LLMs | Jul 3, 2025 | Large Language Model | Code Available | 0 |
| OpenTable-R1: A Reinforcement Learning Augmented Tool Agent for Open-Domain Table Question Answering | Jul 2, 2025 | Language Modeling, Language Modelling | Code Available | 0 |
| LLaVA-SP: Enhancing Visual Representation with Visual Spatial Tokens for MLLMs | Jul 1, 2025 | Large Language Model | Code Available | 1 |
| Dataset Distillation via Vision-Language Category Prototype | Jun 30, 2025 | Dataset Distillation, Descriptive | Code Available | 1 |
| Auto-TA: Towards Scalable Automated Thematic Analysis (TA) via Multi-Agent Large Language Models with Reinforcement Learning | Jun 30, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Thought-Augmented Planning for LLM-Powered Interactive Recommender Agent | Jun 30, 2025 | Interactive Recommendation, Large Language Model | Code Available | 0 |
| Where, What, Why: Towards Explainable Driver Attention Prediction | Jun 29, 2025 | Autonomous Driving, Autonomous Vehicles | Code Available | 1 |
| Decoupled Seg Tokens Make Stronger Reasoning Video Segmenter and Grounder | Jun 28, 2025 | Image Segmentation, Large Language Model | Code Available | 1 |
| Mask-aware Text-to-Image Retrieval: Referring Expression Segmentation Meets Cross-modal Retrieval | Jun 28, 2025 | Cross-Modal Retrieval, Image Captioning | Unverified | 0 |
| ActAlign: Zero-Shot Fine-Grained Video Classification via Language-Guided Sequence Alignment | Jun 28, 2025 | Dynamic Time Warping, Large Language Model | Code Available | 0 |
| A Large Language Model-Empowered Agent for Reliable and Robust Structural Analysis | Jun 27, 2025 | Code Generation, Language Modeling | Unverified | 0 |
| ARAG: Agentic Retrieval Augmented Generation for Personalized Recommendation | Jun 27, 2025 | Large Language Model, Natural Language Inference | Unverified | 0 |
| Large Language Model Agent for Modular Task Execution in Drug Discovery | Jun 26, 2025 | Drug Discovery, Language Modeling | Unverified | 0 |
| AgentStealth: Reinforcing Large Language Model for Anonymizing User-generated Text | Jun 26, 2025 | Contrastive Learning, Language Modeling | Code Available | 0 |
| Detecting Referring Expressions in Visually Grounded Dialogue with Autoregressive Language Models | Jun 26, 2025 | Language Modeling, Language Modelling | Code Available | 0 |
| MT2-CSD: A New Dataset and Multi-Semantic Knowledge Fusion Method for Conversational Stance Detection | Jun 26, 2025 | Large Language Model, Opinion Mining | Unverified | 0 |
| mTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale | Jun 26, 2025 | Anomaly Detection, Benchmarking | Code Available | 0 |
| Can "consciousness" be observed from large language model (LLM) internal states? Dissecting LLM representations obtained from Theory of Mind test with Integrated Information Theory and Span Representation analysis | Jun 26, 2025 | Explainable Artificial Intelligence (XAI), Interpretable Machine Learning | Unverified | 0 |
| Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test | Jun 26, 2025 | Code Generation, Large Language Model | Unverified | 0 |
| Multimodal Prompt Alignment for Facial Expression Recognition | Jun 26, 2025 | Facial Expression Recognition, Facial Expression Recognition (FER) | Unverified | 0 |
| MedPrompt: LLM-CNN Fusion with Weight Routing for Medical Image Segmentation and Classification | Jun 26, 2025 | Image Segmentation, Large Language Model | Unverified | 0 |
| HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context | Jun 26, 2025 | Large Language Model, Multimodal Reasoning | Code Available | 2 |
| GroundFlow: A Plug-in Module for Temporal Reasoning on 3D Point Cloud Sequential Grounding | Jun 26, 2025 | 3D visual grounding, Large Language Model | Unverified | 0 |
| OracleFusion: Assisting the Decipherment of Oracle Bone Script with Structurally Constrained Semantic Typography | Jun 26, 2025 | Decipherment, Large Language Model | Code Available | 0 |
| ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing | Jun 26, 2025 | Audio Generation, Large Language Model | Code Available | 5 |