| A Neural Representation Framework with LLM-Driven Spatial Reasoning for Open-Vocabulary 3D Visual Grounding | Jul 9, 2025 | 3D visual grounding, Autonomous Navigation | Unverified | 0 |
| A Language-Driven Framework for Improving Personalized Recommendations: Merging LLMs with Traditional Algorithms | Jul 9, 2025 | Movie Recommendation, Recommendation Systems | Unverified | 0 |
| What Demands Attention in Urban Street Scenes? From Scene Understanding towards Road Safety: A Survey of Vision-driven Datasets and Studies | Jul 9, 2025 | Scene Understanding, Survey | Unverified | 0 |
| 4KAgent: Agentic Any Image to 4K Super-Resolution | Jul 9, 2025 | 4k, Image Quality Assessment | Unverified | 0 |
| OpenDPDv2: A Unified Learning and Optimization Framework for Neural Network Digital Predistortion | Jul 9, 2025 | Model Optimization, Quantization | Unverified | 0 |
| SpindleKV: A Novel KV Cache Reduction Method Balancing Both Shallow and Deep Layers | Jul 9, 2025 | | Code Available | 0 |
| Barriers in Integrating Medical Visual Question Answering into Radiology Workflows: A Scoping Review and Clinicians' Insights | Jul 9, 2025 | Diagnostic, Medical Visual Question Answering | Unverified | 0 |
| Boosting Parameter Efficiency in LLM-Based Recommendation through Sophisticated Pruning | Jul 9, 2025 | Recommendation Systems | Code Available | 0 |
| Addressing Imbalanced Domain-Incremental Learning through Dual-Balance Collaborative Experts | Jul 9, 2025 | Continual Learning, Incremental Learning | Code Available | 0 |
| GR-LLMs: Recent Advances in Generative Recommendation Based on Large Language Models | Jul 9, 2025 | Recommendation Systems, Survey | Unverified | 0 |
| Explainable Artificial Intelligence in Biomedical Image Analysis: A Comprehensive Survey | Jul 9, 2025 | Explainable artificial intelligence, Explainable Artificial Intelligence (XAI) | Unverified | 0 |
| FIFA: Unified Faithfulness Evaluation Framework for Text-to-Video and Video-to-Text Generation | Jul 9, 2025 | Descriptive, Text Generation | Unverified | 0 |
| What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models | Jul 9, 2025 | Inductive Bias | Unverified | 0 |
| Pun Intended: Multi-Agent Translation of Wordplay with Contrastive Learning and Phonetic-Semantic Embeddings | Jul 9, 2025 | Contrastive Learning, Machine Translation | Unverified | 0 |
| MIND: A Multi-agent Framework for Zero-shot Harmful Meme Detection | Jul 9, 2025 | | Code Available | 0 |
| Foundation models for time series forecasting: Application in conformal prediction | Jul 9, 2025 | Conformal Prediction, Prediction | Unverified | 0 |
| Adaptive Termination for Multi-round Parallel Reasoning: An Universal Semantic Entropy-Guided Framework | Jul 9, 2025 | Collaborative Inference | Unverified | 0 |
| Learning from Sparse Point Labels for Dense Carcinosis Localization in Advanced Ovarian Cancer Assessment | Jul 9, 2025 | Diagnostic | Unverified | 0 |
| A Survey on Long-Video Storytelling Generation: Architectures, Consistency, and Cinematic Quality | Jul 9, 2025 | Diversity, Video Generation | Unverified | 0 |
| MagiC: Evaluating Multimodal Cognition Toward Grounded Visual Reasoning | Jul 9, 2025 | Diagnostic, Multimodal Reasoning | Unverified | 0 |
| Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model | Jul 9, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Reading a Ruler in the Wild | Jul 9, 2025 | Keypoint Detection | Unverified | 0 |
| Temporal Information Retrieval via Time-Specifier Model Merging | Jul 9, 2025 | Information Retrieval, model | Code Available | 0 |
| Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data | Jul 9, 2025 | Motion Generation, Zero-shot Generalization | Code Available | 0 |
| MK-Pose: Category-Level Object Pose Estimation via Multimodal-Based Keypoint Learning | Jul 9, 2025 | Keypoint Detection, Pose Estimation | Code Available | 0 |
| ILNet: Trajectory Prediction with Inverse Learning Attention for Enhancing Intention Capture | Jul 9, 2025 | Motion Forecasting, Trajectory Prediction | Code Available | 0 |
| CLI-RAG: A Retrieval-Augmented Framework for Clinically Structured and Context Aware Text Generation with LLMs | Jul 9, 2025 | Chunking, RAG | Unverified | 0 |
| Failure Forecasting Boosts Robustness of Sim2Real Rhythmic Insertion Policies | Jul 9, 2025 | Friction, Pose Tracking | Unverified | 0 |
| Evaluating Attribute Confusion in Fashion Text-to-Image Generation | Jul 9, 2025 | Attribute, cross-modal alignment | Unverified | 0 |
| LinguaMark: Do Multimodal Models Speak Fairly? A Benchmark-Based Evaluation | Jul 9, 2025 | Question Answering, Visual Question Answering | Unverified | 0 |
| Benchmarking Waitlist Mortality Prediction in Heart Transplantation Through Time-to-Event Modeling using New Longitudinal UNOS Dataset | Jul 9, 2025 | Benchmarking, Decision Making | Unverified | 0 |
| Open Source Planning & Control System with Language Agents for Autonomous Scientific Discovery | Jul 9, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| InvestAlign: Overcoming Data Scarcity in Aligning Large Language Models with Investor Decision-Making Processes under Herd Behavior | Jul 9, 2025 | Decision Making | Code Available | 0 |
| Bilateral Collaboration with Large Vision-Language Models for Open Vocabulary Human-Object Interaction Detection | Jul 9, 2025 | Human-Object Interaction Detection, Large Language Model | Code Available | 0 |
| Integrating External Tools with Large Language Models to Improve Accuracy | Jul 9, 2025 | Mathematical Reasoning, MMLU | Unverified | 0 |
| Residual Prior-driven Frequency-aware Network for Image Fusion | Jul 9, 2025 | SSIM | Code Available | 0 |
| Speak2Sign3D: A Multi-modal Pipeline for English Speech to American Sign Language Animation | Jul 9, 2025 | Machine Translation, Sign Language Translation | Unverified | 0 |
| Design and Implementation of an OCR-Powered Pipeline for Table Extraction from Invoices | Jul 9, 2025 | Boundary Detection, Optical Character Recognition (OCR) | Unverified | 0 |
| Fast Gaussian Processes under Monotonicity Constraints | Jul 9, 2025 | Computational Efficiency, Gaussian Processes | Unverified | 0 |
| MoFE-Time: Mixture of Frequency Domain Experts for Time-Series Forecasting Models | Jul 9, 2025 | Mixture-of-Experts, Time Series | Code Available | 2 |
| Artificial Generals Intelligence: Mastering Generals.io with Reinforcement Learning | Jul 9, 2025 | GPU, Multi-agent Reinforcement Learning | Unverified | 0 |
| MS-DPPs: Multi-Source Determinantal Point Processes for Contextual Diversity Refinement of Composite Attributes in Text to Image Retrieval | Jul 9, 2025 | Diversity, Image Retrieval | Code Available | 0 |
| From large-eddy simulations to deep learning: A U-net model for fast urban canopy flow predictions | Jul 9, 2025 | GPU, L2 Regularization | Code Available | 0 |
| GNN-ViTCap: GNN-Enhanced Multiple Instance Learning with Vision Transformers for Whole Slide Image Classification and Captioning | Jul 9, 2025 | Caption Generation, Clustering | Unverified | 0 |
| The Safety Gap Toolkit: Evaluating Hidden Dangers of Open-Source Models | Jul 8, 2025 | | Code Available | 0 |
| Gradients as an Action: Towards Communication-Efficient Federated Recommender Systems via Adaptive Action Sharing | Jul 8, 2025 | | Code Available | 0 |
| SciMaster: Towards General-Purpose Scientific AI Agents, Part I. X-Master as Foundation: Can We Lead on Humanity's Last Exam? | Jul 8, 2025 | | Unverified | 0 |
| HyperGaussians: High-Dimensional Gaussian Splatting for High-Fidelity Animatable Face Avatars | Jul 8, 2025 | | Unverified | 0 |
| Multi-Sense Embeddings for Language Models and Knowledge Distillation | Jul 8, 2025 | | Code Available | 0 |
| CoPT: Unsupervised Domain Adaptive Segmentation using Domain-Agnostic Text Embeddings | Jul 8, 2025 | | Code Available | 0 |