| DiffRhythm+: Controllable and Flexible Full-Length Song Generation with Preference Optimization | Jul 17, 2025 | Descriptive | Unverified | 0 |
| Assay2Mol: large language model-based drug design using BioAssay context | Jul 16, 2025 | Descriptive, Drug Design | Code Available | 0 |
| Describe Anything Model for Visual Question Answering on Text-rich Images | Jul 16, 2025 | Descriptive, Language Modeling | Code Available | 1 |
| FIFA: Unified Faithfulness Evaluation Framework for Text-to-Video and Video-to-Text Generation | Jul 9, 2025 | Descriptive, Text Generation | Unverified | 0 |
| Beyond Accuracy: Metrics that Uncover What Makes a 'Good' Visual Descriptor | Jul 4, 2025 | Descriptive, image-classification | Code Available | 0 |
| Prompt Disentanglement via Language Guidance and Representation Alignment for Domain Generalization | Jul 3, 2025 | Descriptive, Disentanglement | Unverified | 0 |
| Dataset Distillation via Vision-Language Category Prototype | Jun 30, 2025 | Dataset Distillation, Descriptive | Code Available | 1 |
| Show, Tell and Summarize: Dense Video Captioning Using Visual Cue Aided Sentence Summarization | Jun 25, 2025 | Dense Video Captioning, Descriptive | Unverified | 0 |
| Experiential marketing strategy and tourism demand in the contribution of the positioning of the floating islands Los Uros, Puno | Jun 22, 2025 | Descriptive, Marketing | Unverified | 0 |
| DRAMA-X: A Fine-grained Intent Prediction and Risk Reasoning Benchmark For Driving | Jun 21, 2025 | Autonomous Driving, Descriptive | Code Available | 1 |
| A Simple Contrastive Framework Of Item Tokenization For Generative Recommendation | Jun 20, 2025 | Contrastive Learning, Descriptive | Unverified | 0 |
| InstructTTSEval: Benchmarking Complex Natural-Language Instruction Following in Text-to-Speech Systems | Jun 19, 2025 | Benchmarking, Descriptive | Code Available | 1 |
| Uncovering Intention through LLM-Driven Code Snippet Description Generation | Jun 18, 2025 | Descriptive | Unverified | 0 |
| SonicVerse: Multi-Task Learning for Music Feature-Informed Captioning | Jun 18, 2025 | Caption Generation, Descriptive | Code Available | 2 |
| A Semantically-Aware Relevance Measure for Content-Based Medical Image Retrieval Evaluation | Jun 16, 2025 | Content-Based Image Retrieval, Descriptive | Unverified | 0 |
| Evolvable Conditional Diffusion | Jun 16, 2025 | Denoising, Descriptive | Unverified | 0 |
| Rethinking Optimization: A Systems-Based Approach to Social Externalities | Jun 15, 2025 | Descriptive | Unverified | 0 |
| Benchmarking Multimodal LLMs on Recognition and Understanding over Chemical Tables | Jun 13, 2025 | Benchmarking, Descriptive | Unverified | 0 |
| CoLMbo: Speaker Language Model for Descriptive Profiling | Jun 11, 2025 | Descriptive, Language Modeling | Code Available | 0 |
| Alice and the Caterpillar: A more descriptive null model for assessing data mining results | Jun 11, 2025 | Descriptive | Code Available | 0 |
| ReID5o: Achieving Omni Multi-modal Person Re-identification in a Single Model | Jun 11, 2025 | cross-modal alignment, Descriptive | Code Available | 2 |
| CausalVQA: A Physically Grounded Causal Reasoning Benchmark for Video Models | Jun 11, 2025 | counterfactual, Descriptive | Code Available | 2 |
| ARGUS: Hallucination and Omission Evaluation in Video-LLMs | Jun 9, 2025 | Descriptive, Form | Unverified | 0 |
| ArchiLense: A Framework for Quantitative Analysis of Architectural Styles Based on Vision Large Language Models | Jun 9, 2025 | Descriptive | Unverified | 0 |
| The Influence of Tourist Experience on Revisit Decisions with the Mediation of Tourist Satisfaction | Jun 6, 2025 | Descriptive, Marketing | Unverified | 0 |
| PRJ: Perception-Retrieval-Judgement for Generated Images | Jun 4, 2025 | Descriptive, Retrieval | Unverified | 0 |
| Datasheets Aren't Enough: DataRubrics for Automated Quality Metrics and Accountability | Jun 2, 2025 | Descriptive, Synthetic Data Generation | Code Available | 1 |
| Protein folding classes -- High-dimensional geometry of amino acid composition space revisited | Jun 2, 2025 | Descriptive, Protein Folding | Unverified | 0 |
| Effect of Insecurity on Agricultural Output in Benue State, Nigeria | Jun 2, 2025 | Descriptive | Unverified | 0 |
| Ultra-High-Resolution Image Synthesis: Data, Method and Evaluation | Jun 2, 2025 | 4k, Descriptive | Code Available | 3 |
| NexusSum: Hierarchical LLM Agents for Long-Form Narrative Summarization | May 30, 2025 | Descriptive, Form | Unverified | 0 |
| Comparative analysis of privacy-preserving open-source LLMs regarding extraction of diagnostic information from clinical CMR imaging reports | May 29, 2025 | Descriptive, Diagnostic | Unverified | 0 |
| VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuning | May 29, 2025 | Anomaly Detection, Descriptive | Code Available | 2 |
| LayerPeeler: Autoregressive Peeling for Layer-wise Image Vectorization | May 29, 2025 | Descriptive, Vector Graphics | Unverified | 0 |
| NEXT: Multi-Grained Mixture of Experts via Text-Modulation for Multi-Modal Object Re-ID | May 26, 2025 | Attribute, Caption Generation | Unverified | 0 |
| BiomechGPT: Towards a Biomechanically Fluent Multimodal Foundation Model for Clinically Relevant Motion Tasks | May 24, 2025 | Activity Recognition, Descriptive | Unverified | 0 |
| Contrastive Distillation of Emotion Knowledge from LLMs for Zero-Shot Emotion Recognition | May 23, 2025 | Descriptive, Emotion Recognition | Code Available | 0 |
| Creatively Upscaling Images with Global-Regional Priors | May 22, 2025 | Denoising, Descriptive | Unverified | 0 |
| CLEAR: A Clinically-Grounded Tabular Framework for Radiology Report Evaluation | May 22, 2025 | Attribute, Descriptive | Unverified | 0 |
| GitHub Repository Complexity Leads to Diminished Web Archive Availability | May 21, 2025 | Descriptive | Unverified | 0 |
| Robo2VLM: Visual Question Answering from Large-Scale In-the-Wild Robot Manipulation Datasets | May 21, 2025 | Dataset Generation, Descriptive | Unverified | 0 |
| Multimodal RAG-driven Anomaly Detection and Classification in Laser Powder Bed Fusion using Large Language Models | May 20, 2025 | Anomaly Detection, Descriptive | Unverified | 0 |
| Descriptive Image-Text Matching with Graded Contextual Similarity | May 15, 2025 | Descriptive, Image-text matching | Unverified | 0 |
| The Human-Data-Model Interaction Canvas for Visual Analytics | May 12, 2025 | Descriptive | Unverified | 0 |
| Hallucination-Aware Multimodal Benchmark for Gastrointestinal Image Analysis with Large Vision-Language Models | May 11, 2025 | Descriptive, Diagnostic | Code Available | 1 |
| Multi-Modal Explainable Medical AI Assistant for Trustworthy Human-AI Collaboration | May 11, 2025 | Benchmarking, Descriptive | Unverified | 0 |
| Emotion-Qwen: Training Hybrid Experts for Unified Emotion and General Vision-Language Understanding | May 10, 2025 | Descriptive, Emotion Recognition | Code Available | 1 |
| KCluster: An LLM-based Clustering Approach to Knowledge Component Discovery | May 9, 2025 | Clustering, Descriptive | Code Available | 0 |
| SweRank: Software Issue Localization with Code Ranking | May 7, 2025 | Descriptive | Unverified | 0 |
| Text2CT: Towards 3D CT Volume Generation from Free-text Descriptions Using Diffusion Model | May 7, 2025 | Data Augmentation, Descriptive | Unverified | 0 |