| DiffRhythm+: Controllable and Flexible Full-Length Song Generation with Preference Optimization | Jul 17, 2025 | Descriptive | Unverified | 0 |
| Assay2Mol: large language model-based drug design using BioAssay context | Jul 16, 2025 | Descriptive, Drug Design | Code Available | 0 |
| Describe Anything Model for Visual Question Answering on Text-rich Images | Jul 16, 2025 | Descriptive, Language Modeling | Code Available | 1 |
| FIFA: Unified Faithfulness Evaluation Framework for Text-to-Video and Video-to-Text Generation | Jul 9, 2025 | Descriptive, Text Generation | Unverified | 0 |
| Beyond Accuracy: Metrics that Uncover What Makes a 'Good' Visual Descriptor | Jul 4, 2025 | Descriptive, image-classification | Code Available | 0 |
| Prompt Disentanglement via Language Guidance and Representation Alignment for Domain Generalization | Jul 3, 2025 | Descriptive, Disentanglement | Unverified | 0 |
| Dataset Distillation via Vision-Language Category Prototype | Jun 30, 2025 | Dataset Distillation, Descriptive | Code Available | 1 |
| Show, Tell and Summarize: Dense Video Captioning Using Visual Cue Aided Sentence Summarization | Jun 25, 2025 | Dense Video Captioning, Descriptive | Unverified | 0 |
| Experiential marketing strategy and tourism demand in the contribution of the positioning of the floating islands Los Uros, Puno | Jun 22, 2025 | Descriptive, Marketing | Unverified | 0 |
| DRAMA-X: A Fine-grained Intent Prediction and Risk Reasoning Benchmark For Driving | Jun 21, 2025 | Autonomous Driving, Descriptive | Code Available | 1 |
| A Simple Contrastive Framework Of Item Tokenization For Generative Recommendation | Jun 20, 2025 | Contrastive Learning, Descriptive | Unverified | 0 |
| InstructTTSEval: Benchmarking Complex Natural-Language Instruction Following in Text-to-Speech Systems | Jun 19, 2025 | Benchmarking, Descriptive | Code Available | 1 |
| SonicVerse: Multi-Task Learning for Music Feature-Informed Captioning | Jun 18, 2025 | Caption Generation, Descriptive | Code Available | 2 |
| Uncovering Intention through LLM-Driven Code Snippet Description Generation | Jun 18, 2025 | Descriptive | Unverified | 0 |
| A Semantically-Aware Relevance Measure for Content-Based Medical Image Retrieval Evaluation | Jun 16, 2025 | Content-Based Image Retrieval, Descriptive | Unverified | 0 |
| Evolvable Conditional Diffusion | Jun 16, 2025 | Denoising, Descriptive | Unverified | 0 |
| Rethinking Optimization: A Systems-Based Approach to Social Externalities | Jun 15, 2025 | Descriptive | Unverified | 0 |
| Benchmarking Multimodal LLMs on Recognition and Understanding over Chemical Tables | Jun 13, 2025 | Benchmarking, Descriptive | Unverified | 0 |
| Alice and the Caterpillar: A more descriptive null model for assessing data mining results | Jun 11, 2025 | Descriptive | Code Available | 0 |
| CoLMbo: Speaker Language Model for Descriptive Profiling | Jun 11, 2025 | Descriptive, Language Modeling | Code Available | 0 |
| ReID5o: Achieving Omni Multi-modal Person Re-identification in a Single Model | Jun 11, 2025 | cross-modal alignment, Descriptive | Code Available | 2 |
| CausalVQA: A Physically Grounded Causal Reasoning Benchmark for Video Models | Jun 11, 2025 | counterfactual, Descriptive | Code Available | 2 |
| ArchiLense: A Framework for Quantitative Analysis of Architectural Styles Based on Vision Large Language Models | Jun 9, 2025 | Descriptive | Unverified | 0 |
| ARGUS: Hallucination and Omission Evaluation in Video-LLMs | Jun 9, 2025 | Descriptive, Form | Unverified | 0 |
| The Influence of Tourist Experience on Revisit Decisions with the Mediation of Tourist Satisfaction | Jun 6, 2025 | Descriptive, Marketing | Unverified | 0 |