SOTAVerified

Descriptive

Papers

Showing 150 of 1477 papers

Title | Status | Hype
DiffRhythm+: Controllable and Flexible Full-Length Song Generation with Preference Optimization | - | 0
Assay2Mol: large language model-based drug design using BioAssay context | Code | 0
Describe Anything Model for Visual Question Answering on Text-rich Images | Code | 1
FIFA: Unified Faithfulness Evaluation Framework for Text-to-Video and Video-to-Text Generation | - | 0
Beyond Accuracy: Metrics that Uncover What Makes a 'Good' Visual Descriptor | Code | 0
Prompt Disentanglement via Language Guidance and Representation Alignment for Domain Generalization | - | 0
Dataset Distillation via Vision-Language Category Prototype | Code | 1
Show, Tell and Summarize: Dense Video Captioning Using Visual Cue Aided Sentence Summarization | - | 0
Experiential marketing strategy and tourism demand in the contribution of the positioning of the floating islands Los Uros, Puno | - | 0
DRAMA-X: A Fine-grained Intent Prediction and Risk Reasoning Benchmark For Driving | Code | 1
A Simple Contrastive Framework Of Item Tokenization For Generative Recommendation | - | 0
InstructTTSEval: Benchmarking Complex Natural-Language Instruction Following in Text-to-Speech Systems | Code | 1
Uncovering Intention through LLM-Driven Code Snippet Description Generation | - | 0
SonicVerse: Multi-Task Learning for Music Feature-Informed Captioning | Code | 2
A Semantically-Aware Relevance Measure for Content-Based Medical Image Retrieval Evaluation | - | 0
Evolvable Conditional Diffusion | - | 0
Rethinking Optimization: A Systems-Based Approach to Social Externalities | - | 0
Benchmarking Multimodal LLMs on Recognition and Understanding over Chemical Tables | - | 0
CoLMbo: Speaker Language Model for Descriptive Profiling | Code | 0
Alice and the Caterpillar: A more descriptive null model for assessing data mining results | Code | 0
ReID5o: Achieving Omni Multi-modal Person Re-identification in a Single Model | Code | 2
CausalVQA: A Physically Grounded Causal Reasoning Benchmark for Video Models | Code | 2
ARGUS: Hallucination and Omission Evaluation in Video-LLMs | - | 0
ArchiLense: A Framework for Quantitative Analysis of Architectural Styles Based on Vision Large Language Models | - | 0
The Influence of Tourist Experience on Revisit Decisions with the Mediation of Tourist Satisfaction | - | 0
PRJ: Perception-Retrieval-Judgement for Generated Images | - | 0
Datasheets Aren't Enough: DataRubrics for Automated Quality Metrics and Accountability | Code | 1
Protein folding classes -- High-dimensional geometry of amino acid composition space revisited | - | 0
Effect of Insecurity on Agricultural Output in Benue State, Nigeria | - | 0
Ultra-High-Resolution Image Synthesis: Data, Method and Evaluation | Code | 3
NexusSum: Hierarchical LLM Agents for Long-Form Narrative Summarization | - | 0
Comparative analysis of privacy-preserving open-source LLMs regarding extraction of diagnostic information from clinical CMR imaging reports | - | 0
VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuning | Code | 2
LayerPeeler: Autoregressive Peeling for Layer-wise Image Vectorization | - | 0
NEXT: Multi-Grained Mixture of Experts via Text-Modulation for Multi-Modal Object Re-ID | - | 0
BiomechGPT: Towards a Biomechanically Fluent Multimodal Foundation Model for Clinically Relevant Motion Tasks | - | 0
Contrastive Distillation of Emotion Knowledge from LLMs for Zero-Shot Emotion Recognition | Code | 0
Creatively Upscaling Images with Global-Regional Priors | - | 0
CLEAR: A Clinically-Grounded Tabular Framework for Radiology Report Evaluation | - | 0
GitHub Repository Complexity Leads to Diminished Web Archive Availability | - | 0
Robo2VLM: Visual Question Answering from Large-Scale In-the-Wild Robot Manipulation Datasets | - | 0
Multimodal RAG-driven Anomaly Detection and Classification in Laser Powder Bed Fusion using Large Language Models | - | 0
Descriptive Image-Text Matching with Graded Contextual Similarity | - | 0
The Human-Data-Model Interaction Canvas for Visual Analytics | - | 0
Hallucination-Aware Multimodal Benchmark for Gastrointestinal Image Analysis with Large Vision-Language Models | Code | 1
Multi-Modal Explainable Medical AI Assistant for Trustworthy Human-AI Collaboration | - | 0
Emotion-Qwen: Training Hybrid Experts for Unified Emotion and General Vision-Language Understanding | Code | 1
KCluster: An LLM-based Clustering Approach to Knowledge Component Discovery | Code | 0
SweRank: Software Issue Localization with Code Ranking | - | 0
Text2CT: Towards 3D CT Volume Generation from Free-text Descriptions Using Diffusion Model | - | 0

No leaderboard results yet.