SOTAVerified

Image Description

Papers

Showing 51-100 of 154 papers

Title | Status | Hype
Multi30K: Multilingual English-German Image Descriptions | Code | 0
Multilingual Image Description with Neural Sequence Models | Code | 0
Multimodal Word Sense Disambiguation in Creative Practice | Code | 0
On Architectures for Including Visual Information in Neural Language Models for Image Description | Code | 0
Pragmatic factors in image description: the case of negations | Code | 0
Room for improvement in automatic image description: an error analysis | Code | 0
RRHF-V: Ranking Responses to Mitigate Hallucinations in Multimodal Large Language Models with Human Feedback | Code | 0
Skeletal Human Action Recognition using Hybrid Attention based Graph Convolutional Network | Code | 0
Talking about other people: an endless range of possibilities | Code | 0
The Treasure beneath Convolutional Layers: Cross-convolutional-layer Pooling for Image Classification | Code | 0
Unsupervised Image Captioning | Code | 0
Unsupervised Visual Sense Disambiguation for Verbs using Multimodal Embeddings | Code | 0
Varying image description tasks: spoken versus written descriptions | Code | 0
VisBias: Measuring Explicit and Implicit Social Biases in Vision Language Models | Code | 0
What a neural language model tells us about spatial relations | Code | 0
A Fine-Grained Image Description Generation Method Based on Joint Objectives | - | 0
Collecting Image Description Datasets using Crowdsourcing | - | 0
Hausa Visual Genome: A Dataset for Multi-Modal English to Hausa Machine Translation | - | 0
Adaptive Color Attributes for Real-Time Visual Tracking | - | 0
Tell Me More: A Dataset of Visual Scene Description Sequences | - | 0
Im2Text: Describing Images Using 1 Million Captioned Photographs | - | 0
Image Description Dataset for Language Learners | - | 0
Image Description using Visual Dependency Representations | - | 0
Image Pivoting for Learning Multilingual Multimodal Representations | - | 0
Boli: A dataset for understanding stuttering experience and analyzing stuttered speech | - | 0
Impressions: Understanding Visual Semiotics and Aesthetic Impact | - | 0
Improving Description-based Person Re-identification by Multi-granularity Image-text Alignments | - | 0
InfoVisDial: An Informative Visual Dialogue Dataset by Bridging Large Multimodal and Language Models | - | 0
LaMOuR: Leveraging Language Models for Out-of-Distribution Recovery in Reinforcement Learning | - | 0
Language Augmentation in CLIP for Improved Anatomy Detection on Multi-modal Medical Images | - | 0
Textual Visual Semantic Dataset for Text Spotting | - | 0
Learning Action Concept Trees and Semantic Alignment Networks from Image-Description Data | - | 0
Local Higher-Order Statistics (LHS) describing images with statistics of local non-binarized pixel patterns | - | 0
Advanced Chest X-Ray Analysis via Transformer-Based Image Descriptors and Cross-Model Attention Mechanism | - | 0
The Image Torque Operator for Contour Processing | - | 0
The Lexical Gap: An Improved Measure of Automated Image Description Quality | - | 0
The Long-Short Story of Movie Description | - | 0
The Task Matters: Comparing Image Captioning and Task-Based Dialogical Image Description | - | 0
A Genetic Algorithm Approach for Image Representation Learning through Color Quantization | - | 0
Mind's Eye: A Recurrent Visual Representation for Image Caption Generation | - | 0
Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures | - | 0
A Thousand Frames in Just a Few Words: Lingual Description of Videos through Latent Topics and Sparse Object Stitching | - | 0
A Shared Task on Multimodal Machine Translation and Crosslingual Image Description | - | 0
Data-augmented phrase-level alignment for mitigating object hallucination | - | 0
Adding the Third Dimension to Spatial Relation Detection in 2D Images | - | 0
Multilingual Image Corpus – Towards a Multimodal and Multilingual Dataset | - | 0
TypeScore: A Text Fidelity Metric for Text-to-Image Generative Models | - | 0
Multimodal fusion via cortical network inspired losses | - | 0
Multi-modal gated recurrent units for image description | - | 0
Multimodal Machine Translation with Reinforcement Learning | - | 0

No leaderboard results yet.