SOTAVerified

Image Paragraph Captioning

Image paragraph captioning involves generating a detailed, multi-sentence description of the content of an image.

Papers

Showing 1–10 of 17 papers

Title | Status | Hype
VLIS: Unimodal Language Models Guide Multimodal Language Generation | Code | 1
Enhancing image captioning with depth information using a Transformer-based framework | - | 0
Bypass Network for Semantics Driven Image Paragraph Captioning | - | 0
Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning | - | 0
Matching Visual Features to Hierarchical Semantic Topics for Image Paragraph Captioning | Code | 0
When an Image Tells a Story: The Role of Visual and Semantic Information for Generating Paragraph Descriptions | - | 0
Interactive Key-Value Memory-augmented Attention for Image Paragraph Captioning | - | 0
Hierarchical Scene Graph Encoder-Decoder for Image Paragraph Captioning | - | 0
Improving Diversity and Reducing Redundancy in Paragraph Captions | - | 0
Dual-CNN: A Convolutional language decoder for paragraph image captioning | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | HSGED(SLL) | BLEU-4 | 11.26 | - | Unverified
2 | SCST training, w/ rep. penalty | BLEU-4 | 10.58 | - | Unverified
3 | IMAP | BLEU-4 | 10.29 | - | Unverified
4 | CAE-LSTM | BLEU-4 | 9.67 | - | Unverified
5 | Diverse and Coherent Paragraph Generation from Images | BLEU-4 | 9.43 | - | Unverified
6 | RTT-GAN (Semi + Fully) | BLEU-4 | 9.21 | - | Unverified
7 | Regions-Hierarchical (ours) | BLEU-4 | 8.69 | - | Unverified
8 | Dual-CNN | BLEU-4 | 8.6 | - | Unverified
9 | Depth-aware Attention Model (DAM) | BLEU-4 | 6.7 | - | Unverified
10 | IMG+LNG | BLEU-4 | 4.67 | - | Unverified
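All results above are reported in BLEU-4, which combines clipped n-gram precisions for n = 1 to 4 (geometric mean, uniform weights) with a brevity penalty that discourages paragraphs shorter than the references. The claimed values appear to be on a 0 to 100 scale, i.e. the 0 to 1 score multiplied by 100. The following is a minimal sketch of corpus-level BLEU-4, assuming whitespace-tokenized text, the closest-length reference for the brevity penalty, and no smoothing. The function names `ngrams` and `corpus_bleu4` are illustrative only. Published numbers are typically produced with dedicated evaluation toolkits whose tokenization and length handling may differ, so this sketch is not guaranteed to reproduce the table's values.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu4(hypotheses, references_list):
    """Illustrative corpus-level BLEU-4: uniform weights, no smoothing.

    hypotheses: list of token lists (one generated paragraph per image)
    references_list: list of lists of token lists (reference paragraphs per image)
    Returns a score in [0, 1]; multiply by 100 for the leaderboard-style scale.
    """
    clipped = [0] * 4  # clipped n-gram matches for n = 1..4
    totals = [0] * 4   # total hypothesis n-grams for n = 1..4
    hyp_len = 0
    ref_len = 0
    for hyp, refs in zip(hypotheses, references_list):
        hyp_len += len(hyp)
        # Effective reference length: the reference closest in length to the
        # hypothesis (ties broken toward the shorter reference).
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, 5):
            hyp_counts = ngrams(hyp, n)
            # Counter union keeps the maximum count of each n-gram over references.
            max_ref = Counter()
            for r in refs:
                max_ref |= ngrams(r, n)
            clipped[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(clipped) == 0:
        return 0.0  # some precision is zero, so the unsmoothed geometric mean is zero
    log_precision = sum(math.log(c / t) for c, t in zip(clipped, totals)) / 4
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


hyp = "a b c d e".split()
ref = "a b c d e f".split()
print(corpus_bleu4([hyp], [[ref]]))  # all precisions are 1, so only the brevity penalty applies
```

In this usage example every hypothesis n-gram appears in the reference, so the score reduces to the brevity penalty exp(1 - 6/5), roughly 0.82. Without smoothing, a single zero n-gram precision drives the whole score to zero, which is one reason paragraph-level BLEU-4 values such as those above are computed over a full test corpus rather than per image.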