SOTAVerified

Image Captioning

Image captioning is the task of describing the content of an image in words. The task lies at the intersection of computer vision and natural language processing. Most image captioning systems use an encoder-decoder framework: the input image is encoded into an intermediate representation, which is then decoded into a descriptive text sequence. The most popular benchmarks are nocaps and COCO, and models are typically evaluated with the BLEU or CIDEr metric.

(Image credit: Reflective Decoding Network for Image Captioning, ICCV'19)
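
To make the evaluation step more concrete, here is a minimal sketch of the clipped n-gram precision that sits at the core of BLEU. It is only an illustration under stated assumptions: whitespace tokenization, lowercasing, and made-up example captions. It omits BLEU's brevity penalty and the geometric mean over n-gram orders, and it is not CIDEr. The function names `ngrams` and `clipped_precision` are hypothetical helpers, not part of any benchmark toolkit.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate caption against references.

    Each candidate n-gram count is clipped to the maximum count of that
    n-gram in any single reference, as in BLEU's modified precision.
    """
    cand_counts = ngrams(candidate.lower().split(), n)
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0  # candidate is shorter than n tokens

    # Highest count of each n-gram seen in any one reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref.lower().split(), n).items():
            max_ref[gram] = max(max_ref[gram], count)

    matched = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return matched / total


# Hypothetical captions, made up for illustration only.
candidate = "a dog runs on the grass"
references = [
    "a dog is running on the grass",
    "a brown dog runs across a field",
]
p1 = clipped_precision(candidate, references, 1)
p2 = clipped_precision(candidate, references, 2)
print(p1, p2)
```

In this toy example all six candidate unigrams appear in some reference, while four of the five candidate bigrams do ("runs on" is missing). Reported benchmark scores come from the official evaluation code for each benchmark, not from a simplified calculation like this one.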

Papers

Showing 1-10 of 1878 papers

Benchmark Results

#   Model             Metric  Claimed  Verified  Status
1   IBM Research AI   CIDEr   81.04    -         Unverified
2   CASIA_IVA         CIDEr   77.7     -         Unverified
3   RUC_AIM3          CIDEr   72.72    -         Unverified
4   aburns            CIDEr   59.61    -         Unverified
5   Team SSP          CIDEr   59.56    -         Unverified
6   NLP-685           CIDEr   56.19    -         Unverified
7   cs685-nlp-aoa-e   CIDEr   53.84    -         Unverified
8   parambole         CIDEr   37.82    -         Unverified
9   Test533           CIDEr   37.18    -         Unverified
10  BUTD-Test         CIDEr   36.99    -         Unverified