SOTAVerified

Caption Generation

Papers

Showing 126–150 of 310 papers

| Title | Status | Hype |
| DECap: Towards Generalized Explicit Caption Editing via Diffusion Mechanism | | 0 |
| Dense Video Captioning: A Survey of Techniques, Datasets and Evaluation Protocols | | 0 |
| Visual Analytics for Efficient Image Exploration and User-Guided Image Captioning | | 0 |
| LoHoRavens: A Long-Horizon Language-Conditioned Benchmark for Robotic Tabletop Manipulation | | 0 |
| VidCoM: Fast Video Comprehension through Large Language Models with Multimodal Tools | | 0 |
| ViPE: Visualise Pretty-much Everything | Code | 0 |
| A Comparative Study of Pre-trained CNNs and GRU-Based Attention for Image Caption Generation | | 0 |
| FaceGemma: Enhancing Image Captioning with Facial Attributes for Portrait Images | | 0 |
| Auto-ACD: A Large-scale Dataset for Audio-Language Representation Learning | | 0 |
| ViCo: Engaging Video Comment Generation with Human Preference Rewards | | 0 |
| FigCaps-HF: A Figure-to-Caption Generative Framework and Benchmark with Human Feedback | Code | 0 |
| AIC-AB NET: A Neural Network for Image Captioning with Spatial Attention and Text Attributes | | 0 |
| Multi-Similarity Contrastive Learning | | 0 |
| Knowledge Distillation for Efficient Audio-Visual Video Captioning | | 0 |
| SciCap+: A Knowledge Augmented Dataset to Study the Challenges of Scientific Figure Captioning | Code | 0 |
| CapText: Large Language Model-based Caption Generation From Image Context and Description | | 0 |
| RealignDiff: Boosting Text-to-Image Diffusion Model with Coarse-to-fine Semantic Re-alignment | | 0 |
| HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning | | 0 |
| DiffCap: Exploring Continuous Diffusion on Image Captioning | | 0 |
| Efficient Audio Captioning Transformer with Patchout and Text Guidance | | 0 |
| Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models | | 0 |
| Multi-modal reward for visual relationships-based image captioning | | 0 |
| GNNFormer: A Graph-based Framework for Cytopathology Report Generation | | 0 |
| Summaries as Captions: Generating Figure Captions for Scientific Documents with Automated Text Summarization | Code | 0 |
| Stacked Cross-modal Feature Consolidation Attention Networks for Image Captioning | | 0 |

No leaderboard results yet.