SOTAVerified

Caption Generation

Papers

Showing 51-100 of 310 papers

Title | Status | Hype
Rethinking Surgical Captioning: End-to-End Window-Based MLP Transformer Using Patches | Code | 1
Croc: Pretraining Large Multimodal Models with Cross-Modal Comprehension | Code | 1
Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation | Code | 1
Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning | Code | 1
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption | Code | 1
HCQA @ Ego4D EgoSchema Challenge 2024 | Code | 1
Team RUC_AIM3 Technical Report at ActivityNet 2021: Entities Object Localization | Code | 1
VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation | Code | 1
Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning | Code | 1
Deep Reinforcement Learning For Sequence to Sequence Models | Code | 1
Bivariate Beta-LSTM | Code | 0
Summaries as Captions: Generating Figure Captions for Scientific Documents with Automated Text Summarization | Code | 0
DeepDiary: Automatic Caption Generation for Lifelogging Image Streams | Code | 0
Event and Entity Extraction from Generated Video Captions | Code | 0
SciCap+: A Knowledge Augmented Dataset to Study the Challenges of Scientific Figure Captioning | Code | 0
Scalable Bayesian Optimization Using Deep Neural Networks | Code | 0
Sequence to Sequence -- Video to Text | Code | 0
Referring Expression Object Segmentation with Caption-Aware Consistency | Code | 0
Regularizing RNNs for Caption Generation by Reconstructing The Past with The Present | Code | 0
An Empirical Study of Language CNN for Image Captioning | Code | 0
Recurrent Neural Network Regularization | Code | 0
Cortico-cerebellar networks as decoupling neural interfaces | Code | 0
R^3Net: Relation-embedded Representation Reconstruction Network for Change Captioning | Code | 0
Expertized Caption Auto-Enhancement for Video-Text Retrieval | Code | 0
Dual-path Collaborative Generation Network for Emotional Video Captioning | Code | 0
Pre-gen metrics: Predicting caption quality metrics without generating captions | Code | 0
R^3Net: Relation-embedded Representation Reconstruction Network for Change Captioning | Code | 0
Multi-source weak supervision for saliency detection | Code | 0
Bangla Image Caption Generation through CNN-Transformer based Encoder-Decoder Network | Code | 0
Multimodal Preference Data Synthetic Alignment with Reward Model | Code | 0
Enhancing Cross-Prompt Transferability in Vision-Language Models through Contextual Injection of Target Tokens | Code | 0
Automatic Report Generation for Histopathology images using pre-trained Vision Transformers and BERT | Code | 0
An Actor-Critic Algorithm for Sequence Prediction | Code | 0
Multi-LLM Collaborative Caption Generation in Scientific Documents | Code | 0
AUTOMATED AUDIO CAPTIONING BY FINE-TUNING BART WITH AUDIOSET TAGS | Code | 0
Compositional Generalization in Image Captioning | Code | 0
Memeify: A Large-Scale Meme Generation System | Code | 0
Efficient Urdu Caption Generation using Attention based LSTM | Code | 0
Comparative evaluation of CNN architectures for Image Caption Generation | Code | 0
Continual Panoptic Perception: Towards Multi-modal Incremental Interpretation of Remote Sensing Images | Code | 0
Local Information Assisted Attention-free Decoder for Audio Captioning | Code | 0
Mol2Lang-VLM: Vision- and Text-Guided Generative Pre-trained Language Models for Advancing Molecule Captioning through Multimodal Fusion | Code | 0
Controllable Video Captioning with POS Sequence Guidance Based on Gated Fusion Network | Code | 0
Evaluating and interpreting caption prediction for histopathology images | Code | 0
DSD: Dense-Sparse-Dense Training for Deep Neural Networks | Code | 0
CNN Fixations: An unraveling approach to visualize the discriminative image regions | Code | 0
Journalistic Guidelines Aware News Image Captioning | Code | 0
Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning | Code | 0
CLIP Meets Video Captioning: Concept-Aware Representation Learning Does Matter | Code | 0
Image Caption Generation for News Articles | Code | 0
Page 2 of 7

No leaderboard results yet.