SOTAVerified

Caption Generation

Papers

Showing 51-100 of 310 papers

Title | Status | Hype
Rethinking Surgical Captioning: End-to-End Window-Based MLP Transformer Using Patches | Code | 1
Croc: Pretraining Large Multimodal Models with Cross-Modal Comprehension | Code | 1
Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation | Code | 1
VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation | Code | 1
HCQA @ Ego4D EgoSchema Challenge 2024 | Code | 1
Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data | Code | 1
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption | Code | 1
Transferable Decoding with Visual Entities for Zero-Shot Image Captioning | Code | 1
RECAP: Retrieval-Augmented Audio Captioning | Code | 1
Deep Reinforcement Learning For Sequence to Sequence Models | Code | 1
Bivariate Beta-LSTM | Code | 0
SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs | Code | 0
DeepDiary: Automatic Caption Generation for Lifelogging Image Streams | Code | 0
Event and Entity Extraction from Generated Video Captions | Code | 0
SciCap+: A Knowledge Augmented Dataset to Study the Challenges of Scientific Figure Captioning | Code | 0
Scalable Bayesian Optimization Using Deep Neural Networks | Code | 0
Sequence to Sequence -- Video to Text | Code | 0
Summaries as Captions: Generating Figure Captions for Scientific Documents with Automated Text Summarization | Code | 0
Referring Expression Object Segmentation with Caption-Aware Consistency | Code | 0
Regularizing RNNs for Caption Generation by Reconstructing The Past with The Present | Code | 0
An Empirical Study of Language CNN for Image Captioning | Code | 0
Recurrent Neural Network Regularization | Code | 0
Cortico-cerebellar networks as decoupling neural interfaces | Code | 0
R^3Net:Relation-embedded Representation Reconstruction Network for Change Captioning | Code | 0
Expertized Caption Auto-Enhancement for Video-Text Retrieval | Code | 0
Dual-path Collaborative Generation Network for Emotional Video Captioning | Code | 0
Pre-gen metrics: Predicting caption quality metrics without generating captions | Code | 0
R^3Net:Relation-embedded Representation Reconstruction Network for Change Captioning | Code | 0
Multi-source weak supervision for saliency detection | Code | 0
Bangla Image Caption Generation through CNN-Transformer based Encoder-Decoder Network | Code | 0
Multimodal Preference Data Synthetic Alignment with Reward Model | Code | 0
Enhancing Cross-Prompt Transferability in Vision-Language Models through Contextual Injection of Target Tokens | Code | 0
Automatic Report Generation for Histopathology images using pre-trained Vision Transformers and BERT | Code | 0
An Actor-Critic Algorithm for Sequence Prediction | Code | 0
Multi-LLM Collaborative Caption Generation in Scientific Documents | Code | 0
AUTOMATED AUDIO CAPTIONING BY FINE-TUNING BART WITH AUDIOSET TAGS | Code | 0
Compositional Generalization in Image Captioning | Code | 0
Memeify: A Large-Scale Meme Generation System | Code | 0
Efficient Urdu Caption Generation using Attention based LSTM | Code | 0
Continual Panoptic Perception: Towards Multi-modal Incremental Interpretation of Remote Sensing Images | Code | 0
Comparative evaluation of CNN architectures for Image Caption Generation | Code | 0
Local Information Assisted Attention-free Decoder for Audio Captioning | Code | 0
Controllable Video Captioning with POS Sequence Guidance Based on Gated Fusion Network | Code | 0
Evaluating and interpreting caption prediction for histopathology images | Code | 0
Mol2Lang-VLM: Vision- and Text-Guided Generative Pre-trained Language Models for Advancing Molecule Captioning through Multimodal Fusion | Code | 0
DSD: Dense-Sparse-Dense Training for Deep Neural Networks | Code | 0
CNN Fixations: An unraveling approach to visualize the discriminative image regions | Code | 0
Journalistic Guidelines Aware News Image Captioning | Code | 0
Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning | Code | 0
CLIP Meets Video Captioning: Concept-Aware Representation Learning Does Matter | Code | 0

No leaderboard results yet.