SOTAVerified

Caption Generation

Papers

Showing 1-10 of 310 papers

| Title | Status | Hype |
| --- | --- | --- |
| GNN-ViTCap: GNN-Enhanced Multiple Instance Learning with Vision Transformers for Whole Slide Image Classification and Captioning | | 0 |
| DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World | Code | 2 |
| SonicVerse: Multi-Task Learning for Music Feature-Informed Captioning | Code | 2 |
| EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits | | 0 |
| Attention-based transformer models for image captioning across languages: An in-depth survey and evaluation | | 0 |
| FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion | Code | 2 |
| VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation | Code | 1 |
| NEXT: Multi-Grained Mixture of Experts via Text-Modulation for Multi-Modal Object Re-ID | | 0 |
| GC-KBVQA: A New Four-Stage Framework for Enhancing Knowledge Based Visual Question Answering Performance | | 0 |
| Temporal Object Captioning for Street Scene Videos from LiDAR Tracks | | 0 |

No leaderboard results yet.