| Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions | Nov 28, 2023 | Disentanglement, Referring Expression | Code Available | 1 |
| DUnE: Dataset for Unified Editing | Nov 27, 2023 | Language Modeling, Language Modelling | Code Available | 1 |
| MSCMNet: Multi-scale Semantic Correlation Mining for Visible-Infrared Person Re-Identification | Nov 24, 2023 | Person Re-Identification, Triplet | Code Available | 1 |
| Neural-Logic Human-Object Interaction Detection | Nov 16, 2023 | Decoder, Human-Object Interaction Detection | Code Available | 1 |
| Mirror: A Universal Framework for Various Information Extraction Tasks | Nov 9, 2023 | Machine Reading Comprehension, Reading Comprehension | Code Available | 1 |
| InstructPix2NeRF: Instructed 3D Portrait Editing from a Single Image | Nov 6, 2023 | NeRF, Triplet | Code Available | 1 |
| Large Language Models are Temporal and Causal Reasoners for Video Question Answering | Oct 24, 2023 | Natural Language Understanding, Question Answering | Code Available | 1 |
| CONTRASTE: Supervised Contrastive Pre-training With Aspect-based Prompts For Aspect Sentiment Triplet Extraction | Oct 24, 2023 | Aspect Sentiment Triplet Extraction, Aspect Term Extraction and Sentiment Classification | Code Available | 1 |
| LLM4SGG: Large Language Models for Weakly Supervised Scene Graph Generation | Oct 16, 2023 | Few-Shot Learning, Large Language Model | Code Available | 1 |
| ESA: External Space Attention Aggregation for Image-Text Retrieval | Oct 10, 2023 | Image-text Retrieval, Retrieval | Code Available | 1 |