| Aligned with LLM: a new multi-modal training paradigm for encoding fMRI activity in visual cortex | Jan 8, 2024 | Descriptive | Unverified | 0 |
| Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training | Jan 4, 2024 | Descriptive, Image Captioning | Code Available | 1 |
| Incorporating Geo-Diverse Knowledge into Prompting for Increased Geographical Robustness in Object Recognition | Jan 3, 2024 | Descriptive, Language Modeling | Unverified | 0 |
| BEV-TSR: Text-Scene Retrieval in BEV Space for Autonomous Driving | Jan 2, 2024 | Autonomous Driving, Caption Generation | Unverified | 0 |
| VideoStudio: Generating Consistent-Content and Multi-Scene Videos | Jan 2, 2024 | Descriptive, Video Generation | Code Available | 1 |
| Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation | Jan 1, 2024 | Descriptive, Object | Code Available | 2 |
| SPU-PMD: Self-Supervised Point Cloud Upsampling via Progressive Mesh Deformation | Jan 1, 2024 | Descriptive, point cloud upsampling | Code Available | 1 |
| Contribución de la semántica combinatoria al desarrollo de herramientas digitales multilingües | Dec 26, 2023 | Descriptive | Unverified | 0 |
| Fast kernel half-space depth for data with non-convex supports | Dec 21, 2023 | Anomaly Detection, Descriptive | Unverified | 0 |
| Multi-Sentence Grounding for Long-term Instructional Video | Dec 21, 2023 | Denoising, Descriptive | Unverified | 0 |