| DiffArtist: Towards Structure and Appearance Controllable Image Stylization | Jul 22, 2024 | DisentanglementImage Stylization | CodeCode Available | 2 | 5 |
| GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation | Apr 10, 2025 | Contrastive LearningLanguage Modeling | CodeCode Available | 2 | 5 |
| Concept Bottleneck Language Models For protein design | Nov 9, 2024 | Decision MakingDrug Discovery | CodeCode Available | 2 | 5 |
| GeoVision Labeler: Zero-Shot Geospatial Classification with Vision and Language Models | May 30, 2025 | ClassificationDisaster Response | CodeCode Available | 2 | 5 |
| EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning | Jan 31, 2024 | AudioCapsAudio captioning | CodeCode Available | 2 | 5 |
| GeReA: Question-Aware Prompt Captions for Knowledge-based Visual Question Answering | Feb 4, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 | 5 |
| Compression Represents Intelligence Linearly | Apr 15, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 | 5 |
| GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding | Nov 16, 2024 | Instruction FollowingLanguage Modeling | CodeCode Available | 2 | 5 |
| GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model | Jun 3, 2024 | geo-localizationLanguage Modeling | CodeCode Available | 2 | 5 |
| Generative Region-Language Pretraining for Open-Ended Object Detection | Mar 15, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 | 5 |