| GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding | Nov 16, 2024 | Instruction Following, Language Modeling | Code Available | 2 | 5 |
| Generative Region-Language Pretraining for Open-Ended Object Detection | Mar 15, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Generative Pretrained Structured Transformers: Unsupervised Syntactic Language Models at Scale | Mar 13, 2024 | Constituency Grammar Induction, Language Modeling | Code Available | 2 | 5 |
| Generative Modeling for Mathematical Discovery | Mar 14, 2025 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer | Jun 3, 2024 | Audio Generation, In-Context Learning | Code Available | 2 | 5 |
| GenSim: A General Social Simulation Platform with Large Language Model based Agents | Oct 6, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| CLIP-ReID: Exploiting Vision-Language Model for Image Re-Identification without Concrete Text Labels | Nov 25, 2022 | image-classification, Image Classification | Code Available | 2 | 5 |
| DiffArtist: Towards Structure and Appearance Controllable Image Stylization | Jul 22, 2024 | Disentanglement, Image Stylization | Code Available | 2 | 5 |
| Stabilizing Transformer Training by Preventing Attention Entropy Collapse | Mar 11, 2023 | Automatic Speech Recognition, image-classification | Code Available | 2 | 5 |
| Generalized Interpolating Discrete Diffusion | Mar 6, 2025 | Language Modeling, Language Modelling | Code Available | 2 | 5 |