Language-Enhanced Representation Learning for Single-Cell Transcriptomics

2025-03-12

Yaorui Shi, Jiaqi Yang, Changhao Nai, Sihang Li, Junfeng Fang, Xiang Wang, Zhiyuan Liu, Yang Zhang

Code

github.com/syr-cn/scmmgpt
Official implementation (referenced in the paper), PyTorch

Abstract

Single-cell RNA sequencing (scRNA-seq) offers detailed insights into cellular heterogeneity. Recent advancements leverage single-cell large language models (scLLMs) for effective representation learning. These models focus exclusively on transcriptomic data, neglecting complementary biological knowledge from textual descriptions. To overcome this limitation, we propose scMMGPT, a novel multimodal framework designed for language-enhanced representation learning in single-cell transcriptomics. Unlike existing methods, scMMGPT employs robust cell representation extraction, preserving quantitative gene expression data, and introduces an innovative two-stage pre-training strategy combining discriminative precision with generative flexibility. Extensive experiments demonstrate that scMMGPT significantly outperforms unimodal and multimodal baselines across key downstream tasks, including cell annotation and clustering, and exhibits superior generalization in out-of-distribution scenarios.
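The abstract does not specify the training objectives of the two pre-training stages. The sketch below is a hedged illustration only, not the scMMGPT method. It shows one common way to implement a discriminative cell-text alignment stage: a CLIP-style symmetric contrastive (InfoNCE) loss over paired cell and text embeddings. Several details are assumptions that do not come from the paper or its repository:

- the function name `contrastive_cell_text_loss`;
- the temperature of 0.07;
- the convention that row i of each input forms a matched pair.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    # L2-normalize a vector; a zero vector is left unchanged.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _logsumexp(xs: Sequence[float]) -> float:
    # Numerically stable log(sum(exp(x))).
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def contrastive_cell_text_loss(
    cell_emb: Sequence[Sequence[float]],
    text_emb: Sequence[Sequence[float]],
    temperature: float = 0.07,  # hypothetical value, chosen for illustration
) -> float:
    """Symmetric InfoNCE loss (assumed objective, not taken from scMMGPT).

    Row i of cell_emb is assumed to describe the same cell as row i of
    text_emb; every other row in the batch serves as a negative.
    """
    if len(cell_emb) != len(text_emb) or not cell_emb:
        raise ValueError("need equally sized, non-empty batches")
    cells = [_normalize(c) for c in cell_emb]
    texts = [_normalize(t) for t in text_emb]
    n = len(cells)
    # Cosine-similarity logits scaled by the temperature.
    logits = [
        [sum(a * b for a, b in zip(cells[i], texts[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Cell -> text: cross-entropy of each row against its diagonal target.
    loss_c2t = sum(_logsumexp(logits[i]) - logits[i][i] for i in range(n)) / n
    # Text -> cell: the same cross-entropy computed over columns.
    loss_t2c = sum(
        _logsumexp([logits[i][j] for i in range(n)]) - logits[j][j] for j in range(n)
    ) / n
    return 0.5 * (loss_c2t + loss_t2c)
```

When each cell embedding points in the same direction as its own description, the loss approaches zero. Shuffling the text rows so that pairs are mismatched makes it large. The symmetric form penalizes misalignment in both retrieval directions (cell to text and text to cell).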

Tasks

Language Modeling, Representation Learning