AI Virtual Cell Model Developed by CZI, Columbia Researchers Takes Aim at Complexity of Cell Behavior

Published by GenomeWeb and other sources, 2025-07-10 10:25

NEW YORK – Researchers from Columbia University and the Chan Zuckerberg Initiative have developed a new artificial intelligence-based virtual human cell model, which they claim offers improved predictions about how cells transition from one state to another in health and disease.

Gene Regulatory Embedding-based Large Neural model (GREmLN) has been trained on millions of single-cell gene expression data points to identify how genes in a cell work together and how changes to genes and gene expression can lead to disease.

'This model doesn't try to reshape biology to fit AI,' said Andrea Califano, president of the CZ Biohub New York and a senior author of a preprint describing GREmLN. 'It reshapes AI to fit biology.'

In the preprint, they describe the graph-based architecture of the model, offer some validation data, and compare it to leading virtual cell models that have also been trained on single-cell gene expression data, including scGPT, Geneformer, and scFoundation. GREmLN was able to reconstruct gene expression profiles for both healthy and diseased human cells and was able to classify nine types of cells, including immune cells, based on their expression profiles.

'Understanding cellular behavior means understanding the network of conversations happening inside every cell,' Theofanis Karaletsos, senior director of AI at CZI and a senior author of the preprint, said in a statement. 'GREmLN captures that complexity in a way we've never been able to before. It's a step toward building systems that help us simulate and predict the behavior of cells.'

Where GREmLN differs most from its peers is under the hood.

Previously developed algorithms, such as Geneformer and scGPT, are based on large language models, which operate on a sequential logic. Like ChatGPT, they try to predict what comes next based on what they've seen nearby. 'In biology, there's no such order,' Califano said. 'Gene number one may regulate gene number 20,000.'

With GREmLN, the developers gave the algorithm a conceptual map of where to look for certain relationships in the form of gene regulatory networks based on the algorithm for the reconstruction of accurate cellular networks (ARACNe), another algorithm developed by Califano and introduced in 2006. In the same way that a targeted sequencing panel makes DNA sequencing more efficient, this reduces the search space and has allowed the researchers to build a tool that they suggest eclipses other AI models using just a fraction of the same training data and computing resources.

That's not to say it was an easy path. Using network graphs to influence the algorithms' attention mechanism 'simplifies the solution but complicates the problem mathematically,' Califano said. 'What we had to do was come up with new math.'
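
The general idea of letting a network graph steer an attention mechanism can be illustrated with a toy sketch. This is not GREmLN's formulation, which Califano says required new math; it is a plain masked softmax in which each gene may only attend to itself and to its neighbors in a made-up regulatory network. The function name `masked_attention`, the four-gene network, the uniform scores, and the choice to treat edges as undirected are all assumptions made for illustration.

```python
import math

def masked_attention(scores, adjacency):
    """Row-wise softmax over scores, restricted to pairs allowed by adjacency.

    scores[i][j]    -- raw attention score from gene i to gene j
    adjacency[i][j] -- True if the (hypothetical) network links genes i and j
    """
    n = len(scores)
    weights = []
    for i in range(n):
        # Each gene may always attend to itself, so no row is ever empty.
        allowed = [j for j in range(n) if adjacency[i][j] or i == j]
        row_max = max(scores[i][j] for j in allowed)  # for numerical stability
        exps = {j: math.exp(scores[i][j] - row_max) for j in allowed}
        total = sum(exps.values())
        # Pairs outside the network get exactly zero weight.
        weights.append([exps.get(j, 0.0) / total for j in range(n)])
    return weights

# Toy network: gene 0 is linked to gene 3, gene 1 to gene 2.
# Position in the list carries no meaning; only the edges do.
edges = {(0, 3), (1, 2)}
n_genes = 4
adjacency = [
    [(i, j) in edges or (j, i) in edges for j in range(n_genes)]  # undirected (assumption)
    for i in range(n_genes)
]
scores = [[0.0] * n_genes for _ in range(n_genes)]  # uniform scores for clarity

weights = masked_attention(scores, adjacency)
allowed_pairs = sum(1 for row in weights for w in row if w > 0)
print(weights[0])                             # gene 0 splits attention between itself and gene 3
print(allowed_pairs, "of", n_genes ** 2, "pairs considered")
```

In this toy case only 8 of the 16 gene pairs receive any attention weight. With tens of thousands of genes and a sparse network, the restricted set would be a much smaller fraction of all pairs, which is the sense in which a graph prior narrows the search space.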

Like the models it was compared against, GREmLN offers the ability to evaluate a cell's transcriptome and look for the genetic levers to push it into a different state. Califano gave the example of immune cells that have been programmed to go to a specific organ, say, the pancreas. Using GREmLN, one might be better able to predict which genes to target so that a cell of choice will go to the pancreas.

Figuring out how to turn an exhausted T cell back into an active one is another specific problem that the model could be helpful with.

'We think we can find solutions to really biologically relevant problems a lot better than [other] AI,' Califano said, from blocking the ability of cancer cells to evade therapy to helping researchers predict how cells will react to new drugs.

Gene networks identified by ARACNe have already helped predict how patients will respond to drugs and find treatments for patients whose diseases were thought to be untreatable. 'It will be interesting to see whether their new model can generalize to unseen cell states without inferring a new network for each new dataset given the context-specificity of gene networks,' said Christina Theodoris, a researcher at the Gladstone Institutes and the developer of Geneformer, one of the other AI models that the GREmLN team benchmarked their algorithm against.

'It will also be interesting to evaluate the model across a wider range of tasks and cell types to confirm its generalizability,' she said, noting that the preprint has only evaluated the model on two specific tasks so far and has not yet shown experimental validation of new predictions.

But this version of GREmLN is just the beginning. The team already has plans to train future versions with data generated solely for the purpose of training AI models from its Billion Cells Project. Furthermore, GREmLN is part of the groundwork for CZI's ultimate plan to develop a more general AI-based virtual cell model. 'This is the basement level of a big skyscraper,' Califano said.

Beyond the gene regulatory networks, he believes the model can be further improved by also considering other data, including signal transduction, protein-protein interactions, and microRNA binding, among others. 'Each layer will bring a new constraint on the attention of GREmLN and make it more efficient,' he said.
