Geometric deep learning of protein–DNA binding specificity

Published by Nature and other sources, 2024-08-05 20:24

Abstract

Predicting protein–DNA binding specificity is a challenging yet essential task for understanding gene regulation. Protein–DNA complexes usually exhibit binding to a selected DNA target site, whereas a protein binds, with varying degrees of binding specificity, to a wide range of DNA sequences. This information is not directly accessible in a single structure. Here, to access this information, we present Deep Predictor of Binding Specificity (DeepPBS), a geometric deep-learning model designed to predict binding specificity from protein–DNA structure. DeepPBS can be applied to experimental or predicted structures. Interpretable protein heavy atom importance scores for interface residues can be extracted. When aggregated at the protein residue level, these scores are validated through mutagenesis experiments. Applied to designed proteins targeting specific DNA sequences, DeepPBS was demonstrated to predict experimentally measured binding specificity. DeepPBS offers a foundation for machine-aided studies that advance our understanding of molecular interactions and guide experimental designs and synthetic biology.

Main

Transcription factors play critical roles in various regulatory functions that are essential to all aspects of life (ref. 1). Therefore, understanding the mechanisms by which proteins target specific DNA sequences is crucial (ref. 2). Extensive research has uncovered myriad binding mechanisms that lead to specific high-affinity binding, including strong electrostatic interaction of arginine residues in the DNA minor groove (ref. 3), deoxyribose sugar–phenylalanine stacking (ref. 4), bidentate hydrogen bonds (H-bonds) between guanine (G) and arginine (Arg) in the major groove (ref. 5), and other interactions (refs. 6–8).

Protein–DNA structures are typically obtained through X-ray crystallography, nuclear magnetic resonance spectroscopy or cryo-electron microscopy experiments (ref. 9) and stored in the Protein Data Bank (PDB) (ref. 10). Generally, these structures display one bound DNA sequence and the associated physicochemical interactions (ref. 6) but do not encompass the full range of potentially bound DNA sequences. Conversely, this information can be experimentally obtained through protein-binding microarray (ref. 11), systematic evolution of ligands by exponential enrichment combined with high-throughput sequencing (SELEX–seq) (ref. 12), chromatin immunoprecipitation followed by sequencing (ref. 13), high-throughput SELEX (ref. 14) or related high-throughput approaches (ref. 15). These experiments capture the range of possible bound DNA sequences but do not necessarily provide structural information. In essence, these sets of experiments are complementary, and manual examination is often required to correlate molecular interaction details from structural data with binding specificity data (ref. 6).

Predicting binding specificity for a given protein sequence, across protein families, remains a challenging and unsolved problem, despite progress for specific protein families (refs. 16–18).
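The range of bound DNA sequences reported by such high-throughput experiments is commonly summarized as a per-position base frequency matrix. The following is a minimal Python sketch of that representation, not code from DeepPBS: the function names and the four example sites are hypothetical and invented for illustration, and the sites are assumed to be pre-aligned, equal-length, uppercase A/C/G/T strings.

```python
import math
from collections import Counter

BASES = "ACGT"


def position_frequency_matrix(sites, pseudocount=0.0):
    """Return one {base: frequency} dict per column of aligned DNA sites."""
    if not sites:
        raise ValueError("need at least one site")
    length = len(sites[0])
    for site in sites:
        if len(site) != length:
            raise ValueError("all sites must have the same length")
        if set(site) - set(BASES):
            raise ValueError(f"unexpected character in site {site!r}")
    matrix = []
    for column in zip(*sites):  # iterate over alignment columns
        counts = Counter(column)
        total = len(column) + 4 * pseudocount
        matrix.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return matrix


def information_content(column):
    """Information content in bits: 2 minus the Shannon entropy of a column."""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return 2.0 - entropy


# Hypothetical aligned binding sites (illustration only, not real data).
sites = ["TGATTA", "TGATTT", "TAATTA", "TGATGA"]
pfm = position_frequency_matrix(sites)
ic = [information_content(col) for col in pfm]

for i, (col, bits) in enumerate(zip(pfm, ic)):
    print(i, {b: round(p, 2) for b, p in col.items()}, round(bits, 3))
```

A column where every site carries the same base reaches the maximum of 2 bits, while a column with mixed bases scores lower. This is one simple way to express "varying degrees of binding specificity" numerically.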

Data availability

Datasets used for all analysis and associated custom scripts were deposited via figshare at https://doi.org/10.6084/m9.figshare.25678053 (ref. 64). Accession codes for discussed structures from the PDB: 1L3L, 7CLI, 2R5Z, 1CIT, 1F4K, 1GJI, 1TC3, 2BSQ, 2C9L, 5ZGN, 1BBX, 1KLN, 1N5Y, 5YUZ, 1QAI, 1XC8, 6T8H, 4TUI, 1DH3, 7OH9 and 1APL.

UniProt accession codes for protein sequences discussed (folded with RoseTTAFoldNA, RFNA): Q8IUE0, Q6H878, O43680 and Q4H376. Accession codes for discussed experimental specificity data from JASPAR2022 and HOCOMOCOv11: MA1897.1, MA1568.1, MA1031.1, MA1572.1, MA0112.2, MA0112.3, ESR1_HUMAN.H11MO.0 and NFKB2_HUMAN.H11MO.0.B.

Mutagenesis experiment data used are available from the SAMPDI website (http://compbio.clemson.edu/media/download/SAMPDI_dataset.xlsx). MELD-DNA modeled complex data were taken from Zenodo at https://doi.org/10.5281/zenodo.7501937 (ref. 65). Source data are provided with this paper.
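For readers who want to retrieve the structures listed in this statement, the sketch below builds download links from the PDB accession codes. It is an illustration under stated assumptions, not part of the published workflow: the helper name `rcsb_url` is hypothetical, and the URL pattern is assumed to be the RCSB file-download endpoint. No network request is made.

```python
import re

# PDB accession codes copied from the data availability statement.
PDB_IDS = [
    "1L3L", "7CLI", "2R5Z", "1CIT", "1F4K", "1GJI", "1TC3",
    "2BSQ", "2C9L", "5ZGN", "1BBX", "1KLN", "1N5Y", "5YUZ",
    "1QAI", "1XC8", "6T8H", "4TUI", "1DH3", "7OH9", "1APL",
]

# Assumption: RCSB serves coordinate files under this URL pattern.
RCSB_DOWNLOAD = "https://files.rcsb.org/download/{pdb_id}.{ext}"

# Classic four-character PDB identifier: a digit plus three alphanumerics.
_PDB_ID = re.compile(r"[0-9][A-Za-z0-9]{3}")


def rcsb_url(pdb_id, ext="cif"):
    """Build a download URL for one PDB entry (hypothetical helper)."""
    if not _PDB_ID.fullmatch(pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return RCSB_DOWNLOAD.format(pdb_id=pdb_id.upper(), ext=ext)


urls = {pdb_id: rcsb_url(pdb_id) for pdb_id in PDB_IDS}
print(urls["1L3L"])
```

Validating each code before formatting keeps a typo in the list from silently producing a dead link.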

Code availability

Installable source code, pretrained models, associated guidelines and various custom scripts can be found via GitHub at https://github.com/timkartar/DeepPBS. The implementation is also available via a Code Ocean capsule at https://doi.org/10.24433/CO.0545023.v2. In addition, DeepPBS is accessible as a webserver through https://deeppbs.usc.edu.

References

Spitz, F. & Furlong, E. E. M. Transcription factors: from enhancer binding to developmental control. Nat. Rev. Genet. 13, 613–626 (2012).

Zhao, Y., Granas, D. & Stormo, G. D. Inferring binding energies from selected binding sites. PLoS Comput. Biol. 5, e1000590 (2009).

Rohs, R. et al. The role of DNA shape in protein–DNA recognition. Nature 461, 1248–1253 (2009).

Stirnimann, C. U., Ptchelkine, D., Grimm, C. & Müller, C. W. Structural basis of TBX5–DNA recognition: the T-box domain in its DNA-bound and -unbound form. J. Mol. Biol. 400, 71–81 (2010).

Helene, C. Specific recognition of guanine bases in protein–nucleic acid complexes. FEBS Lett. 74, 10–13 (1977).

Rohs, R. et al. Origins of specificity in protein–DNA recognition. Annu. Rev. Biochem. 79, 233–269 (2010).

Schildbach, J. F., Karzai, A. W., Raumann, B. E. & Sauer, R. T. Origins of DNA-binding specificity: role of protein contacts with the DNA backbone. Proc. Natl Acad. Sci. USA 96, 811–817 (1999).

Seeman, N. C., Rosenberg, J. M. & Rich, A. Sequence-specific recognition of double helical nucleic acids by proteins. Proc. Natl Acad. Sci. USA 73, 804–808 (1976).

Garvie, C. W. & Wolberger, C. Recognition of specific DNA sequences. Mol. Cell 8, 937–946 (2001).

Berman, H. M. et al. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000).

Berger, M. F. & Bulyk, M. L. Universal protein-binding microarrays for the comprehensive characterization of the DNA-binding specificities of transcription factors. Nat. Protoc. 4, 393–411 (2009).

Slattery, M. et al. Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins. Cell 147, 1270–1282 (2011).

Park, P. J. ChIP-seq: advantages and challenges of a maturing technology. Nat. Rev. Genet. 10, 669–680 (2009).

Jolma, A. et al. DNA-binding specificities of human transcription factors. Cell 152, 327–339 (2013).

Slattery, M. et al. Absence of a simple code: how transcription factors read the genome. Trends Biochem. Sci. 39, 381–399 (2014).

Persikov, A. V. & Singh, M. De novo prediction of DNA-binding specificities for Cys2His2 zinc finger proteins. Nucleic Acids Res. 42, 97–108 (2014).

Wetzel, J. L., Zhang, K. & Singh, M. Learning probabilistic protein–DNA recognition codes from DNA-binding specificities using structural mappings. Genome Res. 32, 1776–1786 (2022).

Persikov, A. V., Osada, R. & Singh, M. Predicting DNA recognition by Cys2His2 zinc finger proteins. Bioinformatics 25, 22–29 (2009).

Aizenshtein-Gazit, S. & Orenstein, Y. DeepZF: improved DNA-binding prediction of C2H2-zinc-finger proteins by deep transfer learning. Bioinformatics 38, ii62–ii67 (2022).

Meseguer, A. et al. On the prediction of DNA-binding preferences of C2H2-ZF domains using structural models: application on human CTCF. NAR Genom. Bioinform. 2, lqaa046 (2020).

Molparia, B., Goyal, K., Sarkar, A., Kumar, S. & Sundar, D. ZiF-Predict: a web tool for predicting DNA-binding specificity in C2H2 zinc finger proteins. Genom. Proteom. Bioinform. 8, 122–126 (2010).

Christensen, R. G. et al. Recognition models to predict DNA-binding specificities of homeodomain proteins. Bioinformatics 28, i84–i89 (2012).

Yanover, C. & Bradley, P. Extensive protein and DNA backbone sampling improves structure-based specificity prediction for C2H2 zinc fingers. Nucleic Acids Res. 39, 4564–4576 (2011).

Chiu, T. P., Rao, S. & Rohs, R. Physicochemical models of protein–DNA binding with standard and modified base pairs. Proc. Natl Acad. Sci. USA 120, e2205796120 (2023).

Stormo, G. D. Modeling the specificity of protein–DNA interactions. Quant. Biol. 1, 115–130 (2013).

Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).

Ahdritz, G. et al. OpenFold: retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization. Nat. Methods https://doi.org/10.1038/s41592-024-02272-z (2024).

Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021).

Baek, M., McHugh, R., Anishchenko, I., Baker, D. & DiMaio, F. Accurate prediction of nucleic acid and protein–nucleic acid complexes using RoseTTAFoldNA. Nat. Methods 21, 117–121 (2024).

Krishna, R. et al. Generalized biomolecular modeling and design with RoseTTAFold All-Atom. Science 384, eadl2528 (2024).

Esmaeeli, R., Bauzá, A. & Perez, A. Structural predictions of protein–DNA binding: MELD-DNA. Nucleic Acids Res. 51, 1625–1636 (2023).

Abramson, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024).

Morrison, K. L. & Weiss, G. A. Combinatorial alanine-scanning. Curr. Opin. Chem. Biol. 5, 302–307 (2001).

Glasscock, C. J. et al. Computational design of sequence-specific DNA-binding proteins. Preprint at bioRxiv https://doi.org/10.1101/2023.09.20.558720 (2023).

Joshi, R. et al. Functional specificity of a Hox protein mediated by the recognition of minor groove structure. Cell 131, 530–543 (2007).

Castro-Mondragon, J. A. et al. JASPAR 2022: the 9th release of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 50, D165–D173 (2022).

Kulakovskiy, I. V. et al. HOCOMOCO: towards a complete collection of transcription factor binding models for human and mouse via large-scale ChIP-seq analysis. Nucleic Acids Res. 46, D252–D259 (2018).

Agback, P., Baumann, H., Knapp, S., Ladenstein, R. & Härd, T. Architecture of nonspecific protein–DNA interactions in the Sso7d–DNA complex. Nat. Struct. Biol. 5, 579–584 (1998).

Mistry, J. et al. Pfam: the protein families database in 2021. Nucleic Acids Res. 49, D412–D419 (2021).

Persikov, A. V. & Singh, M. An expanded binding model for Cys2His2 zinc finger protein–DNA interfaces. Phys. Biol. 8, 035010 (2011).

Ichikawa, D. M. et al. A universal deep-learning model for zinc finger design enables transcription factor reprogramming. Nat. Biotechnol. 41, 1117–1129 (2023).

Escalante, C. R., Yie, J., Thanos, D. & Aggarwal, A. K. Structure of IRF-1 with bound DNA reveals determinants of interferon regulation. Nature 391, 103–106 (1998).

de Martin, X., Sodaei, R. & Santpere, G. Mechanisms of binding specificity among bHLH transcription factors. Int. J. Mol. Sci. 22, 9150 (2021).

Mariani, V., Biasini, M., Barbato, A. & Schwede, T. lDDT: a local superposition-free score for comparing protein structures and models using distance difference tests. Bioinformatics 29, 2722–2728 (2013).

Genheden, S. & Ryde, U. The MM/PBSA and MM/GBSA methods to estimate ligand-binding affinities. Expert Opin. Drug Discov. 10, 449–461 (2015).

Joerger, A. C. & Fersht, A. R. Structural biology of the tumor suppressor p53. Annu. Rev. Biochem. 77, 557–582 (2008).

Kitayner, M. et al. Diversity in DNA recognition by p53 revealed by crystal structures with Hoogsteen base pairs. Nat. Struct. Mol. Biol. 17, 423–429 (2010).

Petty, T. J. et al. An induced fit mechanism regulates p53 DNA binding kinetics to confer sequence specificity. EMBO J. 30, 2167–2176 (2011).

Kitayner, M. et al. Structural basis of DNA recognition by p53 tetramers. Mol. Cell 22, 741–753 (2006).

Reaz, S., Mossalam, M., Okal, A. & Lim, C. S. A single mutant, A276S of p53, turns the switch to apoptosis. Mol. Pharm. 10, 1350–1359 (2013).

Barakat, K., Issack, B. B., Stepanova, M. & Tuszynski, J. Effects of temperature on the p53–DNA binding interactions and their dynamical behavior: comparing the wild type to the R248Q mutant. PLoS ONE 6, e27651 (2011).

Vousden, K. H. & Prives, C. Blinded by the light: the growing complexity of p53. Cell 137, 413–431 (2009).

Peterson, S. N., Dahlquist, F. W. & Reich, N. O. The role of high affinity non-specific DNA binding by Lrp in transcriptional regulation and DNA organization. J. Mol. Biol. 369, 1307–1317 (2007).

Ovek, D. et al. Artificial intelligence based methods for hot spot prediction. Curr. Opin. Struct. Biol. 72, 209–218 (2022).

Stefl, R., Wu, H., Ravindranathan, S., Sklenář, V. & Feigon, J. DNA A-tract bending in three dimensions: solving the dA4T4 vs. dT4A4 conundrum. Proc. Natl Acad. Sci. USA 101, 1177–1182 (2004).

Li, J., Chiu, T. P. & Rohs, R. Predicting DNA structure using a deep learning method. Nat. Commun. 15, 1243 (2024).

Dror, I., Zhou, T., Mandel-Gutfreund, Y. & Rohs, R. Covariation between homeodomain transcription factors and the shape of their DNA binding sites. Nucleic Acids Res. 42, 430–441 (2014).

Noyes, M. B. et al. Analysis of homeodomain specificities allows the family-wide prediction of preferred recognition sites. Cell 133, 1277–1289 (2008).

Persikov, A. V. et al. A systematic survey of the Cys2His2 zinc finger DNA-binding landscape. Nucleic Acids Res. 43, 1965–1984 (2015).

Otwinowski, Z. et al. Crystal structure of trp represser/operator complex at atomic resolution. Nature 335, 321–329 (1988).

Zhou, T. et al. Quantitative modeling of transcription factor binding specificities using DNA shape. Proc. Natl Acad. Sci. USA 112, 4654–4659 (2015).

Nair, S. K. & Burley, S. K. X-ray structures of Myc-Max and Mad-Max recognizing DNA. Cell 112, 193–205 (2003).

Afek, A. et al. DNA mismatches reveal conformational penalties in protein–DNA recognition. Nature 587, 291–296 (2020).

Mitra, R. DeepPBS data. figshare https://doi.org/10.6084/m9.figshare.25678053.v1 (2024).

GoldEagle93. PDNALab/MELD-DNA: release for Zenodo. Zenodo https://doi.org/10.5281/zenodo.7501938 (2023).

Acknowledgements

This work was supported by an Andrew J. Viterbi Fellowship in Computational Biology and Bioinformatics (to R.M.), a Washington Research Foundation postdoctoral fellowship (to C.J.G.), the Human Frontier Science Program (grant RGP0021/2018 to R.R.) and the National Institutes of Health (grant R35GM130376 to R.R.). We acknowledge L. Manna for setup and maintenance of the DeepPBS webserver and thank all Rohs lab members for support and valuable feedback.

Author information

Author notes

Jared M. Sagendorf. Present address: Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA

Authors and Affiliations

Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA: Raktim Mitra, Jinsen Li, Jared M. Sagendorf, Yibei Jiang, Ari S. Cohen, Tsu-Pei Chiu & Remo Rohs

Department of Biochemistry, University of Washington, Seattle, WA, USA: Cameron J. Glasscock

Institute for Protein Design, University of Washington, Seattle, WA, USA: Cameron J. Glasscock

Department of Chemistry, University of Southern California, Los Angeles, CA, USA: Remo Rohs

Department of Physics and Astronomy, University of Southern California, Los Angeles, CA, USA: Remo Rohs

Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, CA, USA: Remo Rohs

Contributions

R.M., J.M.S. and R.R. conceived the project idea with input from T.P.C. R.M., J.M.S. and J.L. designed the model. R.M. and J.M.S. performed data preprocessing. R.M., with input from J.L. and J.M.S., performed model training and benchmarking. R.M., J.L. and T.P.C. developed all application ideas. R.M., J.L. and Y.J. carried out all applications and data analysis. A.S.C., with input from R.M., designed and built the web-based implementation. C.J.G. provided data for validation and application on predicted and synthetic designs. R.M., J.L., Y.J. and R.R. wrote the paper. All authors read and commented on the paper. R.R. supervised the project.

Corresponding author

Correspondence to Remo Rohs.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks Gregory Poon and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available. Primary Handling Editor: Arunima Singh, in collaboration with the Nature Methods team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information: Supplementary Figs. 1–12, discussion and Table 1.

Reporting Summary

Peer Review File

Supplementary Data 1: Cluster-wise description of cross-validation folds, two-sided t-test results between DeepPBS variations and two-sided t-test results between readout modes.

Supplementary Data 2: Source data for supplementary figures.

Supplementary Video 1: Concurrent view of changes in network prediction as MD simulation of Exd-Scr–DNA complex progressed, along with corresponding changes in heavy atom importance score.

Source data

Source Data Fig. 1: Statistical source data.

Source Data Fig. 2: Statistical source data.

Source Data Fig. 3: Statistical source data.

Source Data Fig. 4: Statistical source data.

Source Data Fig. 5: Statistical source data.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Mitra, R., Li, J., Sagendorf, J.M. et al. Geometric deep learning of protein–DNA binding specificity. Nat Methods (2024). https://doi.org/10.1038/s41592-024-02372-w

Received: 13 August 2023

Accepted: 14 June 2024

Published: 05 August 2024

DOI: https://doi.org/10.1038/s41592-024-02372-w