Abstract
Scaffolding is crucial for constructing most chromosome-level genomes. The high-throughput chromatin conformation capture (Hi-C) technology has become the primary scaffolding strategy due to its convenience and cost-effectiveness. As sequencing technologies and assembly algorithms advance, constructing haplotype-resolved genomes is increasingly preferred because haplotypes can provide additional genetic information on allelic and non-allelic variations. ALLHiC is a widely used allele-aware scaffolding tool designed for this purpose. However, its dependence on chromosome-level reference genomes and a higher chromosome misassignment rate still impede the unravelling of haplotype-resolved genomes. Here we present HapHiC, a reference-independent allele-aware scaffolding tool with superior performance on chromosome assignment as well as contig ordering and orientation. In addition, we provide new insights into the challenges in allele-aware scaffolding by conducting comprehensive analyses on various adverse factors. Finally, with the help of HapHiC, we constructed the haplotype-resolved allotriploid genome for Miscanthus × giganteus, an important lignocellulosic bioenergy crop.
Fig. 1: Overview of the HapHiC pipeline.
Fig. 2: Comprehensive performance analysis of Hi-C-based scaffolding tools in chromosome assignment under various adverse conditions.
Fig. 3: Evaluation of Hi-C-based scaffolding tools’ performance in contig ordering and orientation across assemblies with varying contig N50 values.
Fig. 4: Comparative analysis of execution time and memory usage for Hi-C-based scaffolding tools.
Fig. 5: Comparative analysis and examples of HapHiC in scaffolding published autotetraploid genomes.
Fig. 6: Comparative genomic analysis of M. × giganteus and other Miscanthus species.
Data availability
All raw sequencing data and the final chromosome-level haplotype-resolved genome of M. × giganteus have been deposited in the National Genomics Data Center (NGDC, https://ngdc.cncb.ac.cn) under BioProject PRJCA021366. The genome assembly of M. sativa XinJiangDaYe used as the ground truth for validating HapHiC is available via Figshare at https://doi.org/10.6084/m9.figshare.26037289.v1 (ref. 54). The potato C88 genome assembled in this work is available via Figshare at https://doi.org/10.6084/m9.figshare.26063938.v1 (ref. 55). All published raw sequencing data and genome assemblies of real cases used for HapHiC validation are listed in Supplementary Data 4, including potato C88 (Hi-C reads on NGDC: CRR381057; assembly at http://solomics.agis.org.cn/potato/ftp/tetraploid/C88.v1.fa.gz), S. spontaneum Np-X (Hi-C reads on the National Center for Biotechnology Information (NCBI): SRR22405988, SRR22405989; assembly on NCBI: GCA_022457205.1), S. spontaneum AP85-411 (Hi-C reads on NGDC: CRR055769, CRR055770, CRR055771, CRR055772; assembly at https://www.life.illinois.edu/ming/downloads/Spontaneum_genome/Sspon.HiC_chr_asm.fasta), M. sativa XinJiangDaYe (Hi-C reads on NCBI: SRR9026577, SRR9026578; assembly at https://doi.org/10.6084/m9.figshare.12327602.v3 (ref. 56)), M. sativa Zhongmu-4 (Hi-C reads on NGDC: CRR330262, CRR330263, CRR330264, CRR330265, CRR330266; assembly at https://figshare.com/s/fb4ba8e0b871007a9e6c (ref. 57)), C. sinensis Tieguanyin (Hi-C reads on NCBI: SRR12744827; assembly on NGDC: GWHASIX00000000), human HG002 (Hi-C reads: HG002.HiC_1* at https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html?prefix=NHGRI_UCSC_panel/HG002/hpp_HG002_NA24385_son_v1/hic/, HG002.HiC_* at https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html?prefix=NHGRI_UCSC_pa.
Code availability
HapHiC and all custom scripts for dataset simulation are available via GitHub at https://github.com/zengxiaofei/HapHiC. The source code of the modified ALLHiC is available via GitHub at https://github.com/zengxiaofei/allhic.
References
Nurk, S. et al. The complete sequence of a human genome. Science 376, 44–53 (2022).
Tang, H. et al. ALLMAPS: robust scaffold ordering based on multiple maps. Genome Biol. 16, 3 (2015).
Burton, J. N. et al. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat. Biotechnol. 31, 1119–1125 (2013).
Putnam, N. H. et al. Chromosome-scale shotgun assembly using an in vitro method for long-range linkage. Genome Res. 26, 342–350 (2016).
Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95 (2017).
Ghurye, J. et al. Integrating Hi-C links with assembly graphs for chromosome-scale assembly. PLoS Comput. Biol. 15, e1007273 (2019).
Zhang, X., Zhang, S., Zhao, Q., Ming, R. & Tang, H. Assembly of allele-aware, chromosomal-scale autopolyploid genomes based on Hi-C data. Nat. Plants 5, 833–845 (2019).
Zhou, C., McCarthy, S. A. & Durbin, R. YaHS: yet another Hi-C scaffolding tool. Bioinformatics 39, btac808 (2022).
Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).
Yuan, Y., Scheben, A., Edwards, D. & Chan, T.-F. Toward haplotype studies in polyploid plants to assist breeding. Mol. Plant 14, 1969–1972 (2021).
Koren, S. et al. De novo assembly of haplotype-resolved genomes with trio binning. Nat. Biotechnol. 36, 1174–1182 (2018).
Cheng, H. et al. Haplotype-resolved assembly of diploid genomes without parental data. Nat. Biotechnol. 40, 1332–1335 (2022).
Meyer, R. S., DuVal, A. E. & Jensen, H. R. Patterns and processes in crop domestication: an historical review and quantitative analysis of 203 global food crops. New Phytol. 196, 29–48 (2012).
Huang, X., Huang, S., Han, B. & Li, J. The integrated genomics of crop domestication and breeding. Cell 185, 2828–2839 (2022).
Cheng, H., Asri, M., Lucas, J., Koren, S. & Li, H. Scalable telomere-to-telomere assembly for diploid and polyploid genomes with double graph. Nat. Methods 21, 967–970 (2024).
Zhang, J. et al. Allele-defined genome of the autopolyploid sugarcane Saccharum spontaneum L. Nat. Genet. 50, 1565–1573 (2018).
Chen, H. et al. Allele-aware chromosome-level genome assembly and efficient transgene-free genome editing for the autotetraploid cultivated alfalfa. Nat. Commun. 11, 2494 (2020).
Wang, P. et al. Genetic basis of high aroma and stress tolerance in the oolong tea cultivar genome. Horticulture Res. 8, 107 (2021).
Zhang, X. et al. Haplotype-resolved genome assembly provides insights into evolutionary history of the tea plant Camellia sinensis. Nat. Genet. 53, 1250–1259 (2021).
Zhang, Q. et al. Genomic insights into the recent chromosome reduction of autopolyploid sugarcane Saccharum spontaneum. Nat. Genet. 54, 885–896 (2022).
Dongen, S. V. Graph clustering via a discrete uncoupling process. SIAM J. Matrix Anal. Appl. 30, 121–141 (2008).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Nurk, S. et al. HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads. Genome Res. 30, 1291–1305 (2020).
Kawahara, Y. et al. Improvement of the Oryza sativa Nipponbare reference genome using next generation sequence and optical map data. Rice 6, 4 (2013).
Berardini, T. Z. et al. The Arabidopsis information resource: making and mining the ‘gold standard’ annotated reference plant genome. Genesis 53, 474–485 (2015).
Lawrence, I. K. L. A concordance correlation coefficient to evaluate reproducibility. Biometrics 45, 255–268 (1989).
Blanchette, M., Kunisawa, T. & Sankoff, D. Parametric genome rearrangement. Gene 172, GC11–GC17 (1996).
Durand, N. C. et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst. 3, 99–101 (2016).
Long, R. et al. Genome assembly of alfalfa cultivar Zhongmu-4 and identification of SNPs associated with agronomic traits. Genom. Proteom. Bioinform. 20, 14–28 (2022).
Bao, Z. et al. Genome architecture and tetrasomic inheritance of autotetraploid potato. Mol. Plant 15, 1211–1226 (2022).
Sun, H. et al. Chromosome-scale and haplotype-resolved genome assembly of a tetraploid potato cultivar. Nat. Genet. 54, 342–348 (2022).
Heaton, E. A. et al. in Advances in Botanical Research Vol. 56. (eds Kader, J.-C. & Delseny, M.) 75–137 (Academic Press, 2010).
Chramiec-Głąbik, A., Grabowska-Joachimiak, A., Sliwinska, E., Legutko, J. & Kula, A. Cytogenetic analysis of Miscanthus × giganteus and its parent forms. Caryologia 65, 234–242 (2012).
Mitros, T. et al. Genome biology of the paleotetraploid perennial biomass crop Miscanthus. Nat. Commun. 11, 5442 (2020).
De Vega, J., Donnison, I., Dyer, S. & Farrar, K. Draft genome assembly of the biofuel grass crop Miscanthus sacchariflorus. F1000Res. 10, 29 (2021).
Miao, J. et al. Chromosome-scale assembly and analysis of biomass crop Miscanthus lutarioriparius genome. Nat. Commun. 12, 2458 (2021).
Zhang, G. et al. The reference genome of Miscanthus floridulus illuminates the evolution of Saccharinae. Nat. Plants 7, 608–618 (2021).
Dong, H. et al. Winter hardiness of Miscanthus (II): genetic mapping for overwintering ability and adaptation traits in three interconnected Miscanthus populations. Glob. Change Biol. Bioenergy 11, 706–726 (2019).
Brohée, S. & van Helden, J. Evaluation of clustering algorithms for protein–protein interaction networks. BMC Bioinformatics 7, 488 (2006).
Li, L., Stoeckert, C. J. & Roos, D. S. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13, 2178–2189 (2003).
Tang, H. et al. Unraveling ancient hexaploidy through multiply-aligned angiosperm gene maps. Genome Res. 18, 1944–1954 (2008).
Wang, S. et al. EndHiC: assemble large contigs into chromosome-level scaffolds using the Hi-C links from contig ends. BMC Bioinformatics 23, 528 (2022).
Guan, D. et al. Efficient iterative Hi-C scaffolder based on N-best neighbors. BMC Bioinformatics 22, 569 (2021).
Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016).
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997 (2013).
Faust, G. G. & Hall, I. M. SAMBLASTER: fast duplicate marking and structural variant read extraction. Bioinformatics 30, 2503–2505 (2014).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Tange, O. GNU parallel 2018. Zenodo https://doi.org/10.5281/zenodo.1146014 (2018).
Wu, T. D. & Watanabe, C. K. GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics 21, 1859–1875 (2005).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Goel, M., Sun, H., Jiao, W.-B. & Schneeberger, K. SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biol. 20, 277 (2019).
Goel, M. & Schneeberger, K. plotsr: visualizing structural similarities and rearrangements between multiple genomes. Bioinformatics 38, 2922–2926 (2022).
Zeng, X. Genome assembly of autotetraploid Medicago sativa XinJiangDaYe. figshare https://doi.org/10.6084/m9.figshare.26037289.v1 (2024).
Zeng, X. Genome assembly of autotetraploid potato (Solanum tuberosum) C88. figshare https://doi.org/10.6084/m9.figshare.26063938.v1 (2024).
Zeng, Y. genome fasta sequence and annotation files. figshare https://doi.org/10.6084/m9.figshare.12327602.v3 (2020).
Long, R. ZM-4 alfalfa genome. figshare https://figshare.com/s/fb4ba8e0b871007a9e6c (2020).
Li, Y. Corylus mandshurica genome. figshare https://doi.org/10.6084/m9.figshare.12523124.v1 (2020).
Miao, J. Mlu_HiC.gff3. figshare https://doi.org/10.6084/m9.figshare.13013795.v1 (2020).
Miao, J. Mlu_HiC_cds.fasta.gz. figshare https://doi.org/10.6084/m9.figshare.12992984.v2 (2020).
Acknowledgements
This work was supported by the National Natural Science Foundation of China (32100459) and the fellowship of China Postdoctoral Science Foundation (2020M672695) to X. Zeng. Additional funding was provided by the Shenzhen Municipal Science and Technology Innovation Commission Foundation (JCYJ20220530114415036, JCYJ20210324104800001) and the National Natural Science Foundation of China (32070625) to G.C. and by the National Natural Science Foundation of China (32222019) to X. Zhang. The Center for Computational Science and Engineering at Southern University of Science and Technology also provided support for this work. We thank D. Zhang and D. Ru from Lanzhou University, Z. Bao from the Max Planck Institute for Biology Tübingen and S. Zhu from Huazhong Agricultural University for their valuable suggestions and contributions to HapHiC.
Author information
Authors and Affiliations
Department of Human Cell Biology and Genetics, Joint Laboratory of Guangdong-Hong Kong Universities.
Zili Yi, Xingtan Zhang, Yuhui Du, Yu Li, Zhiqing Zhou, Sijie Chen, Huijie Zhao, Sai Yang, Yibin Wang, Guoan Chen
Contributions
X. Zeng designed the algorithms, implemented HapHiC, analysed the genome of M. × giganteus and wrote the manuscript. Z.Y. and S.Y. managed and provided the plant materials of M. × giganteus. X. Zeng, Y.D., Y.L., Z.Z., S.C. and H.Z. benchmarked HapHiC and other scaffolding tools. G.C., X. Zhang and Y.W. provided suggestions for the algorithms. G.C. and X. Zhang revised the manuscript.
Corresponding authors
Correspondence to Xiaofei Zeng or Guoan Chen.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Plants thanks Haoyu Cheng and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Workflow for simulating datasets with various adverse factors. Rounded rectangles represent chromosome-level genomes in FASTA format, while rectangles denote simulated assemblies and/or Hi-C data. The Hi-C reads for the assemblies in red rectangles originate from real sequencing data, whereas the Hi-C reads for the assemblies in blue rectangles were simulated using sim3C. Arrows depict the simulation processes.
Extended Data Fig. 2 Assembly correction method in HapHiC. The brown line chart represents the Hi-C spanning coverage along contig CM039579.1_ctg399_+. The dashed blue line illustrates the coverage threshold, which is set at one-fifth of the median coverage by default. Regions of high and low coverage are denoted by red and blue rectangles, respectively. The green triangle indicates the breakpoint as determined by HapHiC.
Extended Data Fig. 3 Rank-sum algorithm for identifying chimeric and collapsed contigs in HapHiC. a, A network graph illustrates contigs connected via Hi-C links. Red and blue circles (vertices) represent contigs from different haplotypes. Bicolor and purple circles symbolize chimeric and collapsed contigs, respectively. Edges connecting these circles indicate Hi-C links between contigs, with the number of Hi-C links displayed adjacent to the edges. b, The ranking of each contig relative to others based on the number of Hi-C links. Gray numbers indicate the absence of direct connections between contigs. c, The calculation of rank-sum values. Different colors are used to trace the origin of the ranks.
Extended Data Fig. 4 Comparison of inter-allele Hi-C links betwe.
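The default threshold described for Extended Data Fig. 2 (one-fifth of the median Hi-C spanning coverage) can be illustrated with a small example. The following is a minimal Python sketch, not HapHiC's actual implementation. The function names `low_coverage_regions` and `candidate_breakpoint`, the binned coverage list and the choice of the lowest-coverage bin as the breakpoint are all assumptions made for illustration; only the one-fifth-of-median default comes from the caption.

```python
from statistics import median


def low_coverage_regions(coverage, fraction=0.2):
    """Find runs of bins whose coverage falls below fraction * median.

    `coverage` is a hypothetical list of Hi-C spanning coverage values,
    one per fixed-size bin along a contig. Returns the threshold and a
    list of half-open (start, end) bin ranges.
    """
    threshold = median(coverage) * fraction
    regions = []
    start = None
    for i, value in enumerate(coverage):
        if value < threshold:
            if start is None:
                start = i  # a low-coverage run begins here
        elif start is not None:
            regions.append((start, i))  # the run ended at the previous bin
            start = None
    if start is not None:
        regions.append((start, len(coverage)))  # run extends to the contig end
    return threshold, regions


def candidate_breakpoint(coverage, region):
    """Pick the lowest-coverage bin in a region (an illustrative assumption,
    not necessarily how HapHiC places its breakpoint)."""
    start, end = region
    return min(range(start, end), key=lambda i: coverage[i])


# Toy coverage profile with a dip in the middle of the contig.
coverage = [50, 52, 48, 51, 9, 3, 8, 49, 50, 47]
threshold, regions = low_coverage_regions(coverage)
breakpoints = [candidate_breakpoint(coverage, r) for r in regions]
print(threshold, regions, breakpoints)
```

With these toy values the median coverage is 48.5, so the threshold is about 9.7, and the three bins with coverage 9, 3 and 8 form a single low-coverage region (bins 4 to 6) whose lowest bin is bin 5.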
Nat. Plants (2024). https://doi.org/10.1038/s41477-024-01755-3
Received: 13 December 2023
Accepted: 01 July 2024
Published: 05 August 2024
DOI: https://doi.org/10.1038/s41477-024-01755-3
Subjects
Biofuels, Bioinformatics, Comparative genomics, Software