Abstract
Metagenomic sequencing has provided great advantages in the characterisation of microbiomes, but currently available analysis tools lack the ability to combine subspecies-level taxonomic resolution and accurate abundance estimation with functional profiling of assembled genomes. To define the microbiome and its associations with human health, improved tools are needed to enable comprehensive understanding of the microbial composition and elucidation of the phylogenetic and functional relationships between the microbes.
Here, we present MAGinator, a freely available tool, tailored for profiling of shotgun metagenomics datasets. MAGinator provides de novo identification of subspecies-level microbes and accurate abundance estimates of metagenome-assembled genomes (MAGs). MAGinator utilises the information from both gene- and contig-based methods yielding insight into both taxonomic profiles and the origin of genes and genetic content, used for inference of functional content of each sample by host organism.
Additionally, MAGinator facilitates the reconstruction of phylogenetic relationships between the MAGs, providing a framework to identify clade-level differences.
Introduction
DNA sequencing has revolutionised our ability to gain insight into microbial compositions without relying on the ability to cultivate organisms. To explore these compositions, various methods have been developed that either rely on databases of marker genes of known organisms or attempt to reconstruct the chromosomes directly from the short reads by first assembling them into longer contigs and then binning these based on co-occurrences or DNA composition.
Mapping reads against marker gene databases with tools such as MetaPhlAn1, MetaPhyler2 and mOTUs3 is a fast and effective way of recovering the microbial composition both because the library depth required can be quite shallow and because the computational requirements are smaller. However, such methodologies have limitations originating from the reliance on predefined databases, limited ability to estimate abundances at higher taxonomic resolution4,5, and the lack of information on the functional repertoire of the identified taxa.
Conversely, de novo binning strategies require high sequencing depth but can recover high-quality metagenome-assembled genomes (MAGs) from which the functional gene content can be directly linked to a specific organism. Ideally, this can recover genomes at the subspecies level that can be used in downstream analysis to generate more specific hypotheses about associations with outcomes.
One example is the identification of organisms capable of degrading human milk oligosaccharides (HMOs), which are an important energy source for breastfed infants. Bifidobacteria in particular have this capability, and certain strains or subspecies have specific preferences for certain HMO types6,7,8,9.
Previously, it has been established that the presence of Bifidobact.
Input
The input to the MAGinator workflow comprises a set of samples with (1) shotgun metagenomic sequenced reads, (2) their sample-wise assembled contigs, and (3) sample-wise MAGs (groups of contigs from the same genome), clustered across samples, as defined by a metagenomic binning tool (see below). Reads should be provided in a comma-separated file giving the location of the fastq files and formatted as: SampleName,PathToForwardReads,PathToReverseReads.
The contigs should be nucleotide sequences in FASTA format. The MAGs should be given as a tab-separated file including the MAG identifier and contig identifier. The sample-wise MAGs should be grouped into MAG clusters representing a taxonomic entity found across the samples, which will usually be species but can also be at the subspecies level, depending on the characteristics of the input data.
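The two tabular inputs described above can be checked before a run with a few lines of Python. The following is only a minimal sketch under stated assumptions: the function names, the sample names and the file paths are hypothetical, and the column order of the MAG table (MAG identifier first, contig identifier second) is assumed rather than taken from the MAGinator documentation.

```python
import csv
import io
from collections import defaultdict


def read_sample_sheet(handle):
    """Parse lines of the form SampleName,PathToForwardReads,PathToReverseReads."""
    samples = {}
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(f"expected 3 comma-separated fields, got {row!r}")
        name, forward, reverse = (field.strip() for field in row)
        if name in samples:
            raise ValueError(f"duplicate sample name: {name}")
        samples[name] = (forward, reverse)
    return samples


def read_mag_table(handle):
    """Parse a tab-separated table; assumed order: MAG identifier, contig identifier."""
    contigs_by_mag = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"expected 2 tab-separated fields, got {row!r}")
        mag_id, contig_id = row
        contigs_by_mag[mag_id].append(contig_id)
    return dict(contigs_by_mag)


# Hypothetical file contents, used only to illustrate the two formats.
reads_csv = io.StringIO(
    "sampleA,/data/sampleA_1.fastq.gz,/data/sampleA_2.fastq.gz\n"
    "sampleB,/data/sampleB_1.fastq.gz,/data/sampleB_2.fastq.gz\n"
)
mags_tsv = io.StringIO(
    "cluster1\tsampleA_contig_1\n"
    "cluster1\tsampleB_contig_7\n"
    "cluster2\tsampleA_contig_2\n"
)

samples = read_sample_sheet(reads_csv)
mags = read_mag_table(mags_tsv)
```

Grouping contigs by MAG identifier makes it easy to inspect how many contigs each MAG cluster contains before the workflow is started.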
MAGinator is flexible regarding which tool is used to create the MAGs; however, we recommend using VAMB18. If other binners are used, MAG clustering across samples has to be implemented before running MAGinator. As MAGinator relies on the input MAGs, a larger sample size is recommended. The specific number of samples depends on both the sequencing depth and the diversity of the community being analysed.
We advise the user to look at the number of MAG clusters created and assess them according to the environment being analysed.
Dependencies
The dependencies to run MAGinator are mamba21 and Snakemake22; all other dependencies are installed automatically by Snakemake through MAGinator. Additionally, MAGinator needs the GTDB-Tk database downloaded for taxonomic annotation of MAGs and as a reference for the phylogenetic SNV-level analysis of the signature genes.
Output generated
MAGinator generates multiple outputs and intermediate fil.
COPSAC dataset - data characteristics and preparation
The COPSAC2010 cohort consists of 700 unselected children recruited during pregnancy week 24 and followed closely throughout childhood with extensive sample collection, exposure assessments and longitudinal clinical phenotyping38,39,40. From the cohort, we used 662 deeply sequenced metagenomics samples taken at 1 year of age.
The details of the study and sequencing protocol have previously been published40. The samples consist of 150-bp paired-end reads, with a mean ± SD of 48 ± 15.5 million reads per sample. The data were analysed using the same approach as for the strain-madness data set, with the exception of filtering away reads shorter than 50 bp in the preprocessing step.
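A length filter of the kind mentioned above can be illustrated with a short Python sketch. It is not the preprocessing code used in the study: the function names are hypothetical, the reads are toy sequences, and the sketch handles a single FASTQ stream rather than keeping paired files in sync.

```python
import io
from itertools import islice

MIN_LENGTH = 50  # length threshold in bp, taken from the text


def fastq_records(handle):
    """Yield (header, sequence, separator, quality) tuples from a FASTQ stream."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record:
            return  # end of file
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield tuple(record)


def filter_short_reads(handle, min_length=MIN_LENGTH):
    """Keep only records whose sequence has at least min_length bases."""
    return [rec for rec in fastq_records(handle) if len(rec[1]) >= min_length]


# Three toy reads of 150, 49 and 50 bp (hypothetical data).
toy_reads = [("@read1", "A" * 150), ("@read2", "C" * 49), ("@read3", "G" * 50)]
fastq_text = "".join(
    f"{header}\n{seq}\n+\n{'I' * len(seq)}\n" for header, seq in toy_reads
)

kept_reads = filter_short_reads(io.StringIO(fastq_text))
kept_headers = [rec[0] for rec in kept_reads]
```

For paired-end data, a real pipeline would also need to drop the mate of every discarded read so that the forward and reverse files remain aligned.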
This workflow yielded 880 MAG clusters for the samples. MAGinator was run using the reads, contigs and MAGs from VAMB as input, thus creating a set of signature genes for each MAG cluster, found de novo for this particular dataset.
CHILD dataset - data characteristics and preparation
The Canadian Healthy Infant Longitudinal Development (CHILD) study comprises a large longitudinal birth cohort with stool collection in infancy for microbiome analysis41.
Stool samples used in this analysis were sequenced to an average depth of 4.85 million reads (SD: 1.79 million), and samples which included >1 million reads after preprocessing were kept for the current analysis7. We analysed a subset of the CHILD cohort, consisting of 2846 metagenomic sequenced faecal samples from infants.
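The depth criterion stated above (more than 1 million reads after preprocessing) amounts to a simple per-sample filter. The sketch below uses hypothetical sample names and read counts and is meant only to make the threshold explicit, including the fact that a sample with exactly 1 million reads would not pass a strict ">" comparison.

```python
MIN_READS = 1_000_000  # threshold from the text: keep samples with >1 million reads


def keep_deep_enough(read_counts, min_reads=MIN_READS):
    """Return the samples whose post-preprocessing read count exceeds min_reads."""
    return {sample: n for sample, n in read_counts.items() if n > min_reads}


# Hypothetical read counts after preprocessing.
read_counts = {
    "infant_001": 4_850_000,
    "infant_002": 950_000,
    "infant_003": 1_000_000,
    "infant_004": 2_300_000,
}

kept_samples = keep_deep_enough(read_counts)
```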
To overcome the shallow sequencing, the signature genes of the COPSAC2010 cohort were used to profile the samples instead of running MAGinator. To ensure that the process of the read mappings was identical to COPSAC, the read mapping was carried out using the full gene catalogue. Next, the read counts for the signature genes were ext.
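The idea of mapping against the full gene catalogue and then restricting attention to the signature genes can be sketched as a lookup step. Everything in the example is an assumption made for illustration: the gene and cluster names are hypothetical, and the median is used only as a simple summary, not as the abundance model implemented in MAGinator.

```python
from statistics import median


def signature_gene_counts(gene_counts, signature_genes):
    """Return, per MAG cluster, the read counts of its signature genes in one sample.

    Genes absent from the count table are treated as having zero reads.
    """
    return {
        cluster: [gene_counts.get(gene, 0) for gene in genes]
        for cluster, genes in signature_genes.items()
    }


# Hypothetical read counts for one sample, mapped against the full gene catalogue.
gene_counts = {"gene_1": 12, "gene_2": 0, "gene_3": 7, "gene_4": 30, "gene_5": 3}

# Hypothetical signature gene sets derived from a deeply sequenced cohort.
signature_genes = {
    "cluster_A": ["gene_1", "gene_3", "gene_5"],
    "cluster_B": ["gene_2", "gene_9"],
}

extracted = signature_gene_counts(gene_counts, signature_genes)
summary = {cluster: median(counts) for cluster, counts in extracted.items()}
```

Mapping against the whole catalogue first and subsetting afterwards keeps the read-mapping step identical between cohorts, which is the motivation given in the text.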
Data availability
All relevant data supporting the key findings of this study are available within the article and its Supplementary Information files. Supplementary dataset 1 contains the Supplementary Figs. and tables. The CAMI II strain-madness benchmarking dataset is available at https://frl.publisso.de/data/frl:6425521/strain/short_read/.
The gold standard and benchmark profiles are found at https://github.com/CAMI-challenge/second_challenge_evaluation/tree/master/profiling. The dataset from Franzosa et al. used for benchmarking is available as supplementary from their paper and the raw data is available at ENA accession SAMN08049618.
The raw COPSAC fastq files are available at NCBI under BioProject PRJNA715601. The honey-bee data is publicly available and found in the sequence read archive (SRA) with the accession SRP150166. The Tara Oceans data set is publicly available and found at ENA with Study accession PRJEB1787. The CHILD shotgun metagenomics sequencing data is available at NCBI BioProject PRJNA838575. Source data are provided in this paper.
Availability and implementation: MAGinator is available as a Python module at https://github.com/Russel88/MAGinator.
Code availability
MAGinator is available at GitHub (https://github.com/Russel88/MAGinator)48.
References
1. Blanco-Míguez, A. et al. Extending and improving metagenomic taxonomic profiling with uncharacterized species using MetaPhlAn 4. Nat Biotechnol. 11, 1633–1644 (2023).
2. Liu, B., Gibbons, T., Ghodsi, M. & Pop, M. MetaPhyler: Taxonomic profiling for metagenomic sequences. in 2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 95–100 (IEEE, Hong Kong, China, 2010).
3. Milanese, A. et al. Microbial abundance, activity and population genomic profiling with mOTUs2. Nat. Commun. 10, 1014 (2019).
4. Liu, Y. et al. CSMD: a computational subtraction-based microbiome discovery pipeline for species-level characterization of clinical metagenomic samples. Bioinformatics 36, 1577–1583 (2019).
5. Meyer, F. et al. Critical Assessment of Metagenome Interpretation: the second round of challenges. Nat. Methods 19, 429–440 (2022).
6. Underwood, M. A., German, J. B., Lebrilla, C. B. & Mills, D. A. Bifidobacterium longum subspecies infantis: champion colonizer of the infant gut. Pediatr. Res 77, 229–235 (2015).
7. Dai, D. L. Y. et al. Breastfeeding enrichment of B. longum subsp. infantis mitigates the effect of antibiotics on the microbiota and childhood asthma risk. Med. 4, 92–112.e5 (2023).
8. Asakuma, S. et al. Physiology of Consumption of Human Milk Oligosaccharides by Infant Gut-associated Bifidobacteria. J. Biol. Chem. 286, 34583–34592 (2011).
9. Ojima, M. N. et al. Priority effects shape the structure of infant-type Bifidobacterium communities on human milk oligosaccharides. ISME J. 16, 2265–2279 (2022).
10. Bremges, A., Fritz, A. & McHardy, A. C. CAMITAX: Taxon labels for microbial genomes. GigaScience 9, giz154 (2020).
11. Meyer, F. et al. Assessing taxonomic metagenome profilers with OPAL. Genome Biol. 20, 51 (2019).
12. Shi, L. & Chen, B. LSHvec: a vector representation of DNA sequences using locality sensitive hashing and FastText word embeddings. in Proceedings of the 12th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics 1–10 (ACM, Gainesville Florida, 2021).
13. Franzosa, E. A. et al. Gut microbiome structure and metabolic activity in inflammatory bowel disease. Nat. Microbiol 4, 293–305 (2018).
14. Truong, D. T. et al. MetaPhlAn2 for enhanced metagenomic taxonomic profiling. Nat. Methods 12, 902–903 (2015).
15. LoCascio, R. G., Desai, P., Sela, D. A., Weimer, B. & Mills, D. A. Broad conservation of milk utilization genes in Bifidobacterium longum subsp. infantis as revealed by comparative genomic hybridization. Appl Environ. Microbiol. 76, 7373–7381 (2010).
16. Beghini, F. et al. Integrating taxonomic, functional, and strain-level profiling of diverse microbial communities with bioBakery 3. eLife 10, e65088 (2021).
17. Zachariasen, T. et al. Identification of representative species-specific genes for abundance measurements. Bioinforma. Adv. 3, vbad060 (2023).
18. Nissen, J. N. et al. Improved metagenome binning and assembly using deep variational autoencoders. Nat. Biotechnol. 39, 555–560 (2021).
19. Huerta-Cepas, J. et al. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res. 47, D309–D314 (2019).
20. Kanehisa, M. & Goto, S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research 28, (2000).
21. QuantStack development team & Mamba contributors. Mamba (v.0.13.0). https://mamba.readthedocs.io (2020).
22. Mölder, F. et al. Sustainable data analysis with snakemake. F1000Res 10, 33 (2021).
23. Chaumeil, P.-A., Mussig, A. J., Hugenholtz, P. & Parks, D. H. GTDB-Tk v2: memory friendly classification with the genome taxonomy database. Bioinformatics 38, 5315–5316 (2022).
24. Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinforma. 11, 119 (2010).
25. Steinegger, M. & Söding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 35, 1026–1028 (2017).
26. Vasimuddin, Md., Misra, S., Li, H. & Aluru, S. Efficient architecture-aware acceleration of BWA-MEM for multicore systems. in 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS) 314–324 (IEEE, Rio de Janeiro, Brazil, 2019).
27. Li, H. et al. The sequence alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
28. Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evolution 30, 772–780 (2013).
29. Price, M. N., Dehal, P. S. & Arkin, A. P. Fasttree 2–approximately maximum-likelihood trees for large alignments. PLoS ONE 5, e9490 (2010).
30. Minh, B. Q. et al. IQ-TREE 2: New models and efficient methods for phylogenetic inference in the genomic era. Mol. Biol. Evolution 37, 1530–1534 (2020).
31. Van Dongen, S. Graph clustering via a discrete uncoupling process. SIAM J. Matrix Anal. Appl. 30, 121–141 (2008).
32. Joshi N. A., Fass J. N. Sickle: A sliding-window, adaptive, quality-based trimming tool for FastQ files. (2011).
33. Bankevich, A. et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Computational Biol. 19, 455–477 (2012).
34. Kang, D. D. et al. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ 7, e7359 (2019).
35. Piro, V. C., Lindner, M. S. & Renard, B. Y. DUDes: a top-down taxonomic profiler for metagenomics. Bioinformatics 32, 2272–2280 (2016).
36. Nguyen, N., Mirarab, S., Liu, B., Pop, M. & Warnow, T. TIPP: taxonomic identification and phylogenetic profiling. Bioinformatics 30, 3548–3555 (2014).
37. McMurdie, P. J. & Holmes, S. phyloseq: An R package for reproducible interactive analysis and graphics of microbiome census data. PLoS ONE 8, e61217 (2013).
38. Bisgaard, H. et al. Deep phenotyping of the unselected COPSAC 2010 birth cohort study. Clin. Exp. Allergy 43, 1384–1394 (2013).
39. Stokholm, J. et al. Maturation of the gut microbiome and risk of asthma in childhood. Nat. Commun. 9, 141 (2018).
40. Li, X. et al. The infant gut resistome associates with E. coli, environmental exposures, gut microbiome maturity, and asthma-associated bacterial composition. Cell Host Microbe 29, 975–987.e4 (2021).
41. Moraes, T. J. et al. The Canadian Healthy Infant Longitudinal Development birth cohort study: biological samples and biobanking: the CHILD study: biological samples. Paediatr. Perinat. Epidemiol. 29, 84–92 (2015).
42. Xu, S. et al. Ggtree: A serialized data object for visualization of a phylogenetic tree and annotation data. iMeta 1, (2022).
43. Ellegaard, K. M. & Engel, P. Genomic diversity landscape of the honey bee gut microbiota. Nat. Commun. 10, 446 (2019).
44. Sunagawa, S. et al. Ocean plankton. Structure and function of the global ocean microbiome. Science 348, 6237 (2015).
45. Cantalapiedra, C. P., Hernández-Plaza, A., Letunic, I., Bork, P. & Huerta-Cepas, J. eggNOG-mapper v2: functional annotation, orthology assignments, and domain prediction at the metagenomic scale. Mol. Biol. Evolution 38, 5825–5829 (2021).
46. Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).
47. Csardi, G. & Nepusz, T. The igraph software package for complex network research. InterJournal, Complex Systems 1695, 1–9 (2006).
48. Zachariasen T & Russel J. MAGinator enables accurate profiling of de novo MAGs with strain-level phylogenies. https://github.com/Russel88/MAGinator, https://doi.org/10.5281/zenodo.11485929 (2024).
Acknowledgements
We express our deepest gratitude to the children and families of the COPSAC cohort studies for all their support and commitment.
We acknowledge and appreciate the unique efforts of the COPSAC research team. All funding received by COPSAC is listed on www.copsac.com. The Lundbeck Foundation (Grant no R16-A1694); The Ministry of Health (Grant no 903516); Danish Council for Strategic Research (Grant no 0603-00280B) and The Capital Region Research Foundation have provided core support to the COPSAC research centre.
JS has received funding from the Danish Council for Independent Research (Grant no. 8045-00081B). We thank the CHILD Cohort Study (CHILD) participant families for their dedication and commitment to advancing health research. CHILD was initially funded by CIHR and AllerGen NCE, and the metagenomic data reported here was generated with support from Genome Canada and Genome BC (274CHI).
Author information
Authors and Affiliations
Department of Health and Technology, Section of Bioinformatics, Technical University of Denmark, Lyngby, Denmark
Trine Zachariasen, Gisle A. Vestergaard, Ole Lund & Asker Brejnrod
Department of Biology, Section of Microbiology, University of Copenhagen, Copenhagen, Denmark
Jakob Russel, Søren J. Sørensen & Jakob Stokholm
Department of Pediatrics, BC Children’s Hospital, University of British Columbia, 950 West 28th Avenue, Vancouver, BC, Canada
Charisse Petersen & Stuart E. Turvey
Authors
Trine Zachariasen, Jakob Russel, Charisse Petersen, Gisle A. Vestergaard, Shiraz Shah, Pablo Atienza Lopez, Moschoula Passali, Stuart E. Turvey, Søren J. Sørensen, Ole Lund, Jakob Stokholm, Asker Brejnrod, Jonathan Thorsen
Contributions
The figures and tables were created by T.Z., P.A.L., A.B. and J.T. T.Z., A.B., J.R., J.T. and S.S. drafted the manuscript. The MAGinator software was developed and set up by T.Z. and J.R. T.Z., J.R., C.P., G.V., S.S., P.A.L., M.P., S.T., S.J.S., O.L., J.S., A.B. and J.T. provided intellectual input and aided in the theoretical aspects of shaping this study.
The corresponding author had full access to the data and held the final responsibility for deciding to submit the manuscript for publication. T.Z., J.R., C.P., G.V., S.S., P.A.L., M.P., S.T., S.J.S., O.L., J.S., A.B. and J.T. guarantee that the accuracy and integrity of any part of the work have been appropriately investigated and resolved and all have approved the final version of the manuscript.
None of the authors received any honorarium, grant, or other forms of payment for creating this manuscript.
Corresponding author
Correspondence to Trine Zachariasen.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Stephen Nayfach and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information, Reporting Summary, Peer Review File
Source data
Source Data
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Zachariasen, T., Russel, J., Petersen, C. et al. MAGinator enables accurate profiling of de novo MAGs with strain-level phylogenies. Nat Commun 15, 5734 (2024). https://doi.org/10.1038/s41467-024-49958-8
Received: 18 September 2023
Accepted: 21 June 2024
Published: 09 July 2024
DOI: https://doi.org/10.1038/s41467-024-49958-8