NEW YORK – The latest version of Illumina's Dragen genome analysis software is poised to take pangenome-based analysis mainstream, improving the ability to call all variant types, according to a new study published by the company in collaboration with researchers at Baylor College of Medicine.
Benchmarked against leading pipelines for calling SNPs and indels — the Broad Institute's Genome Analysis Tool Kit (GATK) and Google's DeepVariant — Dragen outperformed the other approaches in analyzing the Genome in a Bottle (GIAB) sample HG002.
Specifically, Dragen posted an F-score, a combined measure of precision and recall, of 99.86 percent, with 2,553 false positives and 8,610 false negatives. DeepVariant plus the Giraffe mapping tool posted an F-score of 99.64 percent with 3,695 false positives and 24,090 false negatives, while GATK plus the Burrows-Wheeler Aligner (BWA) had an F-score of 99.13 percent with 38,622 false positives and 29,163 false negatives.
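For readers unfamiliar with the metric, the following is a minimal sketch of how an F-score is derived from true-positive, false-positive, and false-negative counts. The false-positive and false-negative counts are the ones reported above; the true-positive count is not given in the article, so `HYPOTHETICAL_TP` is an assumed round number used only for illustration, and the resulting values should not be expected to match the published scores exactly.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of called variants that are real."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Fraction of real variants that were called."""
    return tp / (tp + fn)


def f_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    p = precision(tp, fp)
    r = recall(tp, fn)
    return 2 * p * r / (p + r)


# Assumption: the article does not report true positives, so this
# value is hypothetical and shared across pipelines for illustration.
HYPOTHETICAL_TP = 3_900_000

# (false positives, false negatives) as reported in the article.
reported_errors = {
    "Dragen": (2_553, 8_610),
    "DeepVariant + Giraffe": (3_695, 24_090),
    "GATK + BWA": (38_622, 29_163),
}

scores = {
    name: f_score(HYPOTHETICAL_TP, fp, fn)
    for name, (fp, fn) in reported_errors.items()
}

for name, score in scores.items():
    print(f"{name}: F-score = {score:.4%}")
```

Because the harmonic mean penalizes whichever of precision or recall is lower, a pipeline with many false negatives loses ground even when its false-positive count is small.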
Moreover, Dragen was able to call other types of variants, including short tandem repeats (STRs), structural variants (SVs), and copy number variants (CNVs). For insertions larger than 50 bp, Dragen achieved an F-score of 76.9 percent, compared to 34.9 percent for Manta, an SV caller for short reads developed by Illumina.
And for CNVs between 1 Kb and 10 Kb, Dragen performed better than the short-read CNV analyzer CNVnator, though the two performed more similarly on larger CNVs.
'I'm a big fan of comprehensive genomics,' said Fritz Sedlazeck, a bioinformatician at BCM and a senior author of the paper, published last week in Nature Biotechnology. 'This is an important milestone in bringing STR, SV, and CNV calling to a broader audience and to scale it in population studies or trios to enhance our understanding of these regions in diseases and different phenotypes.'
The study also presented analysis results for over 3,200 samples from the UK-based 1000 Genomes Project, where Dragen identified 116.3 million SNVs and 25 million indels. Performance on known SNVs and common indels was comparable to GATK; however, Dragen found millions more singletons and rare indels.
'This is a nice demonstration of the pangenome and a preview of things to come,' said Michael Schatz, a bioinformatician at Johns Hopkins University, who was not involved with the study. 'Suddenly, everything gets better,' he said.
'I don't see it as a major threat to long reads, but it does help close that gap,' he said, noting that the HG002 genome 'is not a whole-genome benchmark,' given that it focuses more on high-confidence regions and 'leaves out some of the trickier parts' of the human genome that harbor repeats and SVs.
Sedlazeck, a former postdoc in Schatz’s lab, disclosed that Illumina provided computing credits for the study and that he has received funding from sequencing competitors Oxford Nanopore Technologies and Pacific Biosciences.
The study made use of Illumina's Dragen version 4.2, released in 2023 and updated to v4.3 in June of this year, a pipeline that is available for onboard computing with some Illumina instruments including the NovaSeq X Series and the NextSeq 1000 and 2000 systems. Illumina acquired the Dragen platform in 2018 when it bought Edico Genome.
Dragen makes use of specialized hardware, called field-programmable gate arrays (FPGAs), to speed up analysis.
'Similar to how a graphics processing unit (GPU) can accelerate the numerical processing for machine learning, an FPGA is much more efficient for data intensive parallel computing than a standard CPU, allowing them to cut the runtime by manyfold,' Schatz said. Dragen was able to identify all the variants from raw data at 30X coverage in only 30 minutes, he noted, compared to about 24 hours for GATK on a server.
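To put 'manyfold' in concrete terms, the short sketch below computes the speedup implied by the two runtimes Schatz cited. The function name `speedup` is illustrative, and the figures are the approximate ones quoted above rather than measured benchmarks.

```python
from datetime import timedelta


def speedup(baseline: timedelta, accelerated: timedelta) -> float:
    """Return how many times faster the accelerated run is."""
    return baseline / accelerated


# Approximate runtimes quoted in the article for a 30X genome.
gatk_runtime = timedelta(hours=24)
dragen_runtime = timedelta(minutes=30)

fold = speedup(gatk_runtime, dragen_runtime)
print(f"Dragen is roughly {fold:.0f}x faster on these figures")
```

On the quoted numbers, the ratio works out to 48, meaning a lab could in principle process dozens of genomes with Dragen in the time a server-based GATK run finishes one.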
However, the reliance on FPGAs could also limit adoption. They're not commonly available on computing servers, Schatz said, and while they are available on the cloud, 'not everyone will be willing or able to use cloud computing for their research.'
Illumina also requires users to obtain a license to run Dragen. Sedlazeck said that as part of this study, Illumina agreed to make Dragen available to academic institutions under a special license. In an email, an Illumina spokesperson said the firm will offer a free trial license to Dragen that allows academic researchers to process 2,500 GB of sequencing data to reproduce results from the publication and to demo the software on their own projects.
After that, labs would need to purchase a license. A 30X human genome could be processed for approximately $8, including license and cloud fees, when accessed through Illumina's managed cloud, she said.
In 2019, Illumina and the Broad Institute partnered to integrate GATK with Dragen. In July, a Broad blog post suggested the institute was still working on an official release of a 'unified Dragen-GATK pipeline.' However, Illumina said that 'new features that were added after [Dragen version] 3.7.8 will not be integrated into external tools such as GATK.'
Earlier this month, Broad Clinical Labs announced that its whole-genome sequencing-based laboratory-developed tests, which use Dragen for analysis, had been approved by the New York State Clinical Laboratory Evaluation Program.
Using Dragen 'makes enough of a difference to justify it taking over,' Sedlazeck said. 'Our center is using it more and more across different studies and experiments.'
Both Dragen and DeepVariant use pangenomes 'in their best-performing workflows,' said Benedict Paten, a computational biologist at the University of California, Santa Cruz and a leader in the Human Pangenome Reference Consortium. 'With the second release of the pangenome now forthcoming, we can expect further improvements in these widely used tools.'
For most users, the shift to pangenome-based methods will be invisible. Already, hundreds of thousands of genomes are being analyzed with Dragen through the UK Biobank and All of Us projects. But for anyone who was cautiously uncertain about its use, 'this will be another stamp of approval,' Schatz said.