NEW YORK – Researchers in Switzerland have published a study benchmarking computational tools for analyzing single-cell ATAC-seq (assay for transposase-accessible chromatin by sequencing) data that have become available over the last five years, providing users with advice for selecting the best method for their particular application.
The team, comprising researchers from ETH Zurich and the University of Zurich, ran several datasets through eight 'feature engineering pipelines' derived from five different methods in order to discover and discriminate cell types.
'Our analysis provides guidelines for choosing analysis methods for different datasets,' the authors wrote in a paper published last month in Genome Biology, noting that SnapATAC and SnapATAC2 (bioinformatics packages developed in Bing Ren's lab at the University of California, San Diego) generally outperformed other methods, especially for datasets with 'complex cell-type structures.'
However, 'we wouldn't say that [SnapATAC2] is the universally best choice,' the authors said in a statement provided to GenomeWeb. 'It is not the most memory-efficient method despite using on-disk storage, a similar strategy as in ArchR,' they said, referring to one of the other packages evaluated in the study. 'Another thing that is not the focus of our benchmark but can be relevant to users is that, as a toolkit package, SnapATAC2 is not as comprehensive as ArchR and Signac, because it doesn't include downstream functionalities like motif analysis or co-accessibility analysis.'
'I think the paper is solid and does a good job assessing the tools in several contexts and leads to the conclusion that is already pretty obvious, that you generally want to use different tools for different datasets, and it is worth assessing several,' said Andrew Adey, a single-cell sequencing expert at Oregon Health & Science University who has used SnapATAC2, among other tools.
The study provides a fresher look at single-cell ATAC-seq data analysis options than a 2019 benchmarking study, also published in Genome Biology, especially with the inclusion of SnapATAC2, which was published in January of this year. ArchR, from William Greenleaf's Stanford University lab that pioneered single-cell ATAC-seq, and Signac, from single-cell sequencing bioinformatics maven Rahul Satija of New York University and the New York Genome Center, were both introduced in 2021.
Moreover, the study offers a focused look at single-cell ATAC-seq data analysis. 'While the single-cell transcriptomics field has matured, and to some degree converged, methodologically, for single-cell chromatin assays, there remains a lot of major unknowns,' the Swiss team said. 'In particular, there are critical ways in which scATAC-seq data differs from single-cell RNA-seq and prevent a direct application of methods developed for the latter.'
Unlike single-cell transcriptomics, which is able to count expressed genes, 'features are not defined a priori for ATAC-seq, and typical analyses rely either on tiling over the whole genome or calling peaks (i.e., candidate regulatory elements) from the data itself, both of which come with their own issues and limitations,' the study authors noted.
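The tiling approach mentioned above can be illustrated with a minimal sketch. It assumes a hypothetical input of `(cell_id, chrom, start, end)` fragment tuples and a 500-bp tile width chosen only for illustration; the function name `tile_counts` and the midpoint-assignment rule are assumptions for this example, not the behavior of any package in the study.

```python
def tile_counts(fragments, chrom_sizes, tile_size=500):
    """Count fragments per (cell, genome tile).

    fragments: iterable of (cell_id, chrom, start, end) tuples
        (a hypothetical input format used only for this sketch).
    chrom_sizes: dict mapping chromosome name to its length in base pairs.
    Returns (cells, tiles, matrix), where matrix[i][j] is the number of
    fragments from cells[i] whose midpoint falls inside tiles[j].
    """
    # Build the fixed-width tile index. The features are defined from the
    # genome coordinates alone, before looking at any data.
    tiles = []
    offsets = {}
    for chrom, size in chrom_sizes.items():
        offsets[chrom] = len(tiles)
        n_tiles = -(-size // tile_size)  # ceiling division
        tiles.extend((chrom, k * tile_size) for k in range(n_tiles))

    fragments = list(fragments)
    cells = sorted({cell for cell, _, _, _ in fragments})
    cell_index = {cell: i for i, cell in enumerate(cells)}

    # Dense list-of-lists for readability; real data would need a sparse matrix.
    matrix = [[0] * len(tiles) for _ in cells]
    for cell, chrom, start, end in fragments:
        midpoint = (start + end) // 2
        matrix[cell_index[cell]][offsets[chrom] + midpoint // tile_size] += 1
    return cells, tiles, matrix
```

Peak calling would replace the fixed tile index with regions inferred from the pooled signal, which is why the resulting feature set depends on the data itself.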
Even within one package, such as Signac, there can be multiple options with little information available to users on which ones to choose. In addition to feature aggregation and SnapATAC, the study evaluated both peak-calling and tiling options in ArchR, 'all cell peaks' and 'by cluster peaks' options in Signac, and SnapATAC2's 'cosine' and 'jaccard' options.
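For readers unfamiliar with the names, 'cosine' and 'jaccard' refer to standard similarity measures. The sketch below computes both for two cells represented as sets of accessible feature indices (a binarized view of the data). It is a generic illustration under that assumption; how SnapATAC2 applies or weights these metrics internally is not described here.

```python
import math


def jaccard(a, b):
    """Jaccard similarity: shared accessible features over all accessible features."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(a, b):
    """Cosine similarity of two binary vectors given as sets of nonzero indices.

    For 0/1 vectors the dot product is the size of the intersection and each
    norm is the square root of the set size.
    """
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))
```

The two measures differ in how they penalize cells with very different numbers of accessible features, which can matter for sparse data.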
The study ran six public datasets through each pipeline. The data came from the original ArchR publication, a single-cell chromatin accessibility atlas published in 2021 by Ren's lab, a study from Greenleaf's lab on human hematopoietic cell differentiation, a dataset of peripheral blood mononuclear cells provided by 10x Genomics, and a 2019 study of single-cell gene expression and chromatin accessibility published in Nature Biotechnology.
'We were surprised to observe that the way features are defined (e.g., peaks versus genome tiles, overall peak calling versus per-cluster) is not as critical as we expected,' the authors told GenomeWeb. However, the number of features used '[made] a major difference' and even explained some of the differences between the methods.
'For example, ArchR performance improves when using more features than by default, and SnapATAC did not perform as good with lower-than-default numbers of features,' they said.
In addition to the various packages, which all select a subset of 'features,' the study used an 'aggregation method [which] clusters correlated features and then sums them up into meta-features,' Siyuan Luo, a doctoral candidate at the University of Zurich and the first author of the paper, told GenomeWeb.
'This has the advantage of using all the information (albeit in a less-specific form), and of being easier to properly normalize. Then standard methods can be used downstream. … We included it out of curiosity and were rather surprised by its good performance. But at the moment, it's chiefly a proof of concept of the aggregation strategy.'
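The aggregation idea Luo describes, grouping correlated features and summing each group into a meta-feature, can be sketched as follows. The greedy, seed-based grouping and the 0.8 Pearson threshold are stand-ins chosen for brevity; the clustering algorithm actually used in the study is not specified here, and the names `pearson` and `aggregate_features` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def aggregate_features(matrix, threshold=0.8):
    """Group correlated columns and sum each group into one meta-feature.

    matrix: list of rows (cells), each a list of per-feature counts.
    Returns (groups, meta), where groups lists the feature indices in each
    meta-feature and meta is the cells x meta-features matrix.
    """
    n_features = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_features)]

    # Greedy grouping (an assumption of this sketch): a feature joins the first
    # group whose seed feature it correlates with, otherwise it seeds a new group.
    groups = []
    for j in range(n_features):
        for group in groups:
            if pearson(columns[group[0]], columns[j]) >= threshold:
                group.append(j)
                break
        else:
            groups.append([j])

    # Every original feature contributes to exactly one meta-feature,
    # so no counts are discarded.
    meta = [[sum(row[j] for j in group) for group in groups] for row in matrix]
    return groups, meta
```

Because each feature lands in exactly one group, the row totals of the meta-feature matrix equal those of the original matrix, which reflects the 'using all the information' point in the quote.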
While the study presented both simple and complex datasets, as defined by the cell types they contained, Adey said he would like to see how the methods perform on noisy datasets. 'Lots of tissues we work with generate noisy data, and it is all we can get, regardless of the technology used,' he said.
'Some of these tools (the ones with iterative clustering) will generate beautiful looking clusters, but half of them are a mix of cells from very different cell types. We have found running the iterative ones with only one iteration is best, but then they perform more comparably to other methods.'
Identifying rare populations or highly related subpopulations within complex tissues is still challenging, the authors noted, 'due to the data sparsity and low signal-to-noise ratio — none of these methods are always performing perfectly in our benchmark.'
Identifying the regulatory elements that define the identity of rare subpopulations is also a remaining challenge. 'While scATAC-seq identifies regions of open chromatin, linking these regions to their functional roles, such as controlling the expression of specific genes, remains complex,' they said.