SCCNAInfer: a robust and accurate tool to infer the absolute copy number on scDNA-seq data

Zhang, Liting; Zhou, Xin Maizie; Mallory, Xian. “SCCNAInfer: a robust and accurate tool to infer the absolute copy number on scDNA-seq data.” Bioinformatics, Volume 40, Issue 7, July 2024, btae454, https://doi.org/10.1093/bioinformatics/btae454.

In diseases like cancer, it is important to understand copy number alterations (CNAs), which are gains or losses of stretches of DNA in our cells. These changes can tell us a lot about how diseases progress. Single-cell DNA sequencing (scDNA-seq) helps researchers detect CNAs in individual cells, but current tools can make errors across the entire genome when they wrongly estimate a cell's overall chromosome number, or “ploidy.”

SCCNAInfer is a new tool designed to improve this process. It uses the subclonal structure within a tumor, that is, groups of cells with similar copy number profiles, to more accurately estimate each cell’s ploidy and CNAs. SCCNAInfer works alongside existing CNA detection methods by grouping cells, calculating a ploidy for each group, refining the read counts, and then identifying CNAs for each cell.

Tests show that SCCNAInfer performs better than other tools such as Aneufinder, Ginkgo, SCOPE, and SeCNV. This new tool can help researchers get clearer insights into cell changes, aiding in the study of cancer and other diseases.

SCCNAInfer is freely available at https://github.com/compbio-mallory/SCCNAInfer. 

Overview of SCCNAInfer. The input to SCCNAInfer is the raw read count of each cell and, optionally, its segmentation from an existing tool. If no segmentation result is provided, SCCNAInfer lets users select a state-of-the-art method to produce one. Step 1 identifies the normal cells, if any, and normalizes the raw read count. Step 2 calculates the pairwise distance between each pair of cells based on the normalized read count and the segmentation result from an existing tool. Given these pairwise distances, Step 3 clusters the cells with a hierarchical clustering approach that automatically selects the optimal number of clusters. Here, K refers to the number of clusters and E refers to the cost function; whichever K minimizes E is selected. Step 4 searches for the optimal subclonal ploidy (P) of each cluster: for each cluster, whichever P minimizes a cost function F is selected. Step 5 refines the read count by clustering the bins inside each cell cluster. Finally, based on the corrected read count from Step 5 and the optimal subclonal ploidy from Step 4, the absolute copy number for each cell is calculated as the output of SCCNAInfer.
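To make Step 4 and the final step more concrete, here is a minimal Python sketch of a ploidy search followed by the conversion of read counts to absolute copy numbers. It is an illustration under stated assumptions, not SCCNAInfer's code. The function names, the toy read counts, and the candidate grid are all hypothetical, and the cost used here (how far the scaled counts fall from whole numbers) is only an assumed stand-in for the cost function F, which this summary does not define.

```python
# Hypothetical sketch: none of these names come from SCCNAInfer itself.

def scale_to_ploidy(counts, ploidy):
    """Rescale one cell's bin counts so that their mean equals `ploidy`."""
    mean = sum(counts) / len(counts)
    return [c * ploidy / mean for c in counts]


def integer_fit_cost(counts, ploidy):
    """Assumed stand-in for F: squared distance of scaled counts to integers."""
    return sum((x - round(x)) ** 2 for x in scale_to_ploidy(counts, ploidy))


def search_cluster_ploidy(cluster, candidates):
    """Pick the candidate ploidy with the lowest total cost over one cluster."""
    return min(
        candidates,
        key=lambda p: sum(integer_fit_cost(cell, p) for cell in cluster),
    )


def absolute_copy_number(counts, ploidy):
    """Round the ploidy-scaled counts to whole copy numbers, one per bin."""
    return [round(x) for x in scale_to_ploidy(counts, ploidy)]


# Two toy cells from the same cluster, sequenced at different depths
# (100 and 50 reads per DNA copy). Each list holds one read count per bin.
cell_a = [200, 200, 300, 300, 400, 400, 200, 200]
cell_b = [100, 100, 150, 150, 200, 200, 100, 100]
cluster = [cell_a, cell_b]

# Candidate ploidies 1.50, 1.55, ..., 5.00 (an arbitrary grid for this toy case)
candidates = [i / 20 for i in range(30, 101)]

best_ploidy = search_cluster_ploidy(cluster, candidates)
profiles = [absolute_copy_number(cell, best_ploidy) for cell in cluster]
```

On this toy data the search selects a ploidy of 2.75, and both cells receive the profile [2, 2, 3, 3, 4, 4, 2, 2]. Calling absolute_copy_number(cell_a, 5.5) instead yields [4, 4, 6, 6, 8, 8, 4, 4], a doubled profile that fits the same read counts equally well. This shows why a wrong ploidy estimate shifts the copy number of every bin in the genome at once.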