How can we distinguish between true zero and dropout-zero counts in single-cell RNA-seq?

Question

In single-cell RNA-seq data we have an inflated number of 0 (or near-zero) counts due to low mRNA capture rate and other inefficiencies.

How can we decide which genes are 0 due to gene dropout (lack of measurement sensitivity), and which are genuinely not expressed in the cell?

Deeper sequencing does not solve this problem, as shown in the saturation curve of 10x Chromium data below:

Also see Hicks et al. (2017) for a discussion of the problem:

Zero can arise in two ways:

the gene was not expressing any RNA (referred to as structural zeros) or

the RNA in the cell was not detected due to limitations of current experimental protocols (referred to as dropouts)
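To make the two kinds of zeros concrete, here is a toy simulation. It is only a sketch and not taken from Hicks et al.: the on/off probability, the Poisson mean, and the capture rate are made-up assumptions. It shows that both kinds of zero look identical in the observed matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_genes = 200, 50

# Assumed ground truth: a gene is "on" in a cell with probability 0.6.
expressed = rng.random((n_cells, n_genes)) < 0.6
# True molecule counts: Poisson(5) where the gene is on, exactly 0 elsewhere.
true_counts = np.where(expressed, rng.poisson(5.0, (n_cells, n_genes)), 0)

# Assumed capture step: each molecule is detected independently with p = 0.1.
capture_rate = 0.1
observed = rng.binomial(true_counts, capture_rate)

# No molecules were present in the cell at all.
structural_zero = true_counts == 0
# Molecules were present, but none of them was captured.
dropout_zero = (true_counts > 0) & (observed == 0)

print("observed zeros:  ", int((observed == 0).sum()))
print("structural zeros:", int(structural_zero.sum()))
print("dropout zeros:   ", int(dropout_zero.sum()))
```

Only `observed` is available in a real experiment, which is why the two masks cannot be read off the counts directly.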

Possible duplicate of What methods are available to find a cutoff value for non-expressed genes in RNA-seq? — gringer, Jun 13 '17 at 13:06
I don't think this is a duplicate: this question is about distinguishing between zero counts and the linked question is about distinguishing between zero and non-zero counts. — Peter, Jun 13 '17 at 13:08
Definitely not a duplicate, this is a very different question. — Devon Ryan, Jun 13 '17 at 13:09
Sorry, I missed the "single cell" mention. Despite that, I expect that some answers are likely to overlap, even if the problem is different. See, for example, Kristoffer's answer.... er, which may have been posted in the wrong question. — gringer, Jun 13 '17 at 13:15
hey @Peter if the answers here are not satisfactory you could maybe add more info to the question or comment on the answers — galicae, Jun 28 '17 at 08:11

score 15 · Accepted Answer · answered Jun 13 '17 at 16:03

This is one of the main problems in analyzing scRNA-seq data, and there is no established method for dealing with it. Different (dedicated) algorithms handle it in different ways, but mostly you rely on how good the error modelling of your software is (a great read is the review by Wagner, Regev & Yosef, especially the section on "False negatives and overamplification"). There are a couple of options:

You can impute values, i.e. fill in the gaps on technical zeros. CIDR and scImpute do it directly. MAGIC and ZIFA project cells into a lower-dimensional space and use their similarity there to decide how to fill in the blanks.
Some people straight up exclude genes that are expressed in very low numbers. I can't give you citations off the top of my head, but many trajectory inference algorithms like monocle2 and SLICER have heuristics to choose informative genes for their analysis.
If the method you use for analysis doesn't model gene expression explicitly but instead quantifies similarity between cells with a distance measure (like cosine distance, Euclidean distance, or correlation), then the noise introduced by dropout can be drowned out by the signal of highly expressed genes. Note that this is dangerous, as highly expressed genes are not necessarily informative.
ERCC spike-ins can help you reduce technical noise, but I am not familiar with the Chromium protocol, so maybe that doesn't apply there (?)
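A minimal sketch of the gene-filtering and distance options above, assuming a cells x genes count matrix in NumPy. The function names and the `min_cells` threshold are hypothetical and are not the heuristics used by monocle2 or SLICER.

```python
import numpy as np

def filter_genes(counts, min_cells=3):
    """Keep genes detected (count > 0) in at least `min_cells` cells."""
    detected = (counts > 0).sum(axis=0)
    keep = detected >= min_cells
    return counts[:, keep], keep

def cosine_distance(counts):
    """Pairwise cosine distance between cells (rows)."""
    x = counts.astype(float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero cells
    unit = x / norms
    return 1.0 - unit @ unit.T

# Toy data: gene 0 is highly expressed everywhere, genes 1 and 2 are sparse.
toy = np.array([[100, 0, 1],
                [100, 1, 0],
                [100, 0, 0]])
filtered, keep = filter_genes(toy, min_cells=2)
dist = cosine_distance(toy)
```

In the toy matrix the first two cells disagree on both sparse genes, yet their cosine distance is close to zero because gene 0 dominates. That is the danger mentioned above: the dropout noise is hidden, but so is any signal carried by the low-count genes.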

Since we are speaking about noise, you might consider using a protocol with unique molecular identifiers (UMIs). They remove the amplification errors almost completely, at least for the transcripts that you capture...
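The idea behind UMI counting can be sketched as follows, assuming the reads have already been assigned to a cell barcode and a gene. The `count_umis` helper and the tuple layout are hypothetical, and the sketch ignores sequencing errors in the UMI itself, which real pipelines typically correct for.

```python
from collections import defaultdict

def count_umis(reads):
    """Count distinct UMIs per (cell, gene).

    `reads` is an iterable of (cell_barcode, gene, umi) tuples, one per
    aligned read. PCR duplicates share all three fields and collapse to one.
    """
    seen = defaultdict(set)
    for cell, gene, umi in reads:
        seen[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}

reads = [
    ("c1", "g1", "AAAA"),
    ("c1", "g1", "AAAA"),  # PCR duplicate of the read above
    ("c1", "g1", "CCCC"),
    ("c2", "g1", "AAAA"),
]
umi_counts = count_umis(reads)
```

Note that this only removes amplification bias for molecules that were captured; a transcript that was never captured still yields a zero.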

EDIT: Also, I would highly recommend using something more advanced than PCA to do the analysis. Software like the above-mentioned Monocle or destiny is easy to operate and increases the power of your analysis considerably.

score 1 · Answer 2 · edited May 12 '19 at 11:00


Some people use imputation to differentiate between true zeros and dropout in single-cell data. Some approaches you can look into:

CIDR, published in Genome Biology
scImpute, published in Nature Communications
Seurat (addImputedScore), i.e. the famous R package
SAVER, reviewed in Frontiers in Genetics
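To illustrate the general idea of imputation, here is a deliberately naive nearest-neighbour sketch. It is not the algorithm of CIDR, scImpute, Seurat, or SAVER; the function name, the choice of Euclidean distance, and `k` are assumptions made for the example.

```python
import numpy as np

def knn_impute_zeros(expr, k=2):
    """Fill zeros with the mean value of the k most similar cells (rows)."""
    x = np.asarray(expr, dtype=float)
    # Euclidean distance between every pair of cells.
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    neighbour_mean = x[neighbours].mean(axis=1)
    # Only zeros are replaced; observed non-zero values are kept as they are.
    return np.where(x == 0, neighbour_mean, x)

expr = np.array([[5, 0, 3],
                 [5, 2, 3],
                 [5, 2, 3],
                 [0, 0, 9]])
imputed = knn_impute_zeros(expr, k=2)
```

A sketch like this fills in every zero whose neighbours express the gene, so by itself it does not decide whether a zero was a dropout or a true zero. The published methods add a statistical model on top to make that call.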

edited May 12 '19 at 11:00 by M__ · answered Jun 13 '17 at 15:21 by burger
