The results obtained by running the results
command from DESeq2 contain a "baseMean" column, which I assume is the mean across samples of the normalized counts for a given gene.
How can I access the normalized counts proper?
I tried the following (continuing with the example used here):
> dds <- DESeqDataSetFromMatrix(countData = counts_data, colData = col_data, design = ~ geno_treat)
> dds <- DESeq(dds)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
> res <- results(dds, contrast=c("geno_treat", "prg1_HS30", "prg1_RT"))
Here is what I have for the first gene:
> res["WBGene00000001",]$baseMean
[1] 181.7862
> mean(assays(dds)$mu["WBGene00000001",])
[1] 231.4634
> mean(assays(dds)$counts["WBGene00000001",])
[1] 232.0833
assays(dds)$counts
corresponds to the raw counts. assays(dds)$mu
seems to be a transformation of these counts approximately preserving their mean, but this mean is very different from the "baseMean" value, so these are likely not the normalized values.
mean(assays(dds)$mu["WBGene00000001",] / sizeFactors(dds))
gives something very close tores["WBGene00000001",]$baseMean
(181.9139). The vignette says thatassays(dds)$mu
correspond to s_j * q_ij where, according to the DESeq2 paper, s_j is the size factor for library j and q_ij is, if I understand correctly, an estimation of a quantity "proportional to the concentration of cDNA fragments from the gene in the sample". Is this q_ij what you call "base factor" ? – bli May 24 '17 at 07:44group
factor (the "base level") becomes the intercept, so the baseMean is then just the average of its counts. The base level is the alphanumerically first level in R by default, if you havegrp = factor(c("WT", "MUT"))
then theMUT
group is used to calculate the baseMean (assuming no expanded model matrices). – Devon Ryan May 24 '17 at 09:52mean(counts(dds, normalized=T)["WBGene00000001",])
), is actually equal to the baseMean (res["WBGene00000001",]$baseMean
). This is not the case if I compute the mean only for a given level of thegeno_treat
factor. – bli May 24 '17 at 12:13