Most Popular

1500 questions
19
votes
1 answer

Is it possible for coronavirus or SARS to be synthetic?

I have heard several conspiracy theories regarding the origin of the new coronavirus, 2019-nCoV. For example, that the virus and/or SARS were produced in a laboratory or were some variant of Middle East respiratory syndrome (MERS), shipped via…
Sscheme
  • 303
  • 1
  • 6
18
votes
7 answers

How to convert FASTA to BED

I have a FASTA file: > Sequence_1 GCAATGCAAGGAAGTGATGGCGGAAATAGCGTTAGATGTATGTGTAGCGGTCCC... > Sequence_2 GCAATGCAAGGAAGTGATGGCGGAAATAGCGTTAGATGTATGTGTAGCGGTCCC.... .... I want to generate a BED file for each sequence like: Sequence_1 0…
SmallChess
  • 2,699
  • 3
  • 19
  • 35
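
For the FASTA-to-BED question above, one possible approach is a minimal Python sketch that emits one BED line per sequence. The excerpt is cut off after `Sequence_1 0`, so the assumption here is that each record should span the whole sequence (start 0, end equal to the sequence length). The function names `fasta_lengths` and `fasta_to_bed` are hypothetical, not taken from any answer.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def fasta_lengths(handle: TextIO) -> Iterator[Tuple[str, int]]:
    """Yield (name, length) for each FASTA record, in file order."""
    name = None
    length = 0
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # Use the first whitespace-separated token as the name;
            # this also tolerates "> Sequence_1" with a space after ">".
            parts = line[1:].split()
            name = parts[0] if parts else ""
            length = 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def fasta_to_bed(handle: TextIO) -> List[str]:
    """Assumed layout: name, 0, sequence length (BED is 0-based, half-open)."""
    return [f"{name}\t0\t{length}" for name, length in fasta_lengths(handle)]


if __name__ == "__main__":
    example = io.StringIO("> Sequence_1\nGCAATG\nCA\n> Sequence_2\nGCA\n")
    for bed_line in fasta_to_bed(example):
        print(bed_line)
```

Because BED coordinates are 0-based and half-open, using the sequence length as the end coordinate covers the full sequence.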
18
votes
2 answers

How can we distinguish between true zero and dropout-zero counts in single-cell RNA-seq?

In single-cell RNA-seq data we have an inflated number of 0 (or near-zero) counts due to low mRNA capture rate and other inefficiencies. How can we decide which genes are 0 due to gene dropout (lack of measurement sensitivity), and which are…
Peter
  • 2,634
  • 15
  • 33
18
votes
1 answer

What is the difference between samtools, bamtools, picard, sambamba and biobambam?

After some Google searches, I found multiple tools with overlapping functionality for viewing, merging, pileuping, etc. I have not had time to try these tools, so I will just see if anyone already knows the answer: what is the difference between them?…
medbe
  • 847
  • 1
  • 7
  • 9
18
votes
1 answer

How can I improve a long-read assembly with a repetitive genome?

I'm currently trying to assemble a genome from a rodent parasite, Nippostrongylus brasiliensis. This genome does have an existing reference genome, but it is highly fragmented. Here are some continuity statistics for the scaffolds of the current…
gringer
  • 14,012
  • 5
  • 23
  • 79
17
votes
4 answers

How to compute RPKM in R?

I have the following data of fragment counts for each gene in 16 samples: > str(expression) 'data.frame': 42412 obs. of 16 variables: $ sample1 : int 4555 49 122 351 53 27 1 0 0 2513 ... $ sample2 : int 2991 51 55 94 49 10 55 0 0 978 ... $…
Iakov Davydov
  • 2,695
  • 1
  • 13
  • 34
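
As a rough illustration of the RPKM question above, the usual definition is reads per kilobase of gene length per million mapped reads, i.e. count × 10^9 / (library size × gene length in bp). The sketch below is in Python rather than R and handles a single sample. It assumes the library size is the sum of that sample's counts and that gene lengths in base pairs are available separately, since the excerpt shows counts only. The function name `rpkm` is hypothetical.

```python
from typing import List, Sequence


def rpkm(counts: Sequence[float], gene_lengths_bp: Sequence[float]) -> List[float]:
    """RPKM for one sample: count * 1e9 / (total counts * gene length in bp)."""
    if len(counts) != len(gene_lengths_bp):
        raise ValueError("counts and gene_lengths_bp must have the same length")
    total = sum(counts)  # assumed library size: all counted fragments in the sample
    if total == 0:
        raise ValueError("sample has no counts")
    return [c * 1e9 / (total * length) for c, length in zip(counts, gene_lengths_bp)]


if __name__ == "__main__":
    # Two toy genes: 10 and 90 fragments, 1 kb and 2 kb long.
    print(rpkm([10, 90], [1000, 2000]))
```

With paired-end fragment counts, as in the question, the same arithmetic is usually reported as FPKM.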
17
votes
4 answers

Why Bioconductor?

What are the advantages of having Bioconductor, for the bioinformatics community? I've read the 'About' section and skimmed the paper, but still cannot really answer this. I understand Bioconductor is released twice a year (unlike R), but if I want…
Peter
  • 2,634
  • 15
  • 33
17
votes
3 answers

BAM to BigWig without intermediary BedGraph

I have a pipeline for generating a BigWig file from a BAM file (BAM -> BedGraph -> BigWig), which uses bedtools genomecov for the BAM -> BedGraph part and bedGraphToBigWig for the BedGraph -> BigWig part. The use of bedGraphToBigWig to create the…
17
votes
6 answers

What's the best way to download data from the SRA? Is it really this slow?

I'm trying to download three WGS datasets from the SRA that are each between 60 and 100GB in size. So far I've tried: fetching the .sra files directly from NCBI's FTP site; fetching the .sra files directly using the Aspera command line (ascp); using…
tfenne
  • 171
  • 1
  • 4
17
votes
3 answers

Convert a BAM file from one reference to another?

I have a set of BAM files that are aligned using the NCBI GRCh37 human genome reference (with the chromosome names as NC_000001.10) but I want to analyze it using a BED file that has the UCSC hg19 chromosome names (e.g. chr1). I want to use bedtools…
morgantaschuk
  • 530
  • 4
  • 9
17
votes
12 answers

How to convert fasta file to tab delimited file

I have a fasta file like >sample 1 gene 1 atgc >sample 1 gene 2 atgc >sample 2 gene 1 atgc I want to get the following output, with one break between the header and the sequence. >sample 1 gene 1 atgc >sample 1 gene 2 atgc >sample 2 gene 1 …
AudileF
  • 955
  • 8
  • 25
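
For the FASTA-to-tab question above, a minimal Python sketch could join each header and its (possibly multi-line) sequence with a single tab. The whitespace in the excerpt's desired output has been flattened, so the exact layout is an assumption: one record per line, header first, `>` kept as the excerpt appears to show. The function name `fasta_to_tab` and the `keep_marker` parameter are hypothetical.

```python
from typing import Iterable, List, Optional


def fasta_to_tab(lines: Iterable[str], keep_marker: bool = True) -> List[str]:
    """Return one 'header<TAB>sequence' string per FASTA record."""
    records: List[str] = []
    header: Optional[str] = None
    chunks: List[str] = []

    def flush() -> None:
        # Emit the record collected so far, if any.
        if header is not None:
            records.append(header + "\t" + "".join(chunks))

    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            flush()
            header = line if keep_marker else line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may be wrapped over several lines
    flush()
    return records


if __name__ == "__main__":
    fasta = [">sample 1 gene 1", "atgc", ">sample 1 gene 2", "at", "gc"]
    print("\n".join(fasta_to_tab(fasta)))
```

Passing `keep_marker=False` drops the leading `>` for downstream tools that expect a plain two-column table.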
17
votes
5 answers

Is there an easy way to create a summary of a VCF file (v4.1) with structural variations?

I have a bunch of VCF files (v4.1) with structural variations from a number of non-model organisms (i.e. there are no known variants). I found there are quite a few tools to manipulate VCF files, like VCFtools, the R package vcfR or the Python library PyVCF.…
Kamil S Jaron
  • 5,542
  • 2
  • 25
  • 59
17
votes
7 answers

How to convert BED to GFF3

I would like to convert a BED file to GFF3. The only useful tool that I could find via a Google search seems to be Galaxy, and I do not feel very comfortable with online tools, plus the webserver is currently under maintenance. Does anyone know…
aechchiki
  • 2,676
  • 11
  • 34
16
votes
2 answers

Finding the location and unit length of repetitive sequences within a long read

After discovering a few difficulties with genome assembly, I've taken an interest in finding and categorising repetitive DNA sequences, such as this one from Nippostrongylus brasiliensis [each base is colour-coded as A: green; C: blue; G: yellow; T:…
gringer
  • 14,012
  • 5
  • 23
  • 79
16
votes
3 answers

R package development: How does one automatically install Bioconductor packages upon package installation?

I have an R package on GitHub, 'myPackage', which uses multiple Bioconductor dependencies. If I include CRAN packages in the DESCRIPTION via Depends:, the packages will automatically install upon installation via devtools, i.e.…
ShanZhengYang
  • 1,691
  • 1
  • 14
  • 20