
After some Google searches, I found multiple tools with overlapping functionality for viewing, merging, generating pileups, etc. I have not had time to try these tools, so I will just ask whether anyone already knows the answer: what is the difference between them? Performance? Features? Or something else? Which one is generally preferred? Samtools?

conchoecia
medbe
  • This question is likely to inspire an opinionated debate about the merits (or otherwise) of different tools, which is discouraged on StackExchange. It would be helpful if a single question were asked, preferably one that tries to stay away from people's preferences. – gringer Jun 03 '17 at 01:53
  • In the "something else" you could add the quality of the documentation, which may be an important factor coming into play when deciding which tool to adopt. – bli Jun 03 '17 at 11:40

1 Answer

The obvious answer is that different people wrote them. It's fairly common in bioinformatics for people with a computer science background to get frustrated with existing tools and create their own alternative tool (rather than improving an existing tool). Over time, tools with similar initial aims will have popular functionality implemented in them (and eventually have bugs fixed), such that it matters less which particular tool is used for common methods.

Here's my impression of the tools:

  1. samtools -- originally written by Heng Li (who also wrote BWA). The people who now work on samtools also maintain the alignment file format specification for SAM, BAM, and CRAM, so any new file format features are likely to be implemented in samtools first.

  2. bamtools -- this looks like it was written by Derek Barnett, Erik Garrison, Gabor Marth, and Michael Stromberg to mirror the samtools toolkit, but in C++ instead of C.

  3. picard -- Java tools written by the Broad Institute for manipulating BAM/SAM files. Being written in Java makes it easier to port to other operating systems, so it may work better on Windows systems. I'm more familiar with picard being used for filtering (e.g. removing PCR duplicates) and for statistical analysis, but it is built on HTSJDK, the Java HTS library from the samtools project, so it probably shares a lot of the functionality.

  4. sambamba -- a GPL2-licensed toolkit written in the D programming language (presumably by Artem Tarasov and Pjotr Prins). I haven't used it (and don't know people who have used it), but the GitHub page says: "For almost 5 years the main advantage over samtools was parallelized BAM reading. Finally in March 2017 samtools 1.4 was released, reaching parity on this."

  5. biobambam -- written by German Tischler in C++. I also have no experience with this toolkit. It seems to have some multithreading capability, but otherwise looks similar to the other toolkits.
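
To make the overlap concrete, here is a minimal Python sketch that builds the command line for one common task, coordinate-sorting a BAM file, with four of the tools listed above. The function name `sort_command`, the file names, and the `picard.jar` path are hypothetical, and the flags are assumed to match reasonably recent releases of each tool, so check each tool's own help output before relying on them. biobambam is left out because its exact syntax is not covered here.

```python
import shlex


def sort_command(tool, in_bam, out_bam, threads=4):
    """Return an argv list that coordinate-sorts in_bam into out_bam.

    Nothing is executed here; pass the result to subprocess.run() to run it.
    Flags are assumptions based on recent releases of each tool.
    """
    if tool == "samtools":
        # -@ sets the number of additional threads, -o the output file
        return ["samtools", "sort", "-@", str(threads), "-o", out_bam, in_bam]
    if tool == "sambamba":
        # -t sets the number of threads, -o the output file
        return ["sambamba", "sort", "-t", str(threads), "-o", out_bam, in_bam]
    if tool == "bamtools":
        # bamtools sort takes no thread option in this sketch
        return ["bamtools", "sort", "-in", in_bam, "-out", out_bam]
    if tool == "picard":
        # legacy KEY=VALUE argument style; picard.jar location is hypothetical
        return ["java", "-jar", "picard.jar", "SortSam",
                f"I={in_bam}", f"O={out_bam}", "SORT_ORDER=coordinate"]
    raise ValueError(f"unknown tool: {tool}")


# Print one equivalent command per tool for side-by-side comparison
for tool in ("samtools", "sambamba", "bamtools", "picard"):
    print(shlex.join(sort_command(tool, "in.bam", "sorted.bam")))
```

The four commands do the same job, which is the point of the list above: for common operations the choice mostly comes down to speed, dependencies, and what the rest of the pipeline already uses.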

gringer
  • A comparison of sorting speed between SAMtools (version 1.2) and sambamba (version 0.6.3) here. – user5359531 Feb 20 '18 at 18:59
  • In light of older versions of Samtools being slower than Sambamba, sometimes you also have to consider the needs of the overall pipeline. For example, some older software requires old versions of Samtools to run, which may make it hard to gain the speed advantages from newer Samtools and lead to choosing to use a different program entirely instead of having to support different versions of the same tool in your pipeline. – user5359531 Feb 20 '18 at 19:01
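
If a pipeline has to cope with whichever samtools version happens to be installed, a version check is one way to decide whether to fall back to another tool. The following is a minimal Python sketch, assuming the first line of `samtools --version` looks like `samtools 1.9` (very old 0.1.x releases may not support that option at all). The function names are hypothetical, and the 1.4 threshold is taken from the sambamba README quote in the answer.

```python
import re


def parse_samtools_version(version_output):
    """Extract (major, minor) from text shaped like `samtools --version` output."""
    match = re.match(r"samtools (\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("unrecognised samtools version output")
    return int(match.group(1)), int(match.group(2))


def has_parallel_bam_reading(version_output):
    # 1.4 is the release the sambamba README credits with parallel BAM reading
    return parse_samtools_version(version_output) >= (1, 4)


# Example strings shaped like the assumed `samtools --version` output
print(has_parallel_bam_reading("samtools 1.2\nUsing htslib 1.2.1"))  # False
print(has_parallel_bam_reading("samtools 1.9\nUsing htslib 1.9"))    # True
```

Comparing `(major, minor)` tuples rather than strings keeps releases such as 1.10 ordered correctly after 1.4.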