I have a FASTA file like

>sample 1 gene 1
atgc
>sample 1 gene 2
atgc
>sample 2 gene 1
atgc

I want to get the following output, with one break between the header and the sequence.

>sample 1 gene 1   atgc
>sample 1 gene 2   atgc
>sample 2 gene 1   atgc

If you have multi-line FASTA files, as is very common, you can use these scripts¹ to convert between FASTA and tbl (sequence_name <TAB> sequence) format:

  • FastaToTbl

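    #!/usr/bin/awk -f
    {
        # Header line: strip the leading ">" and follow the name with a tab
        if (substr($1,1,1)==">")
            if (NR>1)
                printf "\n%s\t", substr($0,2,length($0)-1)
            else
                printf "%s\t", substr($0,2,length($0)-1)
        # Sequence line: append it to the current record without a newline
        else
            printf "%s", $0
    }END{printf "\n"}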
  • TblToFasta

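    #! /usr/bin/awk -f
    {
      # The sequence is the last field; the fields before it form the header
      sequence = $NF
      ls = length(sequence)
      is = 1
      fld  = 1
      while (fld < NF)
      {
         if (fld == 1){printf ">"}
         printf "%s " , $fld
         if (fld == NF-1)
            printf "\n"
         fld = fld+1
      }
      # Print the sequence wrapped at 60 characters per line
      while (is <= ls)
      {
        printf "%s\n", substr(sequence,is,60)
        is = is+60
      }
    }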

Save those in your $PATH, make them executable, and you can then do:

$ cat file.fa
$ FastaToTbl file.fa 

And, to get the Fasta back:

$ FastaToTbl file.fa | TblToFasta

This can be a very useful trick when searching a fasta file for a string:

FastaToTbl file.fa | grep 'foo' | TblToFasta

If you really want to keep the leading > of the header (which doesn't seem very useful), you could do something like this:

$ perl -0pe 's/^(>.*)\n/$1\t/mg; s/\n(?!>)//g; s/$/\n/' file.fa

But that will read the entire file into memory. If that's an issue, add an empty line between each fasta record, and then use perl's paragraph mode to process each "paragraph" (sequence) at a time:

perl -pe  's/>/\n>/' file.fa | perl -00pe 's/^(>.*)\n/$1\t/; s/\n//g; s/$/\n/'

¹ Credit to Josep Abril, who wrote these scripts more than a decade ago.
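
For readers without awk at hand, the same round trip can be sketched in plain Python. This is only an illustration under stated assumptions, not a port of the scripts above: the function names fasta_to_tbl and tbl_to_fasta and the width parameter are made up for this example, and the header is treated as everything before the last tab.

```python
from typing import Iterable, Iterator, List, Optional


def fasta_to_tbl(lines: Iterable[str]) -> Iterator[str]:
    """Yield one 'name<TAB>sequence' string per FASTA record."""
    name: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield f"{name}\t{''.join(chunks)}"
            name, chunks = line[1:], []
        elif line and name is not None:
            # Sequence lines are concatenated; blank lines are skipped.
            chunks.append(line)
    if name is not None:
        yield f"{name}\t{''.join(chunks)}"


def tbl_to_fasta(rows: Iterable[str], width: int = 60) -> Iterator[str]:
    """Yield FASTA lines, wrapping each sequence at `width` characters."""
    for row in rows:
        # The sequence is taken to be the text after the last tab.
        name, _, seq = row.rstrip("\r\n").rpartition("\t")
        yield f">{name}"
        for start in range(0, len(seq), width):
            yield seq[start:start + width]
```

Both functions are generators over lines, so they can be fed an open file handle and process one record at a time instead of loading the whole file.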

There is a very simple Biopython solution that is minimal, readable, and handles multi-line FASTA:

from Bio import SeqIO

for record in SeqIO.parse('example.fa', 'fasta'):
    print('>{}\t{}'.format(record.description, record.seq))

Assuming there is only one sequence line per record, use paste with two 'stdin' arguments:

cat your.fasta | paste - -
    Note that this will fail if you have multi-line sequences (as Pierre pointed out), but also if you have any blank lines in the file. You might also want to remove the UUoC (useless use of cat): paste - - < file.fa. – terdon Oct 17 '17 at 12:24
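
The two failure cases mentioned in the comment can be checked before reaching for paste. The following is a minimal sketch, assuming "safe for paste" means a strict alternation of header and non-empty sequence lines; the helper name is_two_line_fasta is invented for this example.

```python
from typing import Iterable


def is_two_line_fasta(lines: Iterable[str]) -> bool:
    """Return True if lines strictly alternate header, sequence, header, ..."""
    count = 0
    for count, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        is_header = text.startswith(">")
        if count % 2 == 1 and not is_header:
            return False  # odd-numbered lines must be headers
        if count % 2 == 0 and (is_header or not text):
            return False  # even-numbered lines must be non-empty sequence
    # An empty file, or one ending on a header, does not qualify either.
    return count > 0 and count % 2 == 0
```

If the check returns False, one of the multi-line-aware approaches in the other answers is the safer choice.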

You can use these commands:

perl -pe 's/>(.*)/>\1\t/g; s/\n//g; s/>/\n>/g' sequences.fa | grep -v '^$'


  1. Append a tab to every header line
  2. Join all lines
  3. Split the single obtained line by the '>' character
  4. Remove the empty line (the first line is empty due to the fact that '>' is the first character of the FASTA file)
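
The four steps above can be mirrored one to one in Python, which may make the logic easier to follow. This is a sketch rather than a replacement for the one-liner: the function name fasta_text_to_tab is hypothetical, the whole file is held in memory, and '>' is assumed to occur only at the start of header lines.

```python
def fasta_text_to_tab(text: str) -> str:
    """Convert FASTA text to '>header<TAB>sequence' lines."""
    # 1. Append a tab to every header line.
    lines = [line + "\t" if line.startswith(">") else line
             for line in text.splitlines()]
    # 2. Join all lines into a single string.
    joined = "".join(lines)
    # 3. Split that string on the '>' character.
    pieces = joined.split(">")
    # 4. Drop empty pieces (the first is empty because the text starts with '>').
    return "\n".join(">" + piece for piece in pieces if piece) + "\n"
```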
Karel Břinda
A very useful tool for this kind of data manipulation is bioawk:

$ bioawk -c fastx '{print ">"$name" "$comment"\t"$seq}' test.fa
>sample 1 gene 1    atgc
>sample 1 gene 2    atgc
>sample 2 gene 1    atgc

bioawk is based on awk, with added parsing capabilities. Here, -c fastx tells bioawk that the input is FASTA or FASTQ, which makes the $name (between ">" and the first blank character), $comment (after the first blank character) and $seq (the sequence, in one line) variables available within awk instructions.
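
To make the name/comment split concrete, here is a small Python sketch of the rule as described above. It is not bioawk's implementation; the function split_header is a hypothetical helper that only illustrates splitting a header at the first run of whitespace.

```python
from typing import Tuple


def split_header(header: str) -> Tuple[str, str]:
    """Split a FASTA header into (name, comment) at the first whitespace."""
    parts = header.lstrip(">").split(None, 1)
    name = parts[0] if parts else ""
    comment = parts[1] if len(parts) > 1 else ""
    return name, comment
```

For the header ">sample 1 gene 1", this gives the name "sample" and the comment "1 gene 1", which is why the bioawk command prints both, separated by a space, to rebuild the full header.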

See for instance this answer for another use case.

Where possible, I recommend using a dedicated parsing library, rather than hacking a parser together: as you can see in the other answers, parsing even simple formats gets complex pretty quickly if you value correctness.

Here’s a small R script that does what we need, using ‘seqinr’:

#!/usr/bin/env Rscript
library(seqinr)  # provides read.fasta
parsed = read.fasta(file('stdin'), as.string = TRUE)
table = data.frame(unlist(parsed), row.names = sapply(parsed, attr, 'Annot'))
write.table(table, stdout(), sep = '\t', quote = FALSE, col.names = FALSE)

Save it as fasta-to-tsv, make it executable, and use it as follows:

fasta-to-tsv < input.fasta > output.tsv

Equivalent code of similar length can be written in Python or Perl.

Konrad Rudolph
This can easily be done with seqkit fx2tab:

seqkit fx2tab seq.fa

However, seqkit will not print the "greater than" symbol (">"). If you do need the symbol:

seqkit fx2tab seq.fa | sed 's/^/>/g'
Forrest Vigor
Here are suggested improvements to the awk scripts in @terdon's answer, using any POSIX awk:

$ cat FastaToTbl
#!/usr/bin/env bash

awk -v OFS='\t' '
    {
        if ( /^>/ ) {
            out = (NR>1 ? ORS : "") substr($0,2) OFS
        }
        else {
            out = $0
        }
        printf "%s", out
    }
    END { print "" }
' "${@:--}"

$ cat TblToFasta
#!/usr/bin/env bash

awk -F'\t' '
    {
        gsub(/.{60}/,"&"ORS,$2)
        sub(ORS"$","",$2)
        print ">" $1 ORS $2
    }
' "${@:--}"

$ ./FastaToTbl file.fa

$ ./FastaToTbl file.fa | ./TblToFasta

FastaToTbl will actually work in any awk, but TblToFasta requires a POSIX awk for its support of regexp intervals like {60}. If you have a very old, pre-POSIX awk that doesn't support regexp intervals, get a new awk. If that's impossible for some reason, change TblToFasta to the following and it will also work in any awk:

$ cat TblToFasta
#!/usr/bin/env bash

awk -F'\t' '
    BEGIN {
        dots = sprintf("%"60"s","")
        gsub(/ /,".",dots)
    }
    {
        gsub(dots,"&"ORS,$2)
        sub(ORS"$","",$2)
        print ">" $1 ORS $2
    }
' "${@:--}"

Ed Morton
Remove empty records (description without sequence):

awk '$2{print RS}$2' FS='\n' RS=\> ORS= f1.fa > f2.fa

Remove blank lines:

sed '/^$/d' f2.fa > f3.fa

Convert multi-line fasta to single-line fasta:

awk '/^>/ {printf("%s%s\n", (NR>1 ? "\n" : ""), $0); next} {printf("%s", $0)} END {printf("\n")}' f3.fa > f4.fa

Finally, @Pierre's solution:

cat f4.fa | paste - - > f.txt
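
The same clean-up steps (dropping records without sequence, ignoring blank lines, joining wrapped sequence lines) can also be sketched as a single pass in Python. This is an illustrative sketch, not a translation of the commands above; the function name linearize is an assumption of this example.

```python
from typing import Iterable, Iterator, List, Tuple


def linearize(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, skipping records that have no sequence."""
    header = ""
    chunks: List[str] = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # blank lines are ignored
        if text.startswith(">"):
            # Emit the previous record only if it actually had sequence.
            if header and chunks:
                yield header, "".join(chunks)
            header, chunks = text, []
        elif header:
            chunks.append(text)
    if header and chunks:
        yield header, "".join(chunks)
```

Each pair can then be written out as header, a tab, and the sequence, which gives the same layout as the paste step.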
In cases where there is no sequence wrapping and each sequence occupies only a single line, the following shell command is probably going to be fastest, easiest, and most convenient.

paste - - < your.fasta > your.new.fasta
Daniel Standage
This is an old post, and many solutions have already been offered. Since it's a frequently asked question, I think it's worth mentioning an overlooked tool set that contains a stand-alone program called faToTab, in addition to many other useful bioinformatics tools.

faToTab inputFile.fasta outFileFasta_tab.txt

It's a treasure chest, in my opinion. Here are the links to the utilities folder and details: Description and Download instructions - Binaries by machine type - Link to the GitHub page.

Anaconda installation is:

conda install -c bioconda ucsc-fatotab
conda install -c bioconda/label/cf201901 ucsc-fatotab

In Python I would do:

# Suppose the header information and the sequences are
# stored in lists of equal length (placeholders below).
header_info1 = [...]
header_info2 = [...]
sequences = [...]

# Build one tab-separated row per record.
rows = [f">{h1}\t{h2}\t{s}" for h1, h2, s in zip(header_info1, header_info2, sequences)]

# Join the rows with "\n" so that the last row has no trailing newline.
with open("pathtoyourfile.tsv", "w") as table:
    table.write("\n".join(rows))

So basically I use f-strings and iterate over these three lists, which are of the same length. Joining the rows with "\n" leaves no newline after the last row, since nothing further is written after it.

Spartan 117
