This question is a follow-up to "How do kmer counters determine which kmer is 'canonical'?"
In that question we learned that kmer counting programs use a 2-bit hash function to internally represent canonical kmers as they are being counted.
Now I am wondering: how can we implement such a function in C/C++ or in Python? More generally, how could I calculate the canonical kmer hash value using a mathematical function?
For example, how would we transform the 3-mer GAT
or the 15-mer GAATACCATAGGATA
to 1s and 0s such that:
hash(GAT) == hash(ATC)
hash(GAATACCATAGGATA) == hash(TATCCTATGGTATTC)
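
To make the requirement concrete, here is a minimal Python sketch of the behaviour I am after. It rests on assumptions of mine, not on what any particular kmer counter does: the encoding A=00, C=01, G=10, T=11 (so that complementing a base is `3 - code`), choosing the numerically smaller of the two strand encodings as the canonical one, and the hypothetical function names `encode_forward`, `encode_revcomp` and `canonical_hash`. It also only handles the four bases A, C, G and T.

```python
# Assumed 2-bit encoding (a common convention, not taken from a specific tool).
# With this ordering, the complement of a base code c is simply 3 - c.
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_forward(kmer):
    """Pack the kmer into an integer, 2 bits per base, first base most significant."""
    value = 0
    for base in kmer.upper():
        value = (value << 2) | ENCODE[base]
    return value


def encode_revcomp(kmer):
    """Pack the reverse complement of the kmer without building the string."""
    value = 0
    for base in reversed(kmer.upper()):
        value = (value << 2) | (3 - ENCODE[base])
    return value


def canonical_hash(kmer):
    """Return the smaller of the two strand encodings (assumed canonical choice)."""
    return min(encode_forward(kmer), encode_revcomp(kmer))


if __name__ == "__main__":
    # Print each kmer's canonical value as a zero-padded bit string.
    for kmer in ("GAT", "ATC", "GAATACCATAGGATA", "TATCCTATGGTATTC"):
        bits = format(canonical_hash(kmer), "0{}b".format(2 * len(kmer)))
        print(kmer, bits)
```

With this encoding, GAT packs to 100011 and its reverse complement ATC to 001101, so both map to 001101. Because the first base is the most significant and A < C < G < T, taking the numeric minimum picks the same strand as taking the lexicographically smaller string.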