Using fountain codes, a class of erasure-correcting codes, the researchers demonstrated how to pack 1.6 bits into each nucleotide. DNA is built from four nucleotide bases, so each position could in theory hold 2 bits, but practical constraints lower the usable limit to about 1.8 bits. To demonstrate, the researchers encoded six files and stored them in DNA: a 1948 scientific study, an 1895 French film, an entire computer operating system, an Amazon gift card, a Pioneer plaque, and a computer virus. The files were compressed into a single master file, and its data was split into short strings of binary code (ones and zeros). Using the fountain codes, the researchers packaged the strings into so-called droplets and mapped the ones and zeros in each droplet to the four nucleotide bases (A, G, C and T). The algorithm adds a barcode to each droplet for easy identification and screens out letter combinations known to cause errors.
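The encoding steps described above can be sketched in a few lines of Python. This is a toy illustration, not the researchers' implementation: the two-bit mapping (00 to A, 01 to C, 10 to G, 11 to T), the screening thresholds (no run of more than three identical bases, GC content between 45 and 55 percent), the 4-byte barcode and the way segments are chosen for each droplet are all assumptions made for the example. A real fountain code picks segments with a carefully designed degree distribution and includes a matching decoder, both of which are omitted here.

```python
import random

BASES = "ACGT"  # assumed mapping: 00->A, 01->C, 10->G, 11->T


def bytes_to_dna(data: bytes) -> str:
    """Map every 2 bits to one nucleotide (4 bases per byte)."""
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(strand: str) -> bytes:
    """Inverse of bytes_to_dna; strand length must be a multiple of 4."""
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def passes_screen(strand, max_run=3, gc_min=0.45, gc_max=0.55):
    """Reject strands with long single-base runs or skewed GC content
    (thresholds are assumptions for this sketch)."""
    run = 1
    for prev, cur in zip(strand, strand[1:]):
        run = run + 1 if cur == prev else 1
        if run > max_run:
            return False
    gc = sum(base in "GC" for base in strand) / len(strand)
    return gc_min <= gc <= gc_max


def make_droplet(segments, seed):
    """XOR a seed-determined subset of segments; prepend the seed as a barcode."""
    rng = random.Random(seed)
    degree = rng.randint(1, len(segments))  # toy choice, not a real degree distribution
    chosen = rng.sample(range(len(segments)), degree)
    payload = bytearray(len(segments[0]))
    for idx in chosen:
        for j, value in enumerate(segments[idx]):
            payload[j] ^= value
    return seed.to_bytes(4, "big") + bytes(payload)


def encode(data, seg_len=16, n_strands=6, max_tries=10_000):
    """Generate droplets and keep only those whose DNA passes the screen."""
    if len(data) % seg_len:
        data += bytes(seg_len - len(data) % seg_len)  # zero-pad the last segment
    segments = [data[i:i + seg_len] for i in range(0, len(data), seg_len)]
    seed_source = random.Random(0)  # fixed so the run is repeatable
    strands = []
    for _ in range(max_tries):
        if len(strands) == n_strands:
            break
        seed = seed_source.getrandbits(32)
        strand = bytes_to_dna(make_droplet(segments, seed))
        if passes_screen(strand):  # failing droplets are simply discarded
            strands.append(strand)
    return strands
```

Because a fountain code can produce an effectively unlimited supply of droplets, throwing away the ones that fail the screen costs little: the encoder just keeps drawing new seeds until it has enough acceptable strands. The screen works best on compressed, random-looking input, which is one reason to compress the files first.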
The two researchers ended up with a list of 72,000 DNA strands, each 200 bases long, and sent it as a text file to Twist Bioscience, a company that specializes in turning digital sequence data into synthetic DNA. The final product was a vial containing a speck of DNA. Those molecules held all the encoded information, which was then retrieved using modern sequencing technology and software that translates the genetic code back into binary. Storage and retrieval were error-free.
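As a rough sense of scale, the figures in the article can be multiplied out. The short Python calculation below gives only an upper bound, under the assumption that every base in every strand carries data at 1.6 bits per nucleotide. In practice part of each strand is taken up by the barcode and other overhead, so the actual payload is smaller.

```python
STRANDS = 72_000         # number of DNA strands, from the article
BASES_PER_STRAND = 200   # length of each strand, from the article
BITS_PER_BASE = 1.6      # reported density; assumed here to apply to every base

total_bases = STRANDS * BASES_PER_STRAND
upper_bound_bits = total_bases * BITS_PER_BASE
upper_bound_megabytes = upper_bound_bits / 8 / 1_000_000

print(f"{total_bases:,} bases, at most about {upper_bound_megabytes:.2f} MB")
```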
Taking it a step further, they demonstrated that the files could be copied a virtually unlimited number of times by multiplying the DNA sample through polymerase chain reaction (PCR). This is significant, considering that DNA can last hundreds of thousands of years.
According to the researchers, this is the highest-density data storage method created to date. With the new technique, one gram of DNA can hold 215 petabytes of data. Previous attempts at DNA data storage at the European Bioinformatics Institute were successful, but they packed 100 times less information into the DNA, and errors occurred when the information was retrieved.
The main hurdle for this method is cost. Synthesizing DNA is not cheap, and neither is reading it. The researchers spent at least $9,000 to store and read the information. They are optimistic that lower-quality molecules can be produced to bring down the cost. They also believe that time-intensive molecular coding can be sped up using computational techniques such as fountain codes.