As the world produces ever-larger amounts of data (there are now about 10 trillion gigabytes of digital data on Earth), scientists have shown that a technique for labeling and retrieving DNA data files from a large pool could help make DNA data storage feasible in the near future.
Every day, humans produce emails, photos, tweets and other digital files that add up to another 2.5 million gigabytes of data.
Much of this data is stored in enormous facilities known as exabyte data centers (an exabyte is 1 billion gigabytes), which can be the size of several football fields and cost around $1 billion to build and maintain.
Many scientists believe that an alternative solution lies in the molecule that contains our genetic information: DNA, which evolved to store massive quantities of information at very high density.
A coffee mug full of DNA could theoretically store all of the world’s data, says Mark Bathe, a professor of biological engineering at the Massachusetts Institute of Technology (MIT).
“We need new solutions for storing these massive amounts of data that the world is accumulating, especially the archival data,” said Bathe, whose team's paper appeared in the journal Nature Materials.
“DNA is a thousandfold denser than even flash memory, and another property that’s interesting is that once you make the DNA polymer, it doesn’t consume any energy. You can write the DNA and then store it forever.”
Scientists have already demonstrated that they can encode images and pages of text as DNA. A practical storage system, however, also needs a way to pick out the desired file from a large pool of DNA.
Bathe and his colleagues have now demonstrated one way to do that, by encapsulating each data file in a 6-micrometer particle of silica, which is labeled with short DNA sequences that reveal the contents.
Using this approach, the researchers demonstrated that they could accurately pull out individual images stored as DNA sequences from a set of 20 images.
Given the number of possible labels that could be used, this approach could scale up to 10^20 files.
DNA has several other features that make it desirable as a storage medium: It is extremely stable, and it is fairly easy (but expensive) to synthesize and sequence.
Also, because of its high density — each nucleotide, equivalent to up to two bits, is about 1 cubic nanometer — an exabyte of data stored as DNA could fit in the palm of your hand.
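The density figures above can be checked with a rough back-of-the-envelope sketch. The two-bits-per-nucleotide mapping below (A=00, C=01, G=10, T=11) and the function names are illustrative assumptions, not the encoding used by Bathe's team, and the volume estimate is an idealized lower bound that ignores packaging, redundancy and error correction.

```python
# Hypothetical 2-bit-per-nucleotide mapping, chosen only for illustration.
BASES = "ACGT"  # A=00, C=01, G=10, T=11


def encode(data: bytes) -> str:
    """Encode bytes as a DNA string, four nucleotides per byte."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def decode(dna: str) -> bytes:
    """Invert encode(): read four nucleotides back into each byte."""
    out = bytearray()
    for i in range(0, len(dna), 4):
        byte = 0
        for base in dna[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def volume_cm3(num_bytes: int, bits_per_nt: float = 2.0,
               nm3_per_nt: float = 1.0) -> float:
    """Idealized volume of the DNA needed to hold num_bytes of data."""
    nucleotides = num_bytes * 8 / bits_per_nt
    return nucleotides * nm3_per_nt * 1e-21  # 1 cubic nm = 1e-21 cubic cm


EXABYTE = 10**18  # 1 billion gigabytes, as defined in the article
print(encode(b"Hi"))
print(volume_cm3(EXABYTE))  # on the order of a few thousandths of a cubic cm
```

Under these assumptions, an exabyte works out to roughly 0.004 cubic centimeters of raw DNA, which is consistent with the claim that it could fit in the palm of a hand.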
One obstacle to this kind of data storage is the cost of synthesizing such large amounts of DNA. Currently it would cost $1 trillion to write one petabyte of data (1 million gigabytes).
Bathe estimated that the cost of DNA synthesis would need to drop by about six orders of magnitude.
He expects this to happen within a decade or two, much as the cost of storing information on flash drives has dropped dramatically over the past couple of decades.
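To put those cost figures in per-gigabyte terms, here is a short arithmetic sketch. It uses only the numbers quoted above ($1 trillion per petabyte, 1 million gigabytes per petabyte, a six-orders-of-magnitude drop); the variable names are illustrative.

```python
COST_PER_PETABYTE_USD = 1e12  # current synthesis cost quoted in the article
GB_PER_PETABYTE = 1e6         # 1 petabyte = 1 million gigabytes
REQUIRED_DROP = 1e6           # six orders of magnitude

# Current cost of writing one gigabyte as DNA.
current_per_gb = COST_PER_PETABYTE_USD / GB_PER_PETABYTE

# Costs after the drop Bathe estimates is needed.
target_per_petabyte = COST_PER_PETABYTE_USD / REQUIRED_DROP
target_per_gb = current_per_gb / REQUIRED_DROP

print(f"Current: ${current_per_gb:,.0f} per GB")
print(f"Target:  ${target_per_gb:,.2f} per GB "
      f"(${target_per_petabyte:,.0f} per petabyte)")
```

In other words, the quoted figures imply about $1 million per gigabyte today, and about $1 per gigabyte after a six-orders-of-magnitude reduction.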
“While it may be a while before DNA is viable as a data storage medium, there already exists a pressing need today for low-cost, massive storage solutions for preexisting DNA and RNA samples from Covid-19 testing, human genomic sequencing, and other areas of genomics,” Bathe noted.