In recent years, we have become increasingly dependent on digital data. People have realized that personal information, digital wallet details, passwords, bank details, and digital keys need to be stored and secured very carefully.
Earlier, we relied on traditional storage media such as cassettes, CD-Rs, floppy disks, and microSD cards, all with limited storage capacity. Now the “cloud” is one of the best-known and most trusted storage methods, and people have been using it for a considerable time. Traditional storage media can be lost, damaged, or corrupted, and they cannot hold data reliably over long periods.
Whether or not you have heard of it, there is a new wave in the world of data storage.
“Think of compressing all the information on the accessible Internet into a shoebox,” says Karin Strauss, a Principal Researcher at Microsoft. “With DNA data storage, that’s possible.”
As for how digital data is stored in DNA, there is a simple explanation. DNA is the hereditary material of all living organisms on Earth. It carries the traits that parents pass on to their offspring, so it acts like the memory storage device of living things. The Soviet scientist Mikhail Neiman published a paper on the use of DNA as a digital data storage medium.
“An organic molecule used for storing digital data is called an organic data storage. This may be DNA, RNA, or protein. A gram of DNA can store 1 exabyte of data for as long as 2,000 years.”
How DNA digital data storage works
DNA digital data storage works in four steps. Digital data is encoded as a DNA sequence, that sequence is synthesized into artificial DNA, the DNA sample is stored, and the information is later decoded by sequencing the artificial DNA strand. The complete path of storing and retrieving digital data from DNA is laid out step by step below.
DNA has four nitrogenous bases: adenine (A), cytosine (C), guanine (G), and thymine (T). To store data in DNA, each pair of binary digits is assigned to one base: 00 for A, 01 for G, 10 for C, and 11 for T. Information in binary form is thereby converted into a sequence of A, T, C, and G.
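The mapping described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming the two-bits-per-base table given in the text; the function name `bytes_to_dna` is made up for this example, and real encoding schemes add constraints and error correction that are not shown here.

```python
# Two-bit-per-base table taken from the text: 00 -> A, 01 -> G, 10 -> C, 11 -> T.
BITS_TO_BASE = {"00": "A", "01": "G", "10": "C", "11": "T"}


def bytes_to_dna(data: bytes) -> str:
    """Turn raw bytes into a string of bases (hypothetical helper)."""
    # Write every byte as 8 binary digits, e.g. b"H" -> "01001000".
    bits = "".join(f"{byte:08b}" for byte in data)
    # Read the bit string two digits at a time and look up the base.
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


print(bytes_to_dna(b"Hi"))  # prints GACAGCCG
```

Each byte becomes exactly four bases, so the two-byte message "Hi" turns into an eight-base strand.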
Artificial DNA synthesis
Single-stranded DNA with an arbitrary sequence is chemically synthesized. To write out the encoded sequence, nucleotides are added one at a time, each attached to the previous one in the growing strand. Even if the efficiency of artificial DNA synthesis is 99%, the remaining 1% of errors can still cause a major problem in digital data storage.
“The computational approaches facilitated by working with DNA make it an even more attractive option for data storage,” adds Luis Ceze, Professor of Computer Science and Engineering at the University of Washington. “We have the freedom to choose how to map bits to DNA sequences, creating redundancy and high tolerance to error when reading and writing DNA.”
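To see why a 1% error rate matters, and why the redundancy Ceze mentions is needed, here is a rough back-of-the-envelope sketch. It assumes the 99% figure applies to each nucleotide separately and that errors are independent of one another; both are simplifying assumptions made for this illustration, and the function name is hypothetical.

```python
def error_free_fraction(per_base_accuracy: float, length: int) -> float:
    """Expected fraction of strands with no error at all.

    Assumes every base is written independently with the same accuracy,
    which is a simplification for illustration only.
    """
    return per_base_accuracy ** length


for length in (10, 100, 200):
    # Longer strands have more chances to pick up at least one error.
    print(length, round(error_free_fraction(0.99, length), 3))
```

Under these assumptions, only a little over a third of 100-base strands would come out completely error-free, which is why the same data is written and read many times over.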
Storing the sample
This step consists of storing the data backup as a liquid droplet containing several nanograms of DNA. It can be deep-frozen so that it lasts for 100 years, or sent to external storage systems that can help preserve the data for a thousand years. Although DNA can last for millions of years, some of it can be lost within a short time.
Sequencing of DNA
To recover the digital data in its original form, the entire DNA must be sequenced. DNA sequencing is the process of reading a DNA strand into a digital sequence of bases. The process is repeated multiple times with different start sites, which yields multiple parallel reads of the DNA. The sequence that matches the DNA is then picked and forwarded to the decoder.
Lastly, the decoder converts the sequence back into binary, which retrieves the original data.
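The read-and-decode step can be sketched as follows. This is a simplified illustration, not how a production sequencing pipeline works: it assumes the reads are already aligned, have equal length, and contain only substitution errors, and it picks the most common base at each position as a stand-in for choosing the matching sequence. The function names are hypothetical, and the base-to-bits table is the reverse of the mapping given earlier in the text.

```python
from collections import Counter
from typing import Sequence

# Reverse of the mapping in the text: A -> 00, G -> 01, C -> 10, T -> 11.
BASE_TO_BITS = {"A": "00", "G": "01", "C": "10", "T": "11"}


def consensus(reads: Sequence[str]) -> str:
    """Majority vote per position across aligned, equal-length reads."""
    return "".join(
        Counter(column).most_common(1)[0][0]  # most frequent base in this column
        for column in zip(*reads)
    )


def dna_to_bytes(strand: str) -> bytes:
    """Decode a strand back to bytes; assumes its length is a multiple of 4."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


# Three reads of the same strand; the third has one misread base (T instead of C).
reads = ["GACAGCCG", "GACAGCCG", "GATAGCCG"]
print(dna_to_bytes(consensus(reads)))  # prints b'Hi'
```

Because two of the three reads agree at every position, the single misread base is outvoted and the original message is recovered.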
Is it the future of data storage?
Everyone who works with data, experts and professionals alike, has seen data storage advance with each passing year. A massive amount of data is produced and consumed, which increases the need for dependable and plentiful storage devices. The researchers Yaniv Erlich and Dina Zielinski saw DNA sequencing technology as a way to push storage limits further. They found that it could accommodate a large amount of data for 100 years.
Microsoft is planning to store data on DNA within the next three years. The storage solution will be based on chips, and there is also a possibility that future versions of storage could involve bacteria or enzymes to do the calculations.
“Automation was, and is, one of our biggest challenges. It was great to have our first proof of concept converting information from bits, to DNA, and back to bits to prove that it was possible and also show what are our other challenges in automation, but some of the biotechnology aspects are quite new to some of us, so we’ve also been learning a lot there. The other significant challenges are continuing to increase throughput and decrease the cost of DNA sequencing and synthesis. There’s quite a bit of engineering left to get [us] to where we need to be,” Strauss added.