DNA Data Storage Moves Beyond Moore’s Law

This article, written by Nathaniel Scharping, first appeared in Discover Science Magazine.

Over the past few decades, it has become apparent that Moore’s Law has started to come apart. The 1965 observation, named after Gordon E. Moore, stated that the number of components on a chip seemed to double every year, but we are reaching the limit of silicon’s storage capabilities.

To keep pushing the boundaries of computing technology, we’ll need to rethink the basic components of computers themselves. And the field of DNA storage could offer a solution to a problem growing ever more apparent in our digital world: Where do we store billions of gigabytes of data that make up the Internet?

“A large part of building better computers is about finding better materials to build computers with,” says Luis Ceze, an associate professor in the Computer Science Department at the University of Washington. “So, silicon happens to be a fantastic material, but it’s reaching a point where it’s unclear that we can continue pushing forward with silicon. So I find it fascinating that biology has evolved many molecules that are useful for building better computers in the future.”

Beyond Silicon

Current archival facilities, such as the data storage center Facebook recently built in Oregon, occupy entire warehouses and can store about an exabyte — 1 billion gigabytes of data — at a maximum. That’s just a fraction of the entire internet, which is forecast to reach 16 zettabytes, or 16,000 exabytes, by 2017.
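As a rough check on those figures, the unit arithmetic can be sketched in a few lines of Python. This assumes decimal (SI) units, and the constant names are made up for the illustration.

```python
# Decimal (SI) storage units, in bytes. Names are illustrative only.
GIGABYTE = 10**9
EXABYTE = 10**18
ZETTABYTE = 10**21

# One exabyte expressed in gigabytes: 1 billion.
gigabytes_per_exabyte = EXABYTE // GIGABYTE

# The 16-zettabyte forecast expressed in exabytes: 16,000.
forecast_in_exabytes = 16 * ZETTABYTE // EXABYTE

# Fraction of that forecast a single exabyte-scale facility could hold.
facility_share = EXABYTE / (16 * ZETTABYTE)

print(gigabytes_per_exabyte, forecast_in_exabytes, facility_share)
```

In other words, holding the forecast volume would take on the order of 16,000 warehouse-sized facilities of that capacity.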

By encoding information using DNA, the blueprint for life on Earth, researchers say that they could take all of that information and fit it in your living room. By taking bits of information and translating them from the 1s and 0s on a computer chip into the four letters of DNA, scientists can create strands of DNA that encode anything you like, from a Taylor Swift song to the Library of Congress.

To accomplish this, researchers build an index that links the four nucleotides that make up DNA (A, T, C and G) to the strings of 1s and 0s we already use on our computers. A DNA synthesizer creates short strands of DNA that each hold a part of a file’s code. Once all of the information has been converted to DNA, it can be stored and later retrieved by a DNA sequencer that reads the order of the nucleotides.
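To make the idea concrete, here is a minimal sketch of such an index in Python. The two-bits-per-nucleotide table, the function names and the 20-base strand length are all assumptions chosen for illustration; the article does not say which mapping any research group actually uses.

```python
from typing import List

# Hypothetical mapping: every pair of bits becomes one nucleotide.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Translate each byte into four nucleotides (2 bits per base)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(sequence: str) -> bytes:
    """Reverse the mapping: four nucleotides back into one byte."""
    bits = "".join(BASE_TO_BITS[base] for base in sequence)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def split_into_strands(sequence: str, strand_len: int = 20) -> List[str]:
    """Cut a long sequence into short pieces, as a synthesizer would produce."""
    return [sequence[i:i + strand_len] for i in range(0, len(sequence), strand_len)]


message = b"Hi"
dna = bytes_to_dna(message)
print(dna)                # CAGACGGC
print(dna_to_bytes(dna))  # b'Hi'
```

A real system would also have to cope with synthesis and sequencing errors, which this sketch ignores.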

A Better Way to Encode DNA

Ceze is part of a team of researchers at the University of Washington that has developed a new method of encoding and reading information stored in synthetic DNA. They looked to a widely used data compression technique called Huffman coding, which expresses strings of binary code more compactly by giving the most frequent patterns the shortest codes.
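For readers unfamiliar with the technique, the following is a sketch of textbook binary Huffman coding in Python. It is a generic illustration, not the team's DNA-specific scheme, whose details the article does not give; the function names are hypothetical.

```python
import heapq
from collections import Counter
from typing import Dict


def huffman_code(text: str) -> Dict[str, str]:
    """Build a prefix-free binary code; frequent symbols get shorter codewords."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: partial codeword}).
    heap = [(weight, i, {sym: ""}) for i, (sym, weight) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merge the two lightest subtrees, prefixing their codewords with 0 and 1.
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(text: str, code: Dict[str, str]) -> str:
    return "".join(code[ch] for ch in text)


def decode(bits: str, code: Dict[str, str]) -> str:
    inverse = {codeword: sym for sym, codeword in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # prefix-free, so the first match is the symbol
            out.append(inverse[current])
            current = ""
    return "".join(out)


code = huffman_code("aaaabbc")
bits = encode("aaaabbc", code)
print(code, bits, decode(bits, code))
```

In this toy input the letter "a" appears most often, so it receives the shortest codeword, and the seven characters fit in 10 bits instead of the 56 that plain 8-bit text would need.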

He says that their method allows for even greater storage capacity by reducing redundancies — the process of making multiple identical strands to account for errors — and allows individual pieces of the data to be read without sequencing all of the DNA stored, something that had not previously been done. The method includes unique “primers” in individual strands of DNA that can be targeted during the sequencing process to highlight a particular strand. They say that this improves functionality of their system, eliminating the need to sequence the entire database just to read a single strand.
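The logic of that selective readout can be mimicked in software. The sketch below is only an analogy: in the lab the selection happens chemically, and the primer sequences, the four-base position index and the function names here are all invented for illustration.

```python
from typing import List

BASES = "ACGT"
INDEX_LEN = 4  # hypothetical: four bases give 4**4 = 256 strand positions


def index_to_dna(i: int) -> str:
    """Write a strand's position as a fixed-width base-4 number in nucleotides."""
    digits = []
    for _ in range(INDEX_LEN):
        digits.append(BASES[i % 4])
        i //= 4
    return "".join(reversed(digits))


def dna_to_index(seq: str) -> int:
    value = 0
    for base in seq:
        value = value * 4 + BASES.index(base)
    return value


def tag_strands(primer: str, payloads: List[str]) -> List[str]:
    """Prefix every payload with its file's primer and its position in the file."""
    return [primer + index_to_dna(i) + payload for i, payload in enumerate(payloads)]


def read_file(pool: List[str], primer: str) -> str:
    """Pick out only the strands carrying this primer, then reorder by index."""
    start = len(primer)
    selected = [s for s in pool if s.startswith(primer)]
    pieces = sorted(
        (dna_to_index(s[start:start + INDEX_LEN]), s[start + INDEX_LEN:])
        for s in selected
    )
    return "".join(payload for _, payload in pieces)


# Two files mixed together in one unordered pool, as in a test tube.
pool = tag_strands("ACGTACGTAC", ["GGAT", "TTCA", "CCGA"])
pool += tag_strands("TGCATGCATG", ["AAAA", "CCCC"])
pool.reverse()
print(read_file(pool, "ACGTACGTAC"))  # GGATTTCACCGA
```

Only the strands tagged with the requested primer are touched, which is the property that avoids reading the whole pool.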

As a proof-of-concept, the team encoded the information for several image files in synthetic DNA and successfully sequenced the strands to redraw the pictures. While they only encoded several megabytes of information, Ceze says that the process could be scaled up to hold much larger databases.

“If we compare flash to DNA in terms of density, or the number of bits in a certain volume, DNA will be at least a billion times denser. You can put an exabyte in a cubic inch, which would be a few sugar cubes,” says Ceze.

Ceze emphasizes that synthesizing DNA to store data is not related to genetic engineering. Instead of attempting to put together the right strands of DNA to create an organism, their method is entirely synthetic.

DNA Computers

Storing data in strands of DNA has one significant drawback: it’s slow. Unlike computer chips, which communicate at nearly the speed of light using electrons, DNA data storage relies on physically moving molecules around.

For this reason, we shouldn’t expect to see DNA hard drives at your local computer store in the near future, Ceze says. Instead, he envisions using DNA data storage to preserve massive data archives, such as those used by Facebook and cloud storage services, where speed is not as crucial. The technology also remains expensive. But, even compared to five years ago, prices have dropped precipitously, according to Ceze. He’s looking forward to further reductions in the cost of synthesizing and sequencing DNA, which would heighten the feasibility of DNA data banks.

“Computers were pretty expensive a while ago, and then they got cheaper because there was a demand for them that dropped the price. So now that DNA storage is creating even more demand [for DNA synthesis and sequencing] beyond the biomedical industry, that will push the price down,” says Ceze.