One night a few years ago, two biologists sat in a bar in Hamburg, discussing DNA. Ewan Birney, the associate director of the European Bioinformatics Institute, and Nick Goldman, a research scientist there, were wondering how to handle the tsunami of data flooding the institute, whose job it is to maintain databases of DNA sequences, protein structures, and other biological information that scientists turn up in their research—databases that are growing exponentially, thanks mostly to dropping costs and increased automation. The maintenance of all this data on hard drives was pressing their budget to the breaking point.
Being genomicists, they joked that DNA, which is incredibly compact, sturdy, and of course has a rather lengthy history of storing data, would be a better way to go. Joking, however, gave way to fevered napkin-scribbling, and soon, recalls Goldman, “We had to order another beer, and call for more napkins to write on.”
Three years later, the results of that bar-stool inspiration have been published in Nature. In the paper, Birney, Goldman and their collaborators report using DNA to store a complete set of Shakespeare’s sonnets, a PDF of the first paper to describe DNA’s double-helix structure, a 26-second MP3 clip from Martin Luther King, Jr.’s “I Have a Dream” speech, a text file of a compression algorithm, and a JPEG photograph of the institute. You may not be storing your personal data in DNA anytime soon: the process is time-consuming and expensive, and there’s the small matter of needing a DNA sequencer to open the files. But as the costs of making and sequencing DNA continue to plunge, and as computer engineering approaches the limits of how densely information can be encoded on silicon, biological data storage may be just what’s needed for institutes and other organizations with massive archival needs.
To encode files in DNA, Birney and Goldman started by converting text, image, or audio data into binary code. Then, in several steps using software that Goldman wrote, they converted that binary code into a string of the letters A, T, G, and C, which stand for the four DNA bases. Working from that string of letters, they drew up the blueprints for thousands of pieces of DNA, each containing a snippet of a file, and sent their designs to Agilent Technologies, which manufactures custom DNA for biologists. Agilent sent back the completed DNA fragments: just a smidge of white dust in the bottom of a plastic tube, Goldman recalls. To open the files, the team used a standard DNA sequencer, a process that took about two weeks. They then used Goldman’s software to reassemble the sequenced DNA into coherent, readable files. Apart from two small gaps in the DNA, the sonnets, photo, speech, PDF, and text file re-emerged from the white dust almost completely unscathed. After the scientists performed a little repair work, all of the information, about 739 KB worth, was retrieved with 100% accuracy.
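The article describes the pipeline only in outline: bytes become binary, binary becomes a string of bases, and the string is cut into numbered fragments that are later put back in order. The Python sketch below walks through that outline using the simplest possible mapping, two bits per base (00 as A, 01 as C, 10 as G, 11 as T). That mapping, the fragment size, and all function names are assumptions for illustration. It is not Goldman’s encoding, it ignores sequencing errors entirely, and it keeps each fragment’s index as a plain Python integer rather than writing it into the DNA itself.

```python
import random

# Naive illustrative mapping (an assumption, not Goldman's scheme): 2 bits per base.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Turn bytes into a binary string, then map each pair of bits to a base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(dna: str) -> bytes:
    """Reverse the mapping: bases back to bits, then every 8 bits back to a byte."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def split_into_fragments(dna: str, size: int) -> list[tuple[int, str]]:
    """Cut the sequence into numbered snippets of at most `size` bases."""
    return [(i // size, dna[i:i + size]) for i in range(0, len(dna), size)]


def reassemble(fragments: list[tuple[int, str]]) -> str:
    """Put fragments back in order by their index and join them."""
    return "".join(seq for _, seq in sorted(fragments))


if __name__ == "__main__":
    message = b"Shall I compare thee to a summer's day?"
    fragments = split_into_fragments(bytes_to_dna(message), size=20)
    random.shuffle(fragments)  # sequencing reads come back in no particular order
    assert dna_to_bytes(reassemble(fragments)) == message
    print(bytes_to_dna(b"Hi"))  # CAGACGGC
```

Running `bytes_to_dna(b"Hi")` gives `CAGACGGC`, because “H” is 01001000 and “i” is 01101001 in binary. Note the GG in the middle: a mapping this naive freely produces runs of the same base.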
The fidelity is impressive, and DNA kept in a cold, dry, dark place can stay intact for thousands of years. But how long would you have to want to store something for DNA to be cheaper than archival magnetic tape? Tape is still the current gold standard because it draws far less power than hard drives or other storage technologies, but it needs to be replaced about every five years. Birney and Goldman calculate that if you wanted to put a file in storage today and have it last for at least 600 years, DNA would be cheaper than re-recording the data to fresh magnetic tape every half-decade, a process that would have to be repeated 120 times over the six-century span.
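The 600-year figure rests on a simple trade-off: DNA is paid for once, while tape is paid for again at every refresh. The sketch below models only that trade-off. The costs are hypothetical relative units, and the model ignores power, floor space, labor, and falling DNA prices, so it is a toy illustration rather than Birney and Goldman’s actual calculation.

```python
import math


def tape_writes(years: float, interval_years: float = 5.0) -> int:
    """Number of times the data must be written to tape to cover `years`:
    the first write plus a fresh copy every `interval_years`."""
    return math.ceil(years / interval_years)


def dna_is_cheaper(years: float, dna_cost: float, cost_per_tape_write: float,
                   interval_years: float = 5.0) -> bool:
    """Toy model (hypothetical costs): DNA is a one-time cost, tape is paid at every write."""
    return dna_cost < tape_writes(years, interval_years) * cost_per_tape_write


if __name__ == "__main__":
    print(tape_writes(600))  # 120 writes over six centuries
    # Hypothetical costs: DNA priced at 100 times one tape write.
    print(dna_is_cheaper(400, dna_cost=100, cost_per_tape_write=1))  # False (80 writes)
    print(dna_is_cheaper(600, dna_cost=100, cost_per_tape_write=1))  # True (120 writes)
```

Under this model, DNA wins once the number of tape writes needed exceeds the ratio of the one-time DNA cost to the cost of a single tape write.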
Goldman speculates that if the price of making and sequencing DNA continues to fall at current rates, commercial services that store data in DNA might spring up around 50 years from now. “You would email documents and photographs and stuff that were valuable to you and your family [to the DNA storage company], and maybe a day later or a week later, they would ship you back a little bit of DNA,” says Goldman. “You could stick it in the fridge or bury it in the garden or they would store it. And they can guarantee it will be there a hundred thousand years later.”
Birney and Goldman are not the only genomicists who have realized the data-storage potential of DNA. In September 2012, genomicists George Church, Yuan Gao, and Sriram Kosuri published a short description of a similar system in Science. The Nature team stored slightly more data, and Goldman avoided one of the sources of error in the Science paper—strings of repeated bases that DNA sequencers have trouble handling—by adjusting the way his software converts the information into A, T, G, and C. But on the whole, the ideas are similar, and represent a big step forward from earlier, smaller studies.
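One way to rule out repeated bases entirely, and the idea behind the base-3 scheme described in the Nature paper, is to encode data as base-3 digits (trits) and let each digit choose among the three bases that differ from the previous one. The Python sketch below illustrates that idea. The fixed six-trits-per-byte conversion (a simplification of the Huffman code the paper describes), the starting base, the ordering of the three choices, and the function names are all assumptions made for illustration.

```python
BASES = "ACGT"


def bytes_to_trits(data: bytes) -> list[int]:
    """Fixed-width base-3 conversion: 6 trits per byte, since 3**6 = 729 >= 256.
    (A simplification; the Nature paper used a Huffman code instead.)"""
    trits = []
    for byte in data:
        digits = []
        for _ in range(6):
            byte, remainder = divmod(byte, 3)
            digits.append(remainder)
        trits.extend(reversed(digits))  # most significant trit first
    return trits


def trits_to_dna(trits: list[int], previous: str = "A") -> str:
    """Each trit picks one of the three bases that differ from the previous base,
    so no base can ever follow itself. Start base and ordering are assumptions."""
    out = []
    for t in trits:
        if t not in (0, 1, 2):
            raise ValueError(f"not a base-3 digit: {t}")
        choices = [b for b in BASES if b != previous]
        previous = choices[t]
        out.append(previous)
    return "".join(out)


def dna_to_trits(dna: str, previous: str = "A") -> list[int]:
    """Invert trits_to_dna by asking which of the three choices each base was."""
    trits = []
    for base in dna:
        choices = [b for b in BASES if b != previous]
        trits.append(choices.index(base))  # ValueError if a base repeats
        previous = base
    return trits


if __name__ == "__main__":
    dna = trits_to_dna(bytes_to_trits(b"sonnet"))
    assert all(a != b for a, b in zip(dna, dna[1:]))  # no repeated bases
    assert dna_to_trits(dna) == bytes_to_trits(b"sonnet")
    print(trits_to_dna([0, 0, 0, 0]))  # CACA
```

Even a run of identical digits, such as `[0, 0, 0, 0]`, comes out as alternating bases (`CACA`). The price is density: each base now carries one trit, about 1.58 bits, instead of two bits.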
Still, Kosuri is quick to point out that this technology is in its infancy. “Both of our papers are pretty naïve and simplistic, in the way we encode information,” he says. “We’re not bringing to bear the 30 years or so of electrical engineering that have gone into making CDs. We’re biologists, not electrical engineers.” And even if the technique gets faster and cheaper, DNA has two limitations: it’s not rewritable, so you couldn’t update information without redoing the whole process, and it doesn’t allow random access, so you couldn’t read, for instance, a single one of the 154 Shakespeare sonnets Birney and Goldman stored without decoding the entire file.
No matter what, the need for something new to replace our current data-storage technology is pressing. In 2011, according to the Digital Universe report, humanity created 1.8 zettabytes of new data to store, or around 1.8 trillion gigabytes. By 2020, the number is set to have grown 50 times over. And as of this year, Moore’s Law, the observation that the number of transistors on integrated circuits doubles every two years, is expected to stop holding. Doubling is projected to occur every three years from here on out as more and more circuits compete for a fixed amount of space. Silicon may have been the workhorse of the first golden age of computers. But it may take something even better, the very stuff that makes up life, to get us to the second.
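The figures in that paragraph are easy to check. The sketch below converts zettabytes to gigabytes using SI (decimal) prefixes, and compares how much transistor counts would grow over the same span under a two-year versus a three-year doubling period. The twelve-year span and the function names are arbitrary choices for illustration.

```python
from fractions import Fraction

ZETTABYTE = 10**21  # bytes, SI prefix "zetta"
GIGABYTE = 10**9    # bytes, SI prefix "giga"


def zettabytes_to_gigabytes(zb: str) -> Fraction:
    """Exact conversion; pass the value as a decimal string such as "1.8"."""
    return Fraction(zb) * ZETTABYTE / GIGABYTE


def growth_factor(years: float, doubling_period_years: float) -> float:
    """How many times larger a quantity gets if it doubles once per period."""
    return 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    print(zettabytes_to_gigabytes("1.8"))  # 1800000000000, i.e. 1.8 trillion GB
    print(growth_factor(12, 2))            # 64.0 with two-year doubling
    print(growth_factor(12, 3))            # 16.0 with three-year doubling
```

Over twelve years, stretching the doubling period from two years to three cuts the growth from 64-fold to 16-fold.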