DNA Data Storage: A Solution Looking for a Problem?
Many moons ago, famed venture capitalist Marc Andreessen described the term “product-market fit” as “being in a good market with a product that can satisfy that market.” To prove that you have product-market fit, just sell a meaningful amount of your product or service to end customers – no subsidies, no free trials counted as revenues, no grants, and no related party revenues. Whenever an exciting new technology promises the moon, we always want to see product-market fit before investing. We don’t invest in stories, we invest in traction. That’s why when Twist Bioscience (TWST) promises their investors that DNA data storage is around the corner, we wanted to take a closer look at the thesis.
DNA Data Storage Explained
All data is made up of 1s and 0s. All DNA is made up of As, Ts, Cs, and Gs. Put that way, it would make sense that if the 1s and 0s can be converted to ATCGs, then data could be stored on strands of DNA. So, is it possible? Yes. However, it’s not exactly that simple.
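The basic conversion can be sketched in a few lines of Python. This is a minimal illustration that assumes a naive mapping of two bits per nucleotide (00 to A, 01 to C, 10 to G, 11 to T). The mapping and the function names are our own assumptions, not the encoding used by any company discussed here, and real schemes add error correction and avoid sequences that are hard to synthesize or read.

```python
# Illustrative 2-bits-per-nucleotide mapping. This is an assumption made for
# the sketch, not the encoding used by any company named in this article.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of A/C/G/T, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Reverse encode(): read two bits per base and regroup them into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

Under this toy mapping every byte becomes four nucleotides, so the two-character message "Hi" turns into an eight-base strand.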
To store data on DNA, the desired information first needs to be mapped onto the four nucleotides (the As, Ts, Cs, and Gs) of DNA. The resulting sequence is then physically written using a DNA synthesizer. While this synthesizer might not help Paul McCartney create a pop Christmas hit, it’s entirely vital to the data conversion process. Its primary purpose is to join nucleotides in a specific order to build DNA molecules. In short, it creates a “synthesized” strand of DNA. The nucleotides are chemically modified to allow them to fit snugly and easily onto the DNA chain. Once this is complete, the DNA can be “purified” (separated from leftover reagents and incomplete strands), making it ready for data storage. This is where companies like Twist Bioscience fit in. Twist specializes in the DNA synthesis process, helping to make DNA storage a reality.
Some Key Competitors
While several companies are actively making strides in the DNA data storage space, Twist Bioscience boasts the ability to store up to 215 petabytes of data in a single gram of DNA. For perspective, a single petabyte could store over 13 years of HD video footage. This means Twist could store 28 centuries of HD video in one gram of DNA. The latest from the company is that they’re working on a Proof of Concept (POC) chip and a device that allows data to be written to DNA.
Early access is expected to happen before the end of this year, though it’s unclear how mature the product offering will be by then.
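The 28-century figure is simple multiplication, and it’s worth checking. The sketch below uses only the two numbers quoted above; the variable names are ours.

```python
# Figures quoted in the article; the variable names are illustrative.
PETABYTES_PER_GRAM = 215
YEARS_OF_HD_VIDEO_PER_PETABYTE = 13

years_per_gram = PETABYTES_PER_GRAM * YEARS_OF_HD_VIDEO_PER_PETABYTE
centuries_per_gram = years_per_gram / 100

print(years_per_gram)      # 2795
print(centuries_per_gram)  # 27.95
```

So one gram works out to roughly 2,800 years of footage, which rounds to the 28 centuries claimed.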
Founded in 2016, Catalog Technologies boasts investors such as Baidu, a Chinese technology conglomerate, and Horizons Ventures, a Hong Kong-based VC firm focusing on disruptive technology startups. Catalog secured $35 million in Series B funding in late 2021, bringing their total funding to $55 million. While their platform “only” has the capacity to store one terabyte (1,000 gigabytes) of data per gram of DNA, Catalog’s advantage is that they have built a highly scalable and secure platform for data storage. Essentially, they are taking a different approach in focusing on practicality as opposed to capacity. They have not announced a timeline for commercial availability of their DNA data storage offering and their website doesn’t offer too much confidence, claiming “the idea of storing information using DNA has been around for a while. It’s just that the cost of DNA synthesis has been a bottle neck.”
Surprised to see this one on the list? We are too. Nonetheless, Microsoft is also at the forefront of the DNA data storage industry, with a goal of making it more practical and accessible for a wide range of users and applications. Their main contribution to the field is the development of a fully automated system for DNA synthesis and coding, in partnership with Twist and Illumina among others. Dubbed the DNA Synthesis Platform (no creativity points here), it can encode large amounts of data in a single run. Microsoft is also developing a specialized algorithm for encoding and decoding DNA data, which follows a linear process.
While these companies talk about progress being made towards commercialization, one company claims to have already achieved commercialization.
In April of last year, global biotechnology group GenScript (1548.HK) announced the “first commercial platform for DNA digital data storage” which was validated by researchers who encoded “100 Mb of mixed data types in the DNA to test the High-Density DNA Synthesis Chip’s storage utility and retrieved 100% of the encoded data with no loss.” The press release announcing the product offering said that it would be “fully commercially available in the second half of 2022” and that they’re “actively seeking participants for a partnership program to expand the applicability of the data storage platform.” No mention was made of whether this solution can compete on cost. If it can’t, then it’s hard to see why anyone would want to engage with their partnership program.
Competing on Cost
DNA data storage isn’t addressing a blue ocean total addressable market (TAM). It needs to displace existing solutions, namely magnetic tape storage, which is one of the oldest technologies available for electronic data storage. While other storage technologies like solid-state drives are becoming popular for primary storage and data processing, many organizations continue to use tape storage products as part of a larger data management strategy.
Magnetic tapes have been providing a high-capacity, low-cost solution for archival data storage since computers were first invented. The first magnetic tape storage system was invented in the late 1940s by Jack Mullin, an engineer from Santa Clara University, after observing the Nazi radio broadcasts of World War II and realizing they had a device capable of recording and reproducing sound. As a successful electrical engineer, Mullin recognized the potential of this technology and used it to create the foundation of the magnetic tape storage we use to this day.
An article by NetworkWorld explains why tape storage isn’t dead, citing cost as a primary factor:
Tape allows you to separate the medium from the recording device, which allows you to buy a handful of tape drives and thousands of tapes. Those thousands of tapes also do not need power and cooling to maintain their data. In fact, some have suggested that even if disk was free it would still cost more than tape due to the power and cooling savings alone.
That last sentence is a very powerful one, and it should also apply to DNA data storage which doesn’t need power or cooling either (so we presume). Additionally, many don’t realize that tapes actually have lower error rates when writing data than any other recording medium. “Tape is 1000X better at writing ones and zeros than the best disk drive,” says the article, and 10X better than solid-state drives. But when it comes to retrieval, the speed of discs just can’t be beat.
Tapes continue to be the dominant solution for secondary storage use cases where data needs to be held for long periods of time for as cheaply as possible. The article cites entertainment and biotech as two industries that are buying huge tape libraries and filling them with thousands of tapes. This meshes well with the clientele Twist Bioscience seems to be targeting with their offering.
Opportunities for Disruption
So, what’s the advantage of using DNA for data storage? While tape storage is the current standard for archival data storage, it’s not without its weaknesses. Catalog Technologies points out that conventional storage media “do not have the longevity, data density, or cost efficiency to meet the global demand.” Since magnetic tape can only stay intact for 30 years, even under optimal conditions, DNA has the potential to disrupt it in the archival storage market, as the high density and long lifespan of DNA make it well suited for this purpose.
Additionally, retrieval speed is an issue for tape storage. Tape drives have slower read and write speeds than other storage devices, with high-end speeds of only several hundred megabytes (MB) per second. Retrieval times may also be affected by outside factors such as network bandwidth, data compression, and the overall complexity of the data being retrieved. This makes tape ideal only for data that does not require frequent access or urgent retrieval, such as archives. In the high-performance computing market, where the ability to store and retrieve large amounts of data efficiently is crucial, DNA may be able to get a leg up if it can find a way to shorten its retrieval process, but at what cost? DNA data storage needs to compete with tapes on cost, then with traditional hard discs on speed. Given GenScript is talking about researchers storing and retrieving 100 megabytes successfully, just how far away are we from scalable, economically viable DNA data storage solutions? Or is this just a cool technology solution looking for a problem to solve?
It’s About TAM Time
The TAM for DNA data storage can be estimated by sizing the market it’s looking to disrupt, starting with the archival data market. Twist claims archival data storage is a $35 billion market. However, the global tape storage market was estimated at $4.3 billion in 2019 and, according to Allied Market Research, is only expected to reach $9.4 billion by 2030. If Twist expects to disrupt solid-state and hard disc drives, maybe they should start with the low-hanging fruit. Obvious bottlenecks include cost, speed, and complexity. If DNA storage fails to become competitive with tape storage, there will be virtually no TAM because no one in his right mind would spend more money to store data in a more complex way. Total cost of ownership needs to include:
- The cost of writing the data to a DNA storage medium (should be intuitively expensive)
- The cost of storing the DNA storage medium (should be zero)
- The cost of reading the data (isn’t this just the cost of sequencing?)
These days, it costs $600 to sequence a complete genome, which contains around 200 gigabytes of data, or about $3 per gig. Today, magnetic tape technology offers the lowest purchase price of raw storage capacity at around two cents per gigabyte, says Forbes. Sure, these are rough numbers, but we’re not convinced DNA data storage can even come close to competing on cost anytime soon. A key component of the entire process, the DNA data writer, will be critically important in determining how cost effective the entire process will be, and Twist expects to allow early access to this product offering later this year.
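The back-of-the-envelope math above can be laid out as a short Python sketch. It uses only the rough figures quoted in this article, and the function and variable names are our own. It also compares a read cost for DNA with a purchase price for tape, so it isn’t an apples-to-apples comparison, and it leaves out the cost of writing the data entirely.

```python
# Rough figures quoted in the article, not vendor pricing.
GENOME_SEQUENCING_COST_USD = 600.0
GENOME_DATA_GB = 200.0
TAPE_COST_PER_GB_USD = 0.02


def cost_per_gb(total_cost_usd: float, gigabytes: float) -> float:
    """Spread a total cost evenly over the gigabytes it covers."""
    return total_cost_usd / gigabytes


dna_read_cost_per_gb = cost_per_gb(GENOME_SEQUENCING_COST_USD, GENOME_DATA_GB)
cost_ratio = dna_read_cost_per_gb / TAPE_COST_PER_GB_USD

print(f"DNA read cost: ${dna_read_cost_per_gb:.2f} per GB")  # $3.00 per GB
print(f"Tape price:    ${TAPE_COST_PER_GB_USD:.2f} per GB")  # $0.02 per GB
print(f"Gap:           {cost_ratio:.0f}x")                   # 150x
```

On these numbers, just reading the data back from DNA costs about 150 times what it costs to buy the equivalent tape capacity.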
Analysts should press the company regarding how far away these pilot products are from being cost competitive. DNA data storage is only a portion of Twist’s offerings, but they tout it in their investor decks as the next biggest thing. The only way we’ll know they’ve developed an economically viable product is when the revenues start to pour in.
On one hand, GenScript claims to have a commercially available product for DNA storage. On the other hand, they talk about the need to find partners to advance their technology. If it’s economically viable, partners should be finding them. Being able to store data on DNA might make for a great party trick, but we’re not convinced that it’s an economically viable solution that will replace tape drives, which seem to be managing just fine. Until these companies start providing speed and cost metrics that prove they can compete with traditional data storage methods, we’re not convinced they’ve built anything that’s economically viable for data storage needs.