DNA Data Storage – Top 10 Emerging Technologies

According to the World Economic Forum (WEF) report, by 2020 (only four months away) an estimated 1.7 megabytes of data will be created per second per person globally. That is 418 zettabytes in a single year (a zettabyte is 10^21 bytes, and we need better names for numbers growing this fast!), most of it created by social media, video and graphics.
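
The 418-zettabyte figure can be sanity-checked with simple arithmetic. The sketch below is only a rough check, not the WEF's own method: it assumes a world population of about 7.8 billion (a figure not given in this article) and decimal units (1 megabyte = 10^6 bytes, 1 zettabyte = 10^21 bytes). The function name is made up for illustration.

```python
# Rough check of the "418 zettabytes a year" figure.
# ASSUMPTION: world population of about 7.8 billion (not stated in the article).
# ASSUMPTION: decimal units, so 1 MB = 10**6 bytes and 1 ZB = 10**21 bytes.
BYTES_PER_MEGABYTE = 10**6
BYTES_PER_ZETTABYTE = 10**21
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def yearly_data_zettabytes(mb_per_second_per_person: float, population: float) -> float:
    """Return the total data created in one year, in zettabytes."""
    bytes_per_year = (
        mb_per_second_per_person * BYTES_PER_MEGABYTE * population * SECONDS_PER_YEAR
    )
    return bytes_per_year / BYTES_PER_ZETTABYTE


if __name__ == "__main__":
    total = yearly_data_zettabytes(1.7, 7.8e9)
    print(f"{total:.0f} zettabytes per year")
```

Under those assumptions the result lands at roughly 418 zettabytes, so the two numbers in the report are consistent with each other.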

The big question the Enterprise Architect needs to ask is: how much does my enterprise use, need and spend on data storage, management, security and analysis? With demand escalating, the costs will show up in business cases over the next five years.

As data services in new markets in Africa and Asia continue to accelerate, demand will probably quadruple current requirements. Our current storage technology capabilities will probably run out of steam worldwide in the next 10-15 years. At that point the world will have exhausted its storage. Massive investments will be required to manage and analyse the data, and to delete or store the most relevant information on today’s technologies, whether on the simple hard drive, in data centres or warehouses, or in the cloud. The world’s history and creativity will be curtailed as we run out of storage. That is nothing new for us baby boomers, or even for generation X, who were brought up to be mindful of storage capacity and costs, but it will be a shock for the millennials and post-millennials.

Is there another way forward? According to WEF the answer may lie with DNA Data Storage. Just what is DNA data storage? Make way for DNA chains, which can hold all the world’s annual data in a cube of DNA measuring just a metre on each side, or in 4 grams of synthesized DNA, depending on which research you read. Whichever technique is used, it will totally revolutionise the IT industry.
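
The one-metre cube claim can be cross-checked against the storage density of about 10^19 bits per cubic centimetre that this article cites for Escherichia coli. The sketch below is a back-of-the-envelope calculation only. It assumes the whole cube stores data at that density with no overhead, which is an idealisation: a practical system would need room for addressing and error correction. The function name is made up for illustration.

```python
# Back-of-the-envelope capacity of a cube of DNA.
# ASSUMPTION: the entire cube stores data at the E. coli density cited in the
# article (about 10**19 bits per cubic centimetre), with no overhead at all.
BITS_PER_CUBIC_CM = 10**19
CUBIC_CM_PER_CUBIC_METRE = 100**3
BITS_PER_BYTE = 8
BYTES_PER_ZETTABYTE = 10**21


def cube_capacity_zettabytes(side_metres: float) -> float:
    """Return the idealised capacity, in zettabytes, of a cube with the given side."""
    volume_cm3 = side_metres**3 * CUBIC_CM_PER_CUBIC_METRE
    total_bytes = volume_cm3 * BITS_PER_CUBIC_CM / BITS_PER_BYTE
    return total_bytes / BYTES_PER_ZETTABYTE


if __name__ == "__main__":
    print(f"{cube_capacity_zettabytes(1.0):.0f} zettabytes in a one-metre cube")
```

On these assumptions a one-metre cube holds on the order of a thousand zettabytes, comfortably more than the 418 zettabytes quoted for a single year.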

Geneticists recognised that DNA consists of long chains of the nucleotides adenine, cytosine, guanine and thymine (A, C, G, T: another acronym to learn), which store life’s information. For example, the simple bacterium Escherichia coli has a storage density of about 10^19 bits per cubic centimetre, according to calculations published in 2016 in Nature Materials by George Church of Harvard University and his colleagues. Data stored as sequences of A, C, G and T holds the potential of becoming a new form of information technology.
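
To make the idea of storing data as A, C, G and T concrete, here is a minimal sketch. It uses a naive mapping of two bits per nucleotide, chosen purely for illustration. It is not the scheme used by any of the research groups mentioned in this article, and real encodings add error correction and avoid awkward sequences such as long runs of the same base. The function names and the mapping table are assumptions.

```python
# Toy illustration of writing digital data as a DNA base sequence.
# ASSUMPTION: a naive 2-bits-per-base mapping, for illustration only.
# Real DNA storage schemes are more elaborate (error correction, constraints
# on which base sequences are allowed, addressing, and so on).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of bases, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a string of bases produced by encode() back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"EA")
    print(strand)          # the two letters as eight bases
    print(decode(strand))  # round-trips back to the original bytes
```

Each byte becomes four bases, so the two letters "EA" come out as an eight-base strand, and decoding that strand returns the original bytes.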

As a result of recent advancements, next-generation sequencing techniques allow billions of DNA sequences to be read easily and simultaneously. With this ability, DNA sequences can be used as molecular identification “tags”, which means bar coding can be employed too. DNA bar coding is now being used to dramatically accelerate the pace of research in fields such as chemical engineering, materials science and nanotechnology. While the storage capacity potential is massive, DNA does not require much energy to store, and the content is very stable too.

Making DNA work in operational environments still presents many challenges, including high usage costs and RAM qualities, but researchers are working to resolve them. Organisations such as Microsoft, DARPA and the Universities of Washington, Illinois, Columbia, Duke, Harvard and Padua are all researching ways to overcome these challenges in order to unleash the benefits that DNA storage technology brings to the world.

Enterprise Architects need to keep their fingers on this pulse, as it means we can collect all the data we want in DNA Centres and store it forever. We can even put it all in a cupboard in Head Office, provided we can access and process the data. Availability will be key to usage: archived data is fine, but long-term usability is essential if data is to be of value, and as DNA storage lasts at least 500,000 years we should not have many issues!

According to a Veritas study of 1,500 data managers across 15 countries, 52% of all data is unclassified or untagged. Most of it is considered dark data, hidden in ‘databergs’ that are not available or visible to their owners. Unless organisations change their behaviour toward information management, dark data is set to grow dramatically, leaving enterprises open to security and governance issues. Managers demand many views of the data to create the analytics they need to develop their business insights, and many hold their information locally. How could their usage change with DNA Centres? No doubt there will be cloud services that hold our DNA Centres, just like today’s data centres. Or will they become Information Centres only? Will we need super large virtual processors to manage the DNA Centre rather than today’s level of processing services? Will we need our centres outsourced to non-polluted environments? Will we be able to collect all those databergs and put them in the DNA storage? Can we store all the data lakes of raw unstructured and processed data as well as the structured services in the DNA Centre?

The impact of data storage trends will be that in the short term we will need to continue expanding our conventional storage and services, and in the long term we will need to plan to collapse it.

What benefits will DNA Data Storage bring to the business, or will it not notice? As companies are increasingly aware of, and measured against, their environmental policies, one major benefit will be the reduction of the carbon footprint: substantially reducing on-site data storage, getting rid of all those archived tapes and discs, reducing the square footage of Data Centres, and reducing the consequential pollution that they bring. There will be no need to dispose of plastic and metal components either, apart from the existing assets, as the data is all wrapped up in a biological block for 500,000 years. Carbon footprint reduction is a major driver for data storage change.

Enterprise Architects need to rethink Data Centres and Data Lakes, Cloud services, and our software systems, as all will be impacted. See Figure 1. We will also need new Business Intelligence services to optimise the potential value streams that data growth provides. And as data volumes triple, quadruple and so on, data governance and administration will become a major headache for the business. Just how will we manage this growing capability in our enterprise?

Now is the time to think about and plan new investments in Digital and Business data skills and analytics, as processing the data will be a critical success factor for the enterprise’s future. Given its colossal potential to disrupt the IT industry, DNA data storage may actually arrive a lot faster than expected as new entrepreneurs jump into the market. That raises the question: who will we trust with our DNA data? And what are the timescales? The industry thinks it will be widespread by the mid-2020s.

We need to think hard about this and start to raise awareness of change with senior business managers. Consideration needs to be put into the Enterprise’s Five-year Business Plan and it’s time to plan for an EA Visioning workshop on the DNA Data Storage Emerging Technology. My advice – put something in the budgets to cover the opportunity exploration because it will arrive a lot faster than you think. Wishing you every success!

Kind Regards

Judith

Read Judith Jones’s Biography Here

For more background information on DNA data storage see:
