To Tier or not to Tier, is no longer the Question

Posted by Bill Tolson • December 13, 2017

Microsoft today announced the general availability of their archive tier, Microsoft Azure Archive Blob Storage, to go along with their Hot and Cool storage tiers. For Azure-based archiving and information governance applications, the Azure Archive Blob Storage tier will be a huge advance for records managers and information governance professionals looking for long-term, inexpensive archive storage.

Data deletion is not happening

Companies are retaining more data for much longer periods of time. This is due to changing regulatory compliance requirements, lengthening legal cycles, and mostly because end-users don’t have the time to manually read and manage the huge variety, volume and velocity of data they come into contact with on a daily basis. Another factor is the lack of software applications that can understand and predictively categorize all of this data with the required accuracy rates.

The question for CIOs, GCs, CFOs and Storage Administrators has been: what do we do with all of the accumulating, mostly unstructured data? Some companies have tried defensible disposition projects in the past; however, such projects only target a recurring symptom. Deleting a mountain of data without fixing the underlying problem ensures you will be addressing it again in a couple of years. Past experience shows that if left to end-users, all data will be kept forever - for reference and CYA reasons. In reality, there’s no easy answer.

CIOs are usually on the fence about what to do with the rising amounts of data. GCs either want to keep it all forever (to reduce risk of spoliation) or delete all of it (to get rid of smoking guns). And CFOs and storage admins want to dispose of it as soon as possible to save on purchasing additional enterprise disk and IT head count.

Throwing spinning disk at this uncontrolled and unmanaged data growth is no longer cost effective, given the compound annual growth rate of corporate data. It also does nothing for the associated problem: finding specific data within the TBs of stored data when it is needed. In fact, a CTO at Iron Mountain Digital once made the observation that “It costs up to 500 times more to find and utilize a specific document, than to store it untouched for 20 years.”

The lifecycle of data

The lifecycle of data plays an important role in data retention and storage strategy. The chart below shows the probability of data reuse on a timeline. The chart also shows the estimated data growth per employee over the same timeline. The blue shaded area shows the estimated probability of a document being viewed/re-viewed as time passes. As you can see, the probability of reuse drops below 1% very quickly (by day 15) and continues to decline. However, the potential of document reuse never reaches zero. So is inactive data valueless?

[Chart: probability of data reuse and estimated data growth per employee over time]

The average of 15 days for information to reach a probability of reuse of less than 1% means the normal “usable life” of a document is very short – in fact, almost transitory. We’ve all seen this in our own data management activities. You receive an email/attachment and it is opened within a couple of days. As that email ages in your inbox, it quickly becomes buried under hundreds or thousands of newer emails and is never opened again.

It is the same with work files. You create a spreadsheet for whatever reason on day 1. On day 2 you recall it, make some additions, and save it as version 2. The probability that version 1 will ever be opened again is almost (but not quite) zero. However, version 1 is rarely deleted, so it exists as inactive data - indefinitely.

The red shaded area in the chart above shows the cumulative increase in retained data from a single employee (assuming data is never disposed of). This data is made up of email, work documents, document revisions, PowerPoints, spreadsheets, audio and video files, and web research.

Segmenting your enterprise data

In 2012, the CGOC published a survey that described 4 basic segments for enterprise data. In that survey they found that 1% of data is usually subject to legal hold, 5% has regulatory retention requirements, 25% has direct business value, and 69% of the data is of unknown value.

Obviously, just because a document’s value is unknown doesn’t mean it should be deleted. Many companies simply don’t have the resources or time to categorize and defensibly dispose of unclassified data - so companies fall back on keeping all of it. Looking at this data segmentation further, data on legal hold, compliance data, and the unknown data is usually semi-active or inactive and therefore a candidate for the inexpensive Cool and Archive storage tiers, while the data classified as having business value is mostly active and a good fit for the Hot storage tier.
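
As a rough illustration of the segmentation above, the following Python sketch applies the CGOC percentages to a data volume and groups the segments by the tiers suggested in this post. The 500 TB total, the function name, and the segment labels are assumptions made up for the example; they do not come from the survey or from Microsoft.

```python
# CGOC 2012 survey percentages quoted above.
# The dictionary keys are hypothetical labels chosen for this example.
SEGMENTS = {
    "legal_hold": 1,
    "regulatory": 5,
    "business_value": 25,
    "unknown": 69,
}

# Tier suggestion from this post: business-value data is mostly active (Hot);
# everything else is semi-active or inactive (Cool or Archive).
HOT_SEGMENTS = {"business_value"}


def split_by_tier(total_tb):
    """Return (hot_tb, cool_or_archive_tb) for a total volume in TB."""
    hot_pct = sum(pct for name, pct in SEGMENTS.items() if name in HOT_SEGMENTS)
    cold_pct = sum(pct for name, pct in SEGMENTS.items() if name not in HOT_SEGMENTS)
    return total_tb * hot_pct / 100, total_tb * cold_pct / 100


# 500 TB is a hypothetical example volume, not a figure from the survey.
hot_tb, cold_tb = split_by_tier(500)
print(f"Hot: {hot_tb} TB, Cool/Archive: {cold_tb} TB")
```

Under these assumptions, only a quarter of the data would need to sit on the Hot tier; how the remainder is divided between Cool and Archive depends on how often it is actually touched.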

As I mentioned at the beginning of this blog, many GCs are hesitant about deleting data for fear of claims of destruction of evidence or insufficient eDiscovery. So in reality, many/most companies end up keeping inactive data indefinitely. The question companies face is: what should be done with the low-touch and inactive data? Keep it on individual employee computers and enterprise storage, or archive and manage it in inexpensive cloud storage?

Microsoft and Archive360 Cloud-Based Intelligent Archiving

There are many reasons to retain low-touch and inactive data: regulatory compliance, eDiscovery, corporate history, and data analytics. Retention for regulatory compliance and eDiscovery implies the need to consolidate information so it can be searched and produced on-demand in an acceptable amount of time. On the other hand, data kept for corporate history and data analytics reasons doesn’t require fast search and production, but it still needs to be consolidated so that it is easier to search and produce.

Microsoft offers a comprehensive set of cloud services that developers and IT professionals can use to build, deploy, and manage new, powerful applications through Azure’s global network of data centers. As I mentioned at the beginning of this blog, today, Microsoft announced the general availability of the Azure Archive Blob Storage tier to round out their already available Hot and Cool storage tiers. Now customers can choose the best storage tier (blob-level tiering) for their data as shown in the screenshot below.

[Screenshot: choosing a storage tier for a blob in Azure]

To simplify data lifecycle management, Azure now enables customers to tier their data at the blob level. Customers can easily change the access tier of a blob among the Hot, Cool, or Archive tiers as data usage patterns change - without having to move data between accounts. Blobs in all three access tiers can co-exist within the same account.
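
To make the idea of tiering by usage pattern concrete, here is a minimal Python sketch of a policy that picks a tier from the number of days since a blob was last accessed. The thresholds (30 and 180 days), the function name, and the sample file names are all hypothetical; they are not Microsoft recommendations and should be tuned to your own access patterns. The sketch only decides on a tier; actually changing a blob's tier would be done through Azure itself, which is not shown here.

```python
from datetime import date

# Hypothetical thresholds, chosen only for illustration.
COOL_AFTER_DAYS = 30
ARCHIVE_AFTER_DAYS = 180


def choose_tier(last_accessed, today):
    """Pick "Hot", "Cool", or "Archive" from days since last access."""
    age_days = (today - last_accessed).days
    if age_days >= ARCHIVE_AFTER_DAYS:
        return "Archive"
    if age_days >= COOL_AFTER_DAYS:
        return "Cool"
    return "Hot"


# Hypothetical blobs with their last-access dates.
today = date(2017, 12, 13)
blobs = {
    "budget_v2.xlsx": date(2017, 12, 1),
    "budget_v1.xlsx": date(2017, 10, 1),
    "old_mailbox.pst": date(2016, 1, 1),
}
for name, last_accessed in blobs.items():
    print(name, "->", choose_tier(last_accessed, today))
```

Because all three tiers can co-exist in one account, a policy like this can be re-run periodically and applied blob by blob, without moving data between accounts.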

Updating the previous Lifecycle of Data chart, we can now segment the data further by its Azure storage tier, as shown below.

[Chart: the lifecycle of data segmented by Azure storage tier (Hot, Cool, Archive)]

As you can see in the above chart, active and semi-active data (the 31%) can be effectively stored and managed on the Hot and Cool tiers, while very low-touch and inactive data (the 69%) can be stored and managed on the new low-cost Archive tier. In fact, Azure blob-level tiering enables customers to optimize their storage costs by managing the lifecycle of their data across all three tiers at the object level.

Archive360’s Archive2Azure is one of the first native, cloud-managed solutions for archiving and long-term data management built on Azure. Archive2Azure provides a highly secure, low-cost, and compliant intelligent archive, perfect for the archiving and management of data.

Archive2Azure works with the customer’s Azure tenancy to store, index, search, and manage large volumes of data. Archive2Azure creates containers that store and manage data in “cabinets” on a specific Azure storage tier. This data can then be moved from tier to tier, programmatically or manually, ensuring the most efficient performance level as well as the greatest cost savings.

And the best thing about Archive2Azure is that you don’t need to hand over your data to someone else. Your sensitive data is held in your Azure tenancy, using your encryption keys, with the data stored in its native format, so you never have to pay a ransom to get it back.

The new Azure Archive Blob Storage tier is a boon for Records Managers, Archivists, HR Departments, IT Departments, and Legal Departments - anyone who collects and stores low-touch data for long periods of time.

For more information from Microsoft, you can read their announcement blog here.

For additional information on Azure Blob Storage and Archive2Azure, check out these related blogs:

  1. Efficient Storage of Low-Touch but Still Active Data
  2. SaaS Compliance Archive Solution in the Azure Stack
  3. A Backup is not an Archive … But, a Cloud Archive can be an Effective Backup