Categorizing Grey Data Part 1

Posted by Bill Tolson • October 20, 2016

Yesterday I conducted a webinar for the Association of Legal Administrators titled Tomorrow’s Information Governance. One of the questions I received was about determining what is actually grey data, what should be kept, and what is truly valueless data that should be disposed of. I thought it was a great question, so I will address it here.

To review, grey or “low-touch” data is unstructured data that resides in the enterprise that either has no owner (departed employees) or is not subject to any current litigation hold or regulatory retention requirement and has no obvious business value. That description also fits valueless data. However, the factor that distinguishes grey data from valueless data is that grey data still has a potential reason for remaining within the enterprise. Examples include:

  • Ex-employee data that has value to current employees (e.g. reports, presentations, etc.)
  • Aging but unreferenced data belonging to current employees
  • Certain legacy archives
  • Very long-term archival compliance data (e.g. HR, Legal, Sales, etc.)
  • Old eDiscovery data sets
  • Corporate history

Determining Data Value

In previous blogs I have mentioned the 2012 CGOC Survey, which determined that enterprise data can be broken down into the following segments: 1% subject to legal hold and collection, 5% subject to regulatory compliance retention, and 25% with some redeeming business value, leaving 69% of enterprise data deemed valueless and a candidate for defensible disposition. However, we have determined that a sizable part (30%) of this valueless data is actually not valueless and should be retained for various reasons; ex-employee data is one example. Some companies now keep ex-employee data for a period of time based on the local statute of limitations in case a wrongful termination lawsuit is filed several years later.
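To make the arithmetic concrete, here is a minimal Python sketch of the breakdown. It assumes the 30% is a fraction of the 69% "valueless" segment, which is one reading of the sentence above. The function and variable names are hypothetical, chosen only for illustration.

```python
# Segment shares quoted from the 2012 CGOC Survey, as a percent of all enterprise data.
SEGMENTS = {
    "legal_hold": 1,
    "regulatory_retention": 5,
    "business_value": 25,
    "deemed_valueless": 69,
}

# Assumption: the 30% figure is a fraction of the 69% segment, not of all data.
GREY_FRACTION_OF_VALUELESS = 0.30


def split_valueless(valueless_pct, grey_fraction):
    """Split the 'deemed valueless' segment into grey and truly valueless shares."""
    grey_pct = valueless_pct * grey_fraction
    truly_valueless_pct = valueless_pct - grey_pct
    return grey_pct, truly_valueless_pct


grey_pct, truly_valueless_pct = split_valueless(
    SEGMENTS["deemed_valueless"], GREY_FRACTION_OF_VALUELESS
)
print(f"Grey data: about {grey_pct:.1f}% of all enterprise data")
print(f"Truly valueless: about {truly_valueless_pct:.1f}% of all enterprise data")
```

Under this reading, roughly 21 of the 69 percentage points are grey data and roughly 48 remain candidates for disposition. If the 30% is instead meant as a share of all enterprise data, the split would be 30 and 39 points.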

Figure 1

The question I received during the webinar yesterday focused on the 69% of enterprise data that the CGOC labeled as potentially valueless. In fact, there are two ways to approach this question of finding the grey data in this grouping:

  1. Filter for all truly valueless data, or
  2. Filter for all grey data

In reality, searching for obviously valueless data is not the best practice for this type of project. You can determine up front what types of data are obviously valueless and search for them, for example all system files or all files over ten years old. However, this process can both leave a great deal of valueless files intact and potentially dispose of actual grey data that should have been kept.
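The rule-based approach described above can be sketched in a few lines of Python. This is only an illustration of the "obvious valueless data" filter, not a recommended method or any product's implementation. The extension list, the ten-year threshold, and the function names are all assumptions made for the example.

```python
import time
from pathlib import Path
from typing import List, Optional

# Hypothetical list of "system file" extensions; a real project would define its own.
SYSTEM_EXTENSIONS = {".sys", ".dll", ".tmp"}
MAX_AGE_YEARS = 10
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def looks_valueless(path: Path, now: Optional[float] = None) -> bool:
    """Flag a file if it is a system file or was last modified over ten years ago."""
    now = time.time() if now is None else now
    age_years = (now - path.stat().st_mtime) / SECONDS_PER_YEAR
    is_system_file = path.suffix.lower() in SYSTEM_EXTENSIONS
    return is_system_file or age_years > MAX_AGE_YEARS


def find_candidates(root: str) -> List[Path]:
    """Walk a directory tree and return every file the simple rules flag."""
    return [p for p in Path(root).rglob("*") if p.is_file() and looks_valueless(p)]
```

The weakness shows up quickly. An eleven-year-old report written by a departed employee is flagged for disposal even though current employees may still rely on it, while a recent file with no value at all passes through untouched.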

In the next blog (part 2), I will go into detail on how to cull for actual grey data.
