The Shame of Dark Data
Most of an organization’s electronic data (80+%) is unstructured: PSTs, employee work files, employee desktop backups, system-generated files, versions, and to a lesser extent, corporate legal department eDiscovery result sets. Depending on the company, a large portion of this data may belong to departed employees whose accounts, file shares, and even email accounts were never closed. In most cases, this unstructured data is left almost entirely to end users to manage… which means it's not managed.
If we don’t know it’s there, it’s not… right?
This unmanaged data is referred to as dark or grey data, and not surprisingly, every company is affected by it, whether they know it or not. Funny story: several years ago I visited a medium-sized multinational to talk about the need for archiving. They proceeded to tell me that they didn’t need to archive because they had a handle on their dark data. I was expecting them to politely ask me to leave, but then I asked the CIO a simple question: how much of your file share capacity is taken up by PSTs? (Old PSTs are a symptom of dark data.) The CIO looked at the Director of Storage and raised his eyebrows, signaling him to answer. The Director of Storage shrugged and said, “I don’t know, but I’m sure it's not much.” He didn’t know what he didn’t know.
A bridge is burned
I turned to the CIO and suggested they check during lunch (assuming I would still be there to eat lunch). The Director of Storage quickly left the room while lunch was brought in. I saw him glance at the food and then back at me, clearly frustrated. I may have burned a bridge there.
As lunch was concluding, the Director of Storage returned (looking hungry) and sat down. The CIO asked him what he had found… The Director looked around the room and settled on me last. With a vein throbbing on his forehead, he said 63% of their file share capacity was consumed by PSTs (more than a hundred terabytes). Many of the PSTs had been there for years, a prime example of dark data. No one knew the files existed, they weren’t managed, and they were consuming expensive enterprise storage. As the conversation continued, I wondered how their legal department handled (or didn’t handle) these unknown dark data files when responding to a discovery order, but that’s another story.
In my experience, most companies have no idea of the extent of their dark data problem. However, knowing the problem could exist is a big step toward fixing it. Next step – what to do about it.
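If you want to ask your own Director of Storage the same question, a rough first check is easy to script. The following is a minimal Python sketch, not a description of how the company in the story measured it: it assumes the file share is mounted as an ordinary directory, and the function name pst_share is made up for illustration. It walks the tree and reports how many bytes belong to files ending in .pst.

```python
import os


def pst_share(root):
    """Walk `root` and return (pst_bytes, total_bytes, fraction_pst).

    Hypothetical helper for illustration; assumes `root` is a mounted
    file share that can be read like a local directory.
    """
    pst_bytes = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(full_path)
            except OSError:
                # Skip files that vanish or cannot be read during the walk.
                continue
            total_bytes += size
            # Match the extension case-insensitively (.pst, .PST, ...).
            if name.lower().endswith(".pst"):
                pst_bytes += size
    fraction = pst_bytes / total_bytes if total_bytes else 0.0
    return pst_bytes, total_bytes, fraction
```

Calling pst_share on a share's mount point returns the PST byte count, the total byte count, and the PST fraction of the total. On a share of a hundred-plus terabytes a single-threaded walk like this could take a long time, so treat it as a way to sample one share, not as a replacement for a proper index.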
Shine a light on dark data the right way
When faced with a dark data problem, many companies will address it in one of two ways – delete everything or call in consultants to determine what should be disposed of – sifting through terabytes of files at $200 to $300 per hour.
A better solution is to use a system that lets you quickly and inexpensively determine what is disposable and what should be kept and managed. An information management system that consolidates all unstructured data and fully indexes it, so you can search for files by last-accessed date, keyword, custodian, and so on, would allow you to cull obvious dark data quickly. For example, searching for files whose owner is no longer in Active Directory would help you cull departed employee data. This approach sure beats hiring consultants or risking a mass deletion.
The key is to consolidate and then index the potentially dark data into an intelligent information management/archiving platform which allows you to search for and work with all the data using the same search engine.
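To make the culling idea concrete, here is a minimal Python sketch of the two searches mentioned above: files whose owner is no longer in the directory, and files nobody has accessed in years. Everything in it is an assumption for illustration. FileRecord stands in for whatever your index stores, active_accounts stands in for a list of account names exported from Active Directory, and the three-year idle threshold is arbitrary. It is not the schema or API of any particular product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class FileRecord:
    """One entry in a hypothetical file index (not any product's schema)."""
    path: str
    owner: str
    last_accessed: datetime
    size_bytes: int


def find_cull_candidates(records, active_accounts, now, max_idle_days=3 * 365):
    """Return (record, reasons) pairs for files that look like dark data.

    A file is flagged when its owner is not in `active_accounts`
    or when it has not been accessed within `max_idle_days`.
    Flagged files are candidates for review, not automatic deletion.
    """
    cutoff = now - timedelta(days=max_idle_days)
    # Compare account names case-insensitively.
    active = {name.lower() for name in active_accounts}
    candidates = []
    for rec in records:
        reasons = []
        if rec.owner.lower() not in active:
            reasons.append("owner not in directory")
        if rec.last_accessed < cutoff:
            reasons.append("not accessed since cutoff")
        if reasons:
            candidates.append((rec, reasons))
    return candidates
```

The function returns a list of candidates with the reason each was flagged instead of deleting anything, because legal holds and retention rules still need a human (or a policy engine) to make the final call.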
The cloud’s silver lining
A low-cost cloud with information management capability is the best solution, especially when that cloud is your own Azure tenancy with Archive2Azure as the intelligent information management solution. Using your company’s Azure tenancy ensures low-cost public cloud pricing, increased security (you hold the encryption keys), and a future-proof platform you can grow with.
Archive2Azure provides the additional information management capabilities to index, search, and cull your dark data so you can fix your data issues without hiring consultants or blindly deleting everything.
In today’s world of cloud computing, machine learning, and extremely fast data migrations, dark data needn’t be an embarrassment anymore.
About Bill Tolson
Bill is the Vice President of Global Compliance for Archive360. Bill brings more than 29 years of experience with multinational corporations and technology start-ups, including 19-plus years in the archiving, information governance, and eDiscovery markets. Bill is a frequent speaker at legal and information governance industry events and has authored numerous eBooks, articles and blogs.