Many email migration vendors have taken the easier path and built their migration applications around aging, poorly documented email archiving platform APIs. However, as we've pointed out on our blog several times before, relying on email archiving vendor APIs to extract legacy email archives quickly and without issues is a recipe for aggravation and disappointment, for a simple reason: email archive vendor APIs were never designed to extract data from the solution efficiently - they were designed to move data into it. This disparity between extraction and ingestion capability, especially in extraction performance, is becoming harder to justify. The question many are putting forward is: are email archiving vendor APIs the only sure way to extract data from a legacy archive?
Before we answer that question, let’s look at some of the other claims being put forward about vendor APIs.
Claim #1: It doesn’t make sense to extract data faster than it can be ingested
Positioning archive extraction speed as important only if the new repository's ingestion speed is at least as fast is an attempt to lump all data migrations into the same bucket, i.e. move the entire archive data set from A to B all at once. We agree that target repository ingestion speed is a partial limiting factor in the overall migration process, but it is only a major limiting factor if you are migrating the entirety of a legacy archive into another repository - for example, when migrating 100 TB from a legacy email archive directly into Exchange Online. However, many migration customers recognize the need to run a data remediation process on the 100 TB of data before it's ingested into a new archive. A data remediation process raises questions such as: How much of that 100 TB of archived data is expired and past its retention limit? How much of it belongs to employees who left the company more than five years ago? How much of it is duplicates? In other words, do you really want to pay to migrate valueless and expired data?
A data remediation step that includes a defensible disposition process before the data is ingested into the target repository is a recognized best practice: it reduces risk and cost during discovery and cuts storage costs. If you're planning a data remediation step as part of your data migration, extraction speed DOES make a huge difference in the process.
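The remediation checks described above can be sketched as a simple filter over extracted messages. This is a minimal illustration only - the record shape, field names, and retention policies here are hypothetical, not Archive360's implementation:

```python
import hashlib
from datetime import date

# Example policies (assumptions for illustration)
RETENTION_YEARS = 7          # dispose of mail past its retention limit
DEPARTED_CUTOFF_YEARS = 5    # owner left more than five years ago

def remediate(messages, today=date(2016, 1, 1)):
    """Keep only messages worth migrating: within retention, owner still
    relevant, and not an exact duplicate of a message already kept."""
    seen_hashes = set()
    keep = []
    for msg_id, sent, departed, body in messages:
        if (today - sent).days > RETENTION_YEARS * 365:
            continue  # expired: past the retention limit
        if departed and (today - departed).days > DEPARTED_CUTOFF_YEARS * 365:
            continue  # belongs to a long-departed employee
        digest = hashlib.sha256(body.encode()).hexdigest()
        if digest in seen_hashes:
            continue  # exact duplicate of content already kept
        seen_hashes.add(digest)
        keep.append(msg_id)
    return keep

# Hypothetical records: (id, sent date, owner departure date or None, body)
msgs = [
    ("m1", date(2015, 6, 1), None, "quarterly report"),
    ("m2", date(2005, 6, 1), None, "ancient memo"),          # expired
    ("m3", date(2015, 6, 2), None, "quarterly report"),      # duplicate body
    ("m4", date(2015, 6, 3), date(2009, 1, 1), "old owner"), # departed owner
]
print(remediate(msgs))  # only "m1" survives all three checks
```

Each message the filter drops is data you never pay to extract, remediate downstream, or store in the target repository.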
Claim #2: For high-performance target repository ingestion, the use of vendor APIs is best
Relying on the target repository API for data ingestion (if one exists) is also a best practice, but has nothing to do with the extraction process. As we pointed out earlier, archive vendors design their APIs to efficiently and quickly ingest data INTO their applications, so of course you should use all target vendor tools available when moving data into a new repository/archive. As figure 1 below shows, archive vendors go to great pains to design APIs to get data into their applications, but don't put the same effort into allowing data to be taken out of their archives. The main point is that legacy archive extraction APIs and new target repository ingestion APIs are not the same and are not reliant on each other.
Claim #3: The proper use of vendor APIs doesn’t have to be slower but…
Vendor APIs are not always the worst choice for legacy archive extraction, but you should consider the actual API in question. Well-written APIs from vendors that routinely import and export data from their applications can be the best and fastest way to move data out of an application; but again, email archiving vendors aren't in the habit of supplying optimized APIs to move large amounts of data away from their platforms. Archive360 has proven time and again that many email archiving vendor APIs don't measure up to our direct approach to accessing and extracting archived data, in both speed and accuracy.
Let’s reiterate what Archive360 believes:
- Extraction speed is extremely important under many circumstances (see our previous blog titled “Extraction Performance Does Matter”).
- Vendor data extraction APIs are usually not as fast or capable as our direct access method.
- And yes, ingestion speed is limited by the target repository’s ingestion capabilities so in those rare instances where you are moving all data from a legacy archive directly into a new repository, an extremely fast extraction capability does not make a huge difference.
Let’s explore one final claim made recently by an archive migration vendor…
Claim #4: Chain of custody REQUIRES the use of vendor extraction APIs
The belief that you must use the vendor's APIs to extract data from the legacy archive to maintain chain of custody during eDiscovery is completely incorrect. First, chain of custody questions only arise when you're migrating archived email DURING a litigation hold/eDiscovery. Unless you're under a court order, the legally risk-averse practice would be to delay the migration until the litigation has concluded.
Second, this belief raises the question: what legal authority (or case precedent) has actually verified every vendor's API capability and defensibility? A data extraction can be verified to be an exact copy of the original archived data without relying on the original vendor API.
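In practice, verifying that an extraction is an exact copy usually comes down to comparing cryptographic hashes of the source items against the extracted items. The sketch below illustrates the idea under assumed in-memory item maps; it is not any vendor's actual tooling:

```python
import hashlib

def sha256_of(data: bytes) -> str:
    """Fingerprint an item; identical bytes always produce the same hash."""
    return hashlib.sha256(data).hexdigest()

def verify_extraction(source_items, extracted_items):
    """Compare hashes of extracted items against the originals.
    Returns the IDs of any missing, corrupted, or altered items."""
    mismatches = []
    for item_id, original in source_items.items():
        extracted = extracted_items.get(item_id)
        if extracted is None or sha256_of(original) != sha256_of(extracted):
            mismatches.append(item_id)
    return mismatches

# Hypothetical item stores keyed by message ID
source = {"msg-1": b"original body", "msg-2": b"another body"}
extracted = {"msg-1": b"original body", "msg-2": b"another body"}
print(verify_extraction(source, extracted))  # [] -> exact copies, verified
```

An empty mismatch list demonstrates bit-for-bit fidelity independently of whichever API or direct-access method performed the extraction.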
Another fact to keep in mind during eDiscovery: all potentially relevant data must be turned over to opposing counsel, including all metadata that was available when the litigation hold was applied. In fact, some legacy vendor APIs simply extract basic message and attachment data, ignoring all associated metadata - such as storage folder, BCC, deletion date, date read, etc. If this happens, chain of custody is broken and spoliation has occurred. ALL metadata is important and a required part of the eDiscovery process, so the belief that only the vendor API produces forensically complete data sets is completely wrong.
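A completeness check of this kind can be sketched as a simple audit of each extracted message against the metadata fields named above. The field names are illustrative only:

```python
# Required metadata fields, taken from the examples in the text (illustrative)
REQUIRED_FIELDS = {"storage_folder", "bcc", "deletion_date", "date_read"}

def missing_metadata(message: dict) -> set:
    """Return the required metadata fields absent from an extracted message."""
    return REQUIRED_FIELDS - message.keys()

# Hypothetical extracted message with incomplete metadata
msg = {"subject": "Q3 report", "bcc": "legal@example.com",
       "storage_folder": "/Inbox/Reports"}
print(sorted(missing_metadata(msg)))  # ['date_read', 'deletion_date']
```

Any non-empty result flags a message whose extraction would be forensically incomplete, regardless of whether a vendor API or a direct-access method produced it.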
So, to address the question asked in the first paragraph of this blog - are email archiving vendor APIs the only sure way to extract data from a legacy archive? The answer is a definite NO! Archive360 exceeds the standard performance in the legacy email archive migration industry, accurately and defensibly extracting up to 5 TB per day of archived data by bypassing vendor APIs. Archive360 has vast experience migrating many of the market-leading email archives, and Archive360's Archive 2-Anywhere can accurately and defensibly migrate archived email faster than anyone else. If you'd like to move on from your legacy email archive, contact Archive360.