How big data is changing the discovery process for law firms

Between its launch in January 2018 and December 2020, the Zondo Commission interviewed 278 witnesses and collected 159 109 pages and 1 exabyte of data (that’s one billion gigabytes) as evidence. The Commission of Inquiry is far from over, and it remains to be seen what the final data tally will be.

The big question, of course, is how this data is being reviewed. Collecting data is one thing. Finding the proverbial ‘smoking gun’ is another.

According to a recent study by the Association of Certified E-Discovery Specialists (ACEDS), the average case contains 6.5 million pages, 10 to 15 custodians, and 130 GB of data. That’s the equivalent of 100 truckloads of data, times 100.

Our own closer-to-home experience through Lextrado reveals that the average case in South Africa produces approximately 5 453 GB of data and involves 31 subjects, and that only 304 GB can be culled from that data. It’s an enormous workload for legal teams to sift through, looking for the data that’s relevant to a legal case.

The tsunami of data and the law

We all know there’s been a data explosion in recent years. Increased electronic communications and new data types are making it progressively more difficult for legal teams on civil and criminal cases alike to collect, review and even interpret data.

Let’s begin with the challenges of collection. From collaborative tools like Microsoft Teams and Slack, to a rapidly increasing array of social media apps, not to mention cloud tools and repositories like Google Docs, Dropbox, and Microsoft 365, data is not only prolific, but spread across numerous channels.

All data falls into one of two categories: structured data and unstructured data. Structured data includes anything with a high consistency in terms of fields and values across database entries. Bank records are an excellent example of structured data.

Unstructured data is everything else. Any text document, spreadsheet, PDF, presentation, image, video or audio file is considered unstructured data. It’s emails and responses to emails, chats over WhatsApp, team channels on collaboration platforms, and social media posts. It’s even the likes, comments and shares of social media posts.
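To make the distinction concrete, here is a minimal Python sketch of the "high consistency in terms of fields" idea. It is an illustration only, not how any particular eDiscovery tool classifies data: the function name `has_consistent_fields` and the sample records are hypothetical.

```python
def has_consistent_fields(records):
    """Return True if every record is a dict with the same set of field names."""
    if not records or not all(isinstance(r, dict) for r in records):
        return False
    first_keys = set(records[0])
    return all(set(r) == first_keys for r in records)


# Hypothetical bank records: every entry carries the same fields.
bank_records = [
    {"date": "2020-01-15", "account": "001", "amount": -250.00},
    {"date": "2020-01-16", "account": "001", "amount": 1200.00},
]

# Hypothetical chat messages: free text with no fixed fields.
chat_messages = [
    "Can we talk about the invoice?",
    "Sure, call me after 3.",
]

print(has_consistent_fields(bank_records))   # True
print(has_consistent_fields(chat_messages))  # False
```

The bank records pass the check because each entry has the same fields, which is what makes them easy to query. The chat messages fail it, which is why they need different tooling to search and review.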

Consider for a moment how social media is archived, for example. Legal teams have to find not only an original post, but also any comments or shares associated with that post. Factor in emails, text messages and a host of other unstructured data, and it’s not hard to see how data can quickly start moving into exabyte territory for an inquiry as broad as the Zondo Commission. A mere 5 000 GB of data appears small in comparison, and yet, if printed out, 1 GB is the equivalent of a full storage box. 5 000 GB is therefore 5 000 storage boxes, which, laid end to end, would stretch over 1 kilometre.
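The arithmetic behind these comparisons can be sketched in a few lines of Python. The one-box-per-gigabyte rule of thumb comes from the paragraph above; the 40 cm box length is purely an assumption for illustration, since the article does not give box dimensions.

```python
GB_PER_EXABYTE = 10**9   # decimal units: 1 EB = 10^18 bytes, 1 GB = 10^9 bytes
BOXES_PER_GB = 1         # rule of thumb from the text: 1 GB printed fills one storage box
BOX_LENGTH_CM = 40       # ASSUMPTION: box length is not stated in the article


def boxes_for(gigabytes):
    """Number of storage boxes needed to hold the printed data."""
    return gigabytes * BOXES_PER_GB


def line_length_km(gigabytes, box_length_cm=BOX_LENGTH_CM):
    """Length in kilometres of the boxes laid end to end."""
    return boxes_for(gigabytes) * box_length_cm / 100_000  # 100 000 cm per km


print(boxes_for(5_000))            # 5000 boxes for a 5 000 GB case
print(line_length_km(5_000))       # 2.0 km under the assumed 40 cm box
print(boxes_for(GB_PER_EXABYTE))   # 1000000000 boxes for one exabyte
```

Under that assumption the 5 000 boxes stretch about 2 km, consistent with "over 1 kilometre". Any box longer than 20 cm would clear the 1 km mark.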

Unstructured data makes up a staggering 80% or more of all enterprise data, and the percentage keeps growing.

Leveraging eDiscovery in a data-driven world

For legal discovery teams, leveraging appropriate technology has become essential. In fact, for many of our clients, their most valuable asset in conducting internal investigations efficiently is their eDiscovery partner.

Today’s best eDiscovery solutions not only support the hundreds of file types that make up unstructured data, but also convert those files into information that non-technical users can grasp quickly. This ensures that all parties involved in a legal case can preserve, collect, review, and exchange information in electronic formats for the purpose of using it as evidence. Today, the practice of law fundamentally depends on eDiscovery competency.

Consider, too, how a tool that can comb through unstructured data quickly, identify relevant data and dramatically reduce document review time frees a discovery team to focus on legal strategy instead of searching through thousands of gigabytes of data.