The objective of review in e-discovery is to identify as many relevant documents as possible while reviewing as few non-relevant documents as possible (Da Silva Moore). In other words, the aim is to achieve the highest possible recall (the proportion of all relevant documents that are identified during a review) and precision (the proportion of documents in the reviewed set that are relevant).
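The two measures can be illustrated with a short calculation. The sketch below is illustrative only: the function name and the document counts are hypothetical and are not taken from any real matter.

```python
def recall_and_precision(reviewed_ids, relevant_ids):
    """Return (recall, precision) for a review.

    reviewed_ids: identifiers of the documents that were reviewed.
    relevant_ids: identifiers of every relevant document in the collection
                  (in practice this is unknown and has to be estimated).
    """
    reviewed = set(reviewed_ids)
    relevant = set(relevant_ids)
    found = reviewed & relevant  # relevant documents the review actually reached

    recall = len(found) / len(relevant) if relevant else 0.0
    precision = len(found) / len(reviewed) if reviewed else 0.0
    return recall, precision


# Hypothetical figures: 100 relevant documents exist (ids 0-99).
# The review covers 200 documents, 80 of which are relevant.
relevant = set(range(100))
reviewed = set(range(80)) | set(range(100, 220))

recall, precision = recall_and_precision(reviewed, relevant)
print(f"recall={recall:.0%}, precision={precision:.0%}")
```

In this made-up example the review finds 80 of the 100 relevant documents (recall) but 120 of the 200 documents reviewed are irrelevant (precision), which is the trade-off the rest of this article is concerned with.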

Keyword searching and linear review have traditionally been adopted as the default approach to disclosure. Keywords are inherently biased: they either exclude a proportion of relevant documents or necessitate the review of increasing volumes of irrelevant documents. Even so, lawyers have been relatively slow to adopt alternative approaches. However, predictive coding, a technology which automates portions of an e-disclosure document review, is now starting to gain popularity in the UK as an approach to disclosure.

What is predictive coding and how does it work?

Predictive techniques are commonly applied to analyse data in order to assess risk and make future predictions. They are not unique to the legal world - common everyday uses include credit scoring, fraud identification and risk underwriting and they have been widely adopted in a variety of industries including accountancy, insurance, banking, financial services, pharmaceuticals and healthcare.

Predictive coding systems apply complex algorithms which, based upon their analysis of review decisions, identify similar documents which are prioritised for review. In doing so, they aim to limit the review of irrelevant documents and enable relevant documents to be captured as efficiently as possible, thereby improving recall and precision.

A predictive coding exercise typically begins with a senior lawyer training an algorithm by reviewing a ‘seed set’ of example documents. The algorithm analyses the characteristics of these documents, learns from the lawyer’s decision making and thereafter seeks to identify similar documents and rank them by their likelihood of relevance. The most highly ranked documents can then be prioritised for review. This review continues until the system fails to return any further relevant documents or until the proportion of relevant documents becomes so low that continuing the review becomes disproportionate.
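The train, rank and review cycle described above can be sketched in a few lines. This is a deliberately simplified illustration and not how any commercial predictive coding product works: the word-count scoring stands in for the "complex algorithms", and the function names, the sample documents, the batch size and the stopping threshold are all assumptions made for the example. Real systems typically also retrain as the review progresses, which is omitted here.

```python
from collections import Counter


def train(seed_set):
    """Learn a weight per word from (text, is_relevant) review decisions."""
    relevant, irrelevant = Counter(), Counter()
    for text, is_relevant in seed_set:
        target = relevant if is_relevant else irrelevant
        target.update(set(text.lower().split()))
    # Words seen more often in relevant documents get positive weights.
    return {w: relevant[w] - irrelevant[w] for w in set(relevant) | set(irrelevant)}


def score(weights, text):
    """Crude stand-in for 'likelihood of relevance'."""
    return sum(weights.get(w, 0) for w in set(text.lower().split()))


def rank(weights, documents):
    """Order unreviewed documents from most to least likely relevant."""
    return sorted(documents, key=lambda d: score(weights, d), reverse=True)


def prioritised_review(ranked, is_relevant, batch_size, min_rate):
    """Review in ranked order; stop once a batch's relevance rate drops below min_rate."""
    found, reviewed = [], 0
    for start in range(0, len(ranked), batch_size):
        batch = ranked[start:start + batch_size]
        hits = [d for d in batch if is_relevant(d)]
        found.extend(hits)
        reviewed += len(batch)
        if len(hits) / len(batch) < min_rate:
            break  # continuing would be disproportionate
    return found, reviewed


# Hypothetical seed set coded by the senior lawyer.
seed_set = [
    ("loan agreement breach", True),
    ("loan default notice", True),
    ("lunch menu friday", False),
    ("office party friday", False),
]
unreviewed = [
    "breach of loan covenant",
    "friday lunch order",
    "notice of default",
    "party invitation",
    "loan repayment schedule",
    "car park closed",
]
truly_relevant = {"breach of loan covenant", "notice of default", "loan repayment schedule"}

weights = train(seed_set)
ranked = rank(weights, unreviewed)
found, reviewed = prioritised_review(
    ranked, lambda d: d in truly_relevant, batch_size=2, min_rate=0.6
)
print(f"found {len(found)} relevant documents after reviewing {reviewed} of {len(ranked)}")
```

In this toy run the review stops after four of the six documents, having already reached all three relevant ones. Where to set the stopping threshold is a proportionality judgement for the legal team rather than something the code can decide.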

As in any disclosure exercise, a predictive coding methodology should be supported by an appropriate validation and quality checking regime so that decision making can be justified and each stage of a project independently verified.

Composing the seed set

How the seed set should be composed is up for debate, as is the length of time that should be taken to train the algorithm and the extent of any quality control regime that should be adopted in order to validate the process. In its most straightforward form, predictive coding starts from a randomly generated set of documents. No keywords are run and the system is left to present example documents to the senior lawyer unhindered by bias. Relying on a randomly generated set of documents as the starting point is, for many lawyers, a step too far and could be seen as a blind leap of faith in the algorithm. The risk is that it appears more difficult to validate how the algorithm has been trained, and its ability to stand up to scrutiny from project to project is uncertain.

As such, some lawyers prefer to adopt a middle-ground ‘hybrid approach’, where the seed set is made up of a mixture of keyword-responsive documents, documents returned by other searches and randomly selected documents.
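As a rough illustration of how a hybrid seed set might be assembled, the sketch below draws some documents from keyword hits and the rest at random from the remainder of the collection. The function name, the keyword, the sample sizes and the documents are all hypothetical, and the "other searches" component is left out for brevity.

```python
import random


def build_hybrid_seed_set(documents, keywords, n_keyword, n_random, seed=0):
    """Pick n_keyword keyword-responsive documents plus n_random random others.

    Assumes each document is a distinct text string. The fixed seed makes the
    selection repeatable, which helps when the process has to be explained later.
    """
    rng = random.Random(seed)
    lowered = [k.lower() for k in keywords]

    # Documents containing at least one keyword (simple substring match).
    hits = [d for d in documents if any(k in d.lower() for k in lowered)]
    keyword_part = rng.sample(hits, min(n_keyword, len(hits)))

    # The random part is drawn from everything not already selected, so it can
    # surface relevant material that the keywords would have missed.
    chosen = set(keyword_part)
    remaining = [d for d in documents if d not in chosen]
    random_part = rng.sample(remaining, min(n_random, len(remaining)))

    return keyword_part + random_part


# Hypothetical ten-document collection.
collection = [
    "loan agreement draft",
    "loan default notice",
    "loan repayment schedule",
    "board minutes march",
    "friday lunch order",
    "guarantee signed by director",
    "office party invitation",
    "email re security over shares",
    "car park closed",
    "valuation report",
]

seed_set = build_hybrid_seed_set(collection, ["loan"], n_keyword=2, n_random=3)
for doc in seed_set:
    print(doc)
```

Setting `n_keyword` to zero gives the purely random starting point described earlier, so the same routine can be used to compare the two approaches on a given data set.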

Advantages of Predictive Coding

  • Senior lawyer engagement at an early stage in the process
  • Improved accuracy
  • Fewer irrelevant documents reviewed
  • Higher proportion of relevant documents identified
  • Faster access to the most relevant documents
  • Lower costs

Disadvantages of Predictive Coding

  • Questionable ability to deal with multiple issues or degrees of relevance
  • Training by under-confident or technophobic users may undermine the process
  • Questionable ability to cope with the evolution of relevance throughout a review
  • Won't eliminate the problem of a rogue reviewer
  • Questionable ability to deal with documents containing little or no text
  • Questionable ability to cope with foreign language documents

Dominic Tucker is a Senior Consultant at Anexsys Ltd, a leading provider of outsourced eDisclosure and litigation support services to law firms, corporations and government departments.

Further Information

Subscribers to Lexis®PSL Dispute Resolution can read Dominic Tucker’s full, more in-depth analysis of predictive coding, including the hybrid approach recently endorsed by the Irish High Court in Irish Bank Resolution Corporation Ltd & ors v Quinn & ors [2015] IEHC 175.