News broke on Wednesday and Thursday this week that Swiss authorities have raided Fifa’s headquarters to obtain computer data relevant to ongoing criminal investigations into alleged corruption surrounding the 2010, 2018 and 2022 World Cup tournaments.  Data was obtained from individuals including Sepp Blatter, Jerome Valcke (Sepp Blatter’s right-hand man) and Fifa’s head of finance.

Over the coming days, Swiss investigators will face a series of challenges as they seek to manage this data and attempt to identify the most relevant and incriminating documents.  As they progress their investigation, a variety of techniques will become available to them which could help in quickly identifying these documents.

Collecting the data

Reports conflict as to whether the data was seized by the authorities or handed over voluntarily by Fifa. Either approach, however, raises important issues that arise in any eDisclosure exercise. Data collections must be correctly scoped and implemented, with appropriate procedures in place to document the process and protect the forensic integrity of the data. Scoping should identify the key custodians as well as the devices and computer systems they used. A poorly planned and implemented collection could undermine the entire investigation. The goal is to collect data in a proportionate manner which ensures that nothing is missed or modified. This avoids accusations of deliberate evidence tampering or spoliation, as well as the risk that incriminating data may later be deemed inadmissible as evidence.

In serious cases such as this, where fraud and corruption are in issue, it is common for data to be collected by means of forensic imaging. This is the most complete form of forensic collection. Undertaken by skilled technicians, it involves taking an exact, bit-for-bit copy of a device, computer or other electronic media. The technique protects against data being modified whilst also capturing deleted items which other techniques can miss.
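One common way to demonstrate that a copy has not been altered is to compare cryptographic hashes of the source and the copy. The following is a minimal sketch of that idea only, not a description of the tools the Swiss authorities used; the function names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks
    so that very large images do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_matches_source(source_path, image_path):
    """True when the image is byte-for-byte identical to the source
    (hypothetical helper: identical content gives identical digests)."""
    return sha256_of_file(source_path) == sha256_of_file(image_path)
```

Recording the digest at the time of collection means that anyone can later re-hash the image and confirm it is unchanged.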

An advantage of forensic imaging is that the Swiss authorities would effectively ‘have everything’. As the investigation evolves, with different lines of enquiry developing and new characters emerging, investigators will have no need to waste time performing further collections. Conversely, a disadvantage of this approach is that a collection may be too wide and investigators may be faced with a vast volume of completely irrelevant documents. (More on dealing with this below.)

In less serious cases, other collection options exist which are less disruptive and therefore more practical. A collection must always be forensically sound and properly documented, but it is not uncommon to undertake a more targeted collection which focusses upon copies of key email accounts and shared working folders. Such an approach would be inappropriate in the present circumstances, as it would rely upon Fifa pointing the Swiss investigators towards relevant documents, opening the investigation to allegations that important and incriminating data has not been collected.

Processing the data

Before the Swiss authorities can get into the documents and progress their investigation, the collected data must be processed.  Depending upon the processing software used and the volume of data collected, processing could take many days to complete.

In a nutshell, processing (1) extracts documents from container files (for example, emails from mailboxes), (2) extracts metadata values from documents, including author, title and dates of creation, and (3) organises this data into a format capable of upload to a modern document review platform.
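To make these three steps concrete, here is a small sketch using Python's standard `email` package. It is purely illustrative: commercial processing software handles far more formats, and the sample message, addresses and field names below are all invented for the example.

```python
from email import message_from_string, policy

# A made-up message with one attachment (hypothetical content).
RAW_EMAIL = """\
From: alice@example.com
To: bob@example.com
Subject: Quarterly figures
Date: Wed, 27 May 2015 09:30:00 +0000
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain

Figures attached.
--XYZ
Content-Type: text/plain
Content-Disposition: attachment; filename="figures.csv"

q1,100
--XYZ--
"""


def process_email(raw):
    """Turn one raw email into a flat record suitable for loading
    into a review platform (the record layout is an assumption)."""
    msg = message_from_string(raw, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    return {
        # Step 2: metadata values taken from the headers.
        "from": str(msg["From"]),
        "to": str(msg["To"]),
        "subject": str(msg["Subject"]),
        "sent": msg["Date"].datetime.isoformat(),
        # Step 1: items extracted from the container.
        "body": body.get_content().strip() if body else "",
        "attachments": [p.get_filename() for p in msg.iter_attachments()],
    }


record = process_email(RAW_EMAIL)  # Step 3: one load-ready record
```

Each email becomes one record, with its attachments listed so that they can be processed as documents in their own right.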

The processing stage will provide investigators with the first insights into their data and their first opportunity to identify any shortfalls or gaps. Does the data reveal any issues posed by Fifa’s data retention policies? Is data missing from a particular period? Have emails been archived in a separate archiving system and not collected? How many documents has each of the key custodians yielded, and are further collections from key advisors or assistants going to be necessary?

Given Fifa’s size, the scope of the investigation and the rate at which modern organisations create data, investigators could reasonably expect to be faced with millions, if not tens of millions, of documents to sift through. This is not unusual, and a variety of technologies and techniques exist to help investigators quickly identify the relevant documents.

Investigating the data

The first task after processing will be to take defensible steps to cull the volume of documents. Of course, no data is actually deleted; it is simply put to one side and not looked at by investigators at this stage. Should the course of the investigation require it, the data remains readily accessible.

As a first step, there will be a range of filetypes that can be immediately removed, including system-generated files which are unlikely to be useful to investigators in the present case. These system files could become useful, however, should access to information or systems become an issue. Duplicate documents can also be excluded at this stage, and date ranges applied to exclude documents from entirely irrelevant periods. With key exclusions applied, the documents and their metadata will be transferred to a review tool, for example kCura’s Relativity platform, where keywords can be applied and further analysis of the documents undertaken.
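The culling steps described here (filetype exclusion, de-duplication and date filtering) can be sketched in a few lines. This is a simplified illustration under stated assumptions: the `Document` structure, the list of excluded extensions and the use of a SHA-256 content hash to spot exact duplicates are all hypothetical choices, and real platforms apply more sophisticated rules.

```python
import hashlib
import os
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    name: str
    content: bytes
    created: date


# Assumed list of system filetypes; a real exclusion list is far longer.
EXCLUDED_EXTENSIONS = {".dll", ".exe", ".sys"}


def cull(documents, start, end):
    """Split documents into those kept for review and those set aside.
    Nothing is deleted: set-aside items are returned with a reason."""
    kept, set_aside, seen_hashes = [], [], set()
    for doc in documents:
        extension = os.path.splitext(doc.name)[1].lower()
        digest = hashlib.sha256(doc.content).hexdigest()
        if extension in EXCLUDED_EXTENSIONS:
            set_aside.append((doc, "system file"))
        elif not (start <= doc.created <= end):
            set_aside.append((doc, "outside date range"))
        elif digest in seen_hashes:
            set_aside.append((doc, "duplicate"))
        else:
            seen_hashes.add(digest)
            kept.append(doc)
    return kept, set_aside
```

Returning the set-aside documents together with the reason for exclusion keeps the cull defensible, since every decision can be explained and reversed.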

Given the global nature of Fifa’s business, applying keyword search terms is likely to involve iterating terms in multiple languages and alphabets. This isn’t a problem, as modern platforms have been designed to cope with it. Carefully crafted keyword searches can help to identify important documents quickly. However, the first drawback to keyword searching which the Swiss investigators will encounter is that fraudsters won’t necessarily speak directly of their fraud, but will rather use a range of discreet vocabulary and terminology to disguise their acts or at least make them sound legitimate. The classic example is the Enron investigation, where Star Wars terminology was adopted which a keyword approach was unlikely to have identified.
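As a rough illustration of multilingual keyword matching, the sketch below normalises case and accents before searching, so that a term matches regardless of how it was typed. The helper names and sample texts are invented, and the plain substring match is a deliberate simplification of what a review platform's search index does.

```python
import unicodedata


def normalise(text):
    """Casefold and strip accents so 'été' and 'ETE' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def keyword_hits(documents, keywords):
    """Map each document name to the keywords found in its text.
    Uses simple substring matching, so short terms may over-match."""
    normalised_terms = [(kw, normalise(kw)) for kw in keywords]
    hits = {}
    for name, text in documents.items():
        haystack = normalise(text)
        matched = [kw for kw, term in normalised_terms if term in haystack]
        if matched:
            hits[name] = matched
    return hits
```

Note that a search like this only finds the words it is given: a document that refers to a payment by an agreed code word would not be returned, which is exactly the drawback described above.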

The second drawback is that, despite investigators’ best efforts in culling data and applying the most carefully crafted keywords, they are still likely to be faced with an almost insurmountable volume of documents to review. The reality is that modern data volumes have eroded the circumstances in which keywords can be applied in isolation from other techniques. In modern document-heavy cases, we commonly advise our clients on a range of solutions to tackle the issues posed by enormous document volumes, including:

  1. Predictive coding

Perhaps the most high-profile of the available approaches, predictive coding systems apply a complex algorithm which learns from review decisions to identify similar documents and prioritise these for review. The aim of the technology is to improve the recall and precision of a review exercise. Recall is the proportion of all relevant documents in the data set that the review identifies; precision is the proportion of the documents put forward for review that turn out to be relevant. Predictive coding can help to drastically reduce the volume of non-relevant documents reviewed, whilst increasing the proportion of relevant documents found.
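The two measures are easy to compute once the relevant documents are known, for example from a validated sample. The sketch below uses invented document IDs and a hypothetical function name.

```python
def recall_and_precision(retrieved_ids, relevant_ids):
    """Return (recall, precision) for a set of retrieved documents.

    recall    = relevant documents retrieved / all relevant documents
    precision = relevant documents retrieved / all documents retrieved
    """
    retrieved, relevant = set(retrieved_ids), set(relevant_ids)
    true_positives = len(retrieved & relevant)
    recall = true_positives / len(relevant) if relevant else 0.0
    precision = true_positives / len(retrieved) if retrieved else 0.0
    return recall, precision


# Four documents are truly relevant; the system put five forward,
# three of which were relevant.
recall, precision = recall_and_precision(
    retrieved_ids=[1, 2, 3, 10, 11],
    relevant_ids=[1, 2, 3, 4],
)
# recall is 3/4 = 0.75 and precision is 3/5 = 0.6
```

There is usually a trade-off between the two: putting more documents forward tends to raise recall but lower precision.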

  1. Structured and Unstructured Data Solutions – enabling a contextual review

The Swiss investigators will have a combination of structured and unstructured data available to them. By unstructured data, we primarily mean the range of office documents, including Word, Excel and PDF files. By structured data, we’re referring predominantly to the range of formal financial reporting documents, including bank statements and transaction reports.

Review platforms can be calibrated to facilitate a contextual review by allowing an investigator to focus upon key transactions or trades.  Documents can be associated to specific events, trades or transactions and presented to investigators in context.   Such an approach is not only ideal for the Swiss investigators, but can be readily applied in a variety of cases, including rate rigging and insider trading.

  1. Email Threading

To assist with the review of correspondence between key individuals of interest, investigators could apply an email threading process across Fifa’s data. The process identifies all emails within a chain and selects the longest example of a chain for review, removing the need to review all emails in the chain individually.

We have seen email threading reduce reviewable volumes by between 10% and 50%.
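A much simplified way to picture threading is to treat each chain as a tree of replies and to review only the emails that nobody replied to, on the assumption that each reply quotes the full text of the email it answers. That assumption, the field names and the sample data below are all hypothetical; commercial threading tools compare message content rather than relying on headers alone.

```python
def emails_to_review(messages):
    """Return the IDs of emails that end a chain (no one replied to them).

    Each message is a dict with an 'id' and an 'in_reply_to' value
    (None for the first email in a chain).
    """
    replied_to = {m["in_reply_to"] for m in messages if m["in_reply_to"]}
    return [m["id"] for m in messages if m["id"] not in replied_to]


chain = [
    {"id": "A", "in_reply_to": None},
    {"id": "B", "in_reply_to": "A"},
    {"id": "C", "in_reply_to": "B"},
    {"id": "D", "in_reply_to": "A"},  # a second branch of the chain
]
selected = emails_to_review(chain)  # ['C', 'D']
```

Here four emails are reduced to two for review, since C contains A and B, and D contains A.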

  1. Clustering & Conceptual searching

Clustering can be applied to identify groups of conceptually similar documents. Creating clusters requires little user input. Once created, however, the clusters need to be reviewed to determine how useful they are and to identify the most relevant ones to prioritise for review. The process is particularly useful when dealing with unfamiliar data.

  1. Duplicate and near-duplicate analysis

When a damaging or incriminating document has been found, an investigation can be expanded by reviewing duplicate or near-duplicate copies of that document. This allows investigators to track its progress around an organisation or amongst a group of individuals, and so to demonstrate custodianship and knowledge.
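One simple way to measure how close two documents are is to compare the overlapping word sequences ("shingles") they share. The sketch below uses Jaccard similarity over three-word shingles; the function names, sample text and 0.5 threshold are assumptions for illustration, and review platforms use their own similarity measures.

```python
def shingles(text, size=3):
    """Return the set of overlapping word sequences in the text."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(text_a, text_b):
    """Jaccard similarity: shared shingles divided by all shingles."""
    a, b = shingles(text_a), shingles(text_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(target, corpus, threshold=0.5):
    """Names of documents at least `threshold` similar to the target."""
    return [name for name, text in corpus.items()
            if similarity(target, text) >= threshold]
```

For example, two versions of a sentence that differ only in the final word score well above the threshold, whereas unrelated text scores zero.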

Our experienced team of eDisclosure consultants regularly advise upon solutions to document heavy eDisclosure exercises.  Anexsys are also unique in employing a team of skilled software developers to develop customised solutions when the circumstances of a case require it.

For further information on any of these techniques and other effective approaches to document heavy eDisclosure, contact a member of our eDisclosure team on +44 (0) 203-217-0300 or email info@anexsys.com