Forensic Data Analytic Challenges - Data Completeness

Most disputes and investigations have unique qualities, even when the issues seem similar from matter to matter. Differences in the size, structure, quality, and organization of the databases companies employ can lead to variations in an analysis. Because of this, there is no instruction manual for analyzing data that fits every dispute. As databases grow larger and more complex, companies will become more dependent on their data analysis teams to navigate issues, relate data characteristics, identify trends, control costs, increase revenues, and find solutions when working with complete datasets.

The Seemingly Good

An example of these complexities can be constructed from disputes in which I have been involved. In this particular example, a manufacturing company claimed that a global parts distribution company had overcharged it for a number of products over a six-year period. To arrive at a prospective damage amount, the manufacturing company reviewed a sample of transactions.

To perform the computation, the manufacturer compared the actual sales price of a part on each transaction in the sample to the price guaranteed by the contract. For any transaction where the actual sales price was higher than the contract price, a ‘cost overage’ was computed. They added cost overages across all transactions in the sample and then divided by the total sales amount of the sample to determine a ‘cost overage rate’ for each dollar spent. The prospective damage amount was computed as this rate multiplied by the total dollars of sales during the six-year period.
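Using hypothetical figures (the transaction amounts and the six-year sales total below are illustrative, not from the matter), the plaintiff's method can be sketched as:

```python
# A minimal sketch of the plaintiff's damage computation, using made-up numbers.
# Each sampled transaction pairs an actual sales price with a contract price.
transactions = [
    {"actual": 110.0, "contract": 100.0},  # $10 cost overage
    {"actual": 95.0,  "contract": 100.0},  # no overage (actual below contract)
    {"actual": 210.0, "contract": 200.0},  # $10 cost overage
]

# Sum overages only where the actual price exceeds the contract price.
total_overage = sum(max(t["actual"] - t["contract"], 0.0) for t in transactions)
total_sales = sum(t["actual"] for t in transactions)

# The 'cost overage rate' per dollar of sales in the sample.
overage_rate = total_overage / total_sales

# Extrapolate the sample rate to the full six-year population of sales.
population_sales = 1_000_000.0  # hypothetical six-year sales total
prospective_damages = overage_rate * population_sales
```

Note that a single rate applied to the whole population assumes the sample is representative of all six years, which is exactly the assumption the matter later tested.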

The Bad

Our subsequent detailed review of the results raised some questions about the methodology. To begin with, a number of transactions seemed to have a cost overage rate of exactly eight percent. When these transactions were reviewed further, there seemed to be a relationship between the listed contract price and the destination country. It turns out that the contract allowed a surcharge to be added on ‘international’ sales. The amount of this surcharge…you guessed it, eight percent. By reducing the listed sales price on these international transactions to account for the surcharge, the prospective damage amount was significantly reduced.
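That adjustment can be sketched as follows; the function name, country codes, and prices here are hypothetical:

```python
# Hypothetical adjustment: back out the contractually permitted 8%
# international surcharge from the listed price before comparing
# it to the contract price.
SURCHARGE = 0.08

def adjusted_price(listed_price: float, country_code: str, home: str = "US") -> float:
    """Remove the permitted surcharge on international sales."""
    if country_code != home:
        return listed_price / (1.0 + SURCHARGE)
    return listed_price

# A listed international price of $108.00 against a $100.00 contract price
# looks like an 8% overage, but once the permitted surcharge is removed the
# adjusted price is back at the contract price (to the cent).
print(round(adjusted_price(108.0, "DE"), 2))  # prints 100.0
```

Without the country code in the dataset, there is no way to apply this adjustment, which is why the omission discussed below proved so costly.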

And the Ugly

The second key finding was that transactions with identified cost overages were far more prevalent in the first two years of the data period. Further review traced this pattern to a new technology system that had been implemented at precisely the point of demarcation we observed on the timeline. Simply put, the new system had improved the accuracy of the company’s pricing. This clearly showed that extrapolating a single cost overage rate to the entire six-year population was inappropriate.

Instead, we calculated two distinct cost overage rates: one for the initial two-year period under the old system and a second for the four-year period under the new system. Amazingly, because so many of the sales occurred in the later years and because the pricing errors in the final four years were extremely low, we reduced the prospective damage amount by more than 90%.
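The stratified extrapolation can be sketched as follows, again with hypothetical sample transactions and period sales totals:

```python
# Hypothetical stratified extrapolation: a separate overage rate for the
# old-system period (years 1-2) and the new-system period (years 3-6),
# each applied only to that period's population sales.
def stratified_damages(samples_by_period, sales_by_period):
    total = 0.0
    for period, txns in samples_by_period.items():
        overage = sum(max(t["actual"] - t["contract"], 0.0) for t in txns)
        sales = sum(t["actual"] for t in txns)
        rate = overage / sales if sales else 0.0
        total += rate * sales_by_period[period]
    return total

samples = {
    "old_system": [{"actual": 110.0, "contract": 100.0}],  # frequent overages
    "new_system": [{"actual": 100.0, "contract": 100.0}],  # accurate pricing
}
population = {"old_system": 200_000.0, "new_system": 800_000.0}

# Because most sales fall in the low-error new-system years,
# the stratified damage figure comes in far below the single-rate estimate.
print(stratified_damages(samples, population))
```

The design point is simply that when a known structural break (here, a system change) exists in the data, the extrapolation must be stratified around it.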

How Did Things Go So Wrong?

The initial data request from the plaintiff led directly to the flawed data analysis. To begin with, the country code associated with each sale was available in the defendant’s database but was not part of the data request. Including this field would not have guaranteed that the plaintiff identified the trend and changed its analysis, but the omission sealed its fate. Secondly, the plaintiff chose to focus its analysis on a sample of the transactions instead of the full population. These noticeable omissions of important pricing factors gave the defendant key leverage during settlement negotiations.

What Can Be Done to Make Things Go Right?

What can a company do to position itself strategically for potential disputes? There are a number of steps organizations can take to improve the effectiveness of their data analytics even before any data are processed.

  • Before a dispute begins, perform a review of your information systems to determine whether the necessary data are being captured, maintained, and archived to support records retention requirements, compliance requirements and business operations. This is not something that you want to start after you are already in the line of fire.
  • Consider your data analytic team’s competency and comfort with complex database structures, data relationships, and database sizes.
  • Develop internal staff members who are experts with your systems. Obviously, this is valuable if you proceed with an internal data analytic team. However, even if you engage an external team, your internal staff will be invaluable in extracting data and answering questions -- saving time and money by minimizing confusion in interpreting data.
  • Although there are situations in which using a sample might make sense (e.g., data not available or manual entry required), sampling should be a last resort. By its very nature, sampling introduces a layer of assumptions that can often be avoided in an era when storage is cheap and processing power is abundant.

Forensic data analytics may not be well understood, but there is no question that databases have a story to tell. The challenge is that they require someone who is familiar with the language of 1) the company, 2) the industry, 3) the regulations and matters at issue, and 4) the tables, rows, columns, code, and schemas to understand, relate, and translate that story. If your company can identify a quality data analytic team to serve as the interpreter for your data, you will be better positioned to handle potential disputes that arise, a capability that can save millions and ultimately make your day.