Forensic Data Analytics

Forensic Data Analytics Challenges - False Positives in Anti-Fraud Analytics

The foundation of electronic fraud detection has long been the search for outliers – data points that don’t fit expected patterns. However, anti-fraud analytics has evolved rapidly in recent years to keep pace with major advances in technology. Many testing strategies now leverage more comprehensive data as companies capture more detailed financial information over longer periods of time. Additionally, dramatic increases in processing speed have enabled a proliferation in the number and frequency of tests that can be performed.

With larger data sets and more frequent testing, analysts often flag outliers that can reasonably be explained by the ordinary course of business. This phenomenon of “false positives” creates one of the biggest challenges in using analytics to identify fraud. Specifically, false positives generate unnecessary costs as companies perform reviews and investigations to determine the business validity of analytics-driven alerts. For this reason, it is vital that anti-fraud teams seek ways to limit these alerts.

If It Looks Like Fraud and Smells Like Fraud, It May Still Not Be Fraud

Recently I read an article touting Benford’s Law as a potential ‘holy grail’ of tests in the anti-fraud space. For those not familiar, Benford’s Law states that in a set of naturally occurring numbers, the leading digit is distributed in a specific, non-uniform way. To put it in layman’s terms, in many sets of numbers, one should expect that the first digit of these numbers will more often be 1s and 2s and less frequently 8s and 9s. (If you want to talk about Benford’s in techie talk, shoot me a message and we can geek out.) I can say that this concept bears out in many circumstances relating to expenses. In fact, we regularly implement Benford’s Law as a component of our test population when we analyze client expense data. However, when we arrive at results, we proceed with caution.
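For readers who want to see the mechanics, below is a minimal sketch, in Python with pandas, of how a first-digit comparison might look. Benford’s Law predicts that a leading digit d appears with probability log10(1 + 1/d), which works out to roughly 30.1% for a leading 1 and only 4.6% for a 9. The file and column names here are hypothetical placeholders, not a reference to any actual client data.

    import math
    import pandas as pd

    # Benford's Law: P(d) = log10(1 + 1/d) for leading digits d = 1..9
    BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

    def leading_digit(amount: float) -> int:
        """Return the first significant digit of a positive dollar amount."""
        digits = "".join(ch for ch in f"{abs(amount):.2f}" if ch.isdigit()).lstrip("0")
        return int(digits[0]) if digits else 0

    def first_digit_profile(amounts: pd.Series) -> pd.DataFrame:
        """Compare observed leading-digit frequencies against Benford's expectation."""
        observed = (
            amounts[amounts >= 0.01]
            .map(leading_digit)
            .value_counts(normalize=True)
            .sort_index()
        )
        return pd.DataFrame({"observed": observed, "expected": pd.Series(BENFORD)}).fillna(0.0)

    # Hypothetical usage; 'Amount' is an assumed column name:
    # expenses = pd.read_csv("expense_report.csv")
    # print(first_digit_profile(expenses["Amount"]))

A large gap between the observed and expected columns is exactly the kind of signal that pushed the digit ‘6’ onto our review list in the story that follows.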

I recall an anti-fraud project I worked on many years ago. In a straightforward, unrefined application of Benford’s Law, our test flagged a number of transactions as potential fraud. The initial results were striking: a clear spike for expense amounts with ‘6’ as the leading digit. But when we reviewed some of the output, we found that the company’s cell phone packages fell neatly into the range of $60.00-$69.99. Simply put, the transactions being flagged were not actually instances of fraud.

Hey, Benford, What Gives?

So what happened? Why did the analysis improperly flag these transactions? To answer this question, we must go back to the definition: Benford’s Law depends on a ‘naturally’ occurring set of numbers. In our example, the company had negotiated a static monthly price for mobile service, and a negotiated price produces expense amounts that are not ‘naturally’ occurring. Since each employee owned a cell phone and expensed the charges each month, the number of expense records in this range was large enough to skew the results.
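The skew is easy to reproduce with made-up numbers. The short sketch below builds an illustrative population of broadly ‘natural’ expense amounts and then adds a block of fixed-price charges between $60.00 and $69.99; the sizes and distributions are invented for illustration and have nothing to do with any client’s actual data.

    import numpy as np

    rng = np.random.default_rng(seed=0)

    def leading_digit(x: float) -> int:
        # Scale x into the interval [1, 10) and take the integer part.
        while x >= 10:
            x /= 10
        while x < 1:
            x *= 10
        return int(x)

    # Illustrative only: a broadly 'natural' expense population (log-normal spread)
    # plus 1,000 fixed-price cell phone charges between $60.00 and $69.99.
    natural = rng.lognormal(mean=4.0, sigma=1.2, size=9_000)
    cell_phone = rng.uniform(60.00, 69.99, size=1_000)
    amounts = np.concatenate([natural, cell_phone])

    share_of_sixes = np.mean([leading_digit(a) == 6 for a in amounts])

    # Benford's Law expects roughly 6.7% of leading digits to be 6; the negotiated
    # fixed-price block pushes the observed share well above that, producing a
    # spike like the one we saw, with no fraud involved.
    print(f"Observed share of leading 6s: {share_of_sixes:.1%}")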

The key point is that false positives like this one take time to review, and they can obscure our ability to spot the anomalies that truly represent fraudulent activity.

No Worries, Just Don’t Perform the Test…Right?

What we don’t want to do is abandon these types of tests altogether, since applying them often provides value. Better strategies come from complementing quantitative data with qualitative knowledge of the business, which reduces the occurrence of false positives. In other words, consider the organization whose data you are analyzing when developing your test strategy. I’ve included a few suggestions below:

1. Cleanse the Data. In our case example, we did not dispense with the test altogether, but we did pull the cell phone expense records (and other subsets like them) out of the tested population. The subsequent findings provided a far more actionable set of results. Specifically, testing the cleansed data effectively spotted employees who manipulated or contrived expense records to avoid the thresholds the company had defined to protect against fraud. For example, we might see an excessive number of expenses in the range of $90.00-$99.99 if the company requires pre-approval for expenses over $100. This is similar to ‘structuring’ in the world of Anti-Money Laundering.

2. Add Customized Tests. Some might say that removing a subset of data left a void in our testing. In actuality, our understanding of the context of the data provided an opportunity. For the cell phone records, we designed customized tests for the filtered population: we flagged cell phone expenses greater than (or less than) a logical value, and we tested the frequency of such transactions for each employee. These tests provided some of the “low-hanging fruit” among our actionable results.

3. Prioritize Test Results. Sometimes the volume of data is extremely large and the company’s rules are extensive, so almost every test produces numerous alerts, including many false positives. In these situations, companies and their data analytics teams often choose to run a wide array of tests and review only the transactions whose ‘risk rating’ meets a prescribed threshold. These risk ratings are based on the number and type of tests that each transaction, employee, and/or vendor fails. Companies can then develop protocols to perform transactional reviews when risk scores exceed the threshold; a brief sketch of this scoring approach follows this list.
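To make the third suggestion a little more concrete, here is a minimal risk-scoring sketch in Python with pandas. The column names, test weights, the $100 pre-approval limit, and the ‘logical value’ for cell phone charges are all assumptions for illustration; a real engagement would take these from the company’s own policies and data.

    import pandas as pd

    # Hypothetical expense records; the column names and values are invented.
    expenses = pd.DataFrame({
        "employee_id": ["E01", "E01", "E02", "E03", "E03", "E03"],
        "category":    ["Meals", "Cell Phone", "Meals", "Cell Phone", "Meals", "Meals"],
        "amount":      [97.50, 112.40, 42.10, 64.99, 99.00, 98.75],
    })

    PREAPPROVAL_LIMIT = 100.00   # assumed rule: expenses of $100 or more need pre-approval
    CELL_PHONE_CAP = 75.00       # assumed 'logical value' for a monthly cell phone charge

    tests = pd.DataFrame(index=expenses.index)

    # Test 1: amounts clustered just under the pre-approval threshold ('structuring').
    tests["near_threshold"] = expenses["amount"].between(
        PREAPPROVAL_LIMIT * 0.9, PREAPPROVAL_LIMIT, inclusive="left"
    )

    # Test 2: cell phone charges above the expected fixed-price range.
    tests["cell_phone_outlier"] = (
        (expenses["category"] == "Cell Phone") & (expenses["amount"] > CELL_PHONE_CAP)
    )

    # Test 3: repeat near-threshold expenses by the same employee.
    near_count = tests["near_threshold"].groupby(expenses["employee_id"]).transform("sum")
    tests["repeat_offender"] = tests["near_threshold"] & (near_count >= 2)

    # Weight the failed tests and sum them into a risk score per transaction.
    weights = {"near_threshold": 1, "cell_phone_outlier": 2, "repeat_offender": 2}
    risk_score = sum(tests[name].astype(int) * w for name, w in weights.items())

    # Only transactions at or above the review threshold go to a human reviewer.
    REVIEW_THRESHOLD = 2
    for_review = expenses.assign(risk_score=risk_score).query("risk_score >= @REVIEW_THRESHOLD")
    print(for_review)

In this toy data set, the repeated just-under-$100 meals from a single employee and the out-of-range cell phone charge rise to the top, while routine transactions stay below the review threshold.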

Conclusion

Fraud detection is one of the many ways that businesses can use analytics to improve their financial health. The Association of Certified Fraud Examiners estimates that fraud costs organizations an average of 5% of revenues. Many sophisticated techniques are now used to detect fraud, and there is a great deal of enthusiasm for business analytics at the moment, and rightly so. When used effectively, analytics can take large data populations and efficiently isolate the transactions that are worthy of review. This acceleration enables corporations to manage compliance more comprehensively at a fraction of the cost.

To make effective use of the unprecedented volumes of data and computing power available, every testing strategy needs to build communication and contextualization into the process. Analysts need to understand client operations and design tests that fit not only the data, but also the actual business the data reflects. Doing so will lead to more meaningful and actionable results, a concept that is increasingly important given the growing hurdle of false positives.