Forensic Data Analytics Challenges: Part 3 of Many - Data Visualization

Data Visualization Today: "Phenomenal Cosmic Power…Itty Bitty Living Space" (from Disney's "Aladdin")

When the Genie told Aladdin “Phenomenal Cosmic Power…Itty Bitty Living Space,” he was describing his own life, but he might as well have been talking about visualization developments in Forensic Data Analytics. New reporting tools and techniques enable analysts to present enormous amounts of information in compact spaces. In this era of Big Data, leveraging this capability effectively is critical. The best way to explain these advancements in reporting is to present an example.

Recently, Attorney General Jeff Sessions held a news conference to announce charges against 412 people for $1.3 billion in health-care fraud activities. Sessions said many leads in the operation came from “very sophisticated computer programs that identify outliers.” I did not participate in those specific analytics. However, I can draw on my extensive experience supporting companies in the healthcare/life sciences industry to offer a glimpse of what those analyses may have looked like, using a theoretical example that combines facets of multiple related cases.

In my example, a drug distribution company became aware of instances of diversion downstream in its supply chain. By diversion, I mean that drugs were used for unintended purposes. In this case (and taking a page out of “Breaking Bad” season one), pseudoephedrine from common cold medicines was used to manufacture methamphetamine. The company sought to implement a system to identify future purchase orders that might be indicative of diversion and halt those sales until a review could be performed.

The company initially planned to look at sales of pseudoephedrine to each customer for each month. These sales are represented by the bars in the chart below. The height of each bar represents how many times greater each month’s sales were than the sales to the same customer 12 months prior. The company is likely to have thousands of retail customers to review across many historical months, so this is only a snippet of the output.
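To make the bar heights concrete, here is a minimal sketch of how the year-over-year multiple might be computed. The customer names, sales figures, and the `yoy_ratios` helper are all hypothetical and exist only for illustration; they are not taken from any real case.

```python
def yoy_ratios(sales):
    """Return each month's sales as a multiple of the same month one year earlier.

    `sales` maps (customer, (year, month)) -> units sold (hypothetical layout).
    Months with no prior-year sales, or zero prior-year sales, are skipped
    because the multiple is undefined for them.
    """
    ratios = {}
    for (customer, (year, month)), units in sales.items():
        prior = sales.get((customer, (year - 1, month)))
        if prior:  # None or 0 means there is no usable baseline
            ratios[(customer, (year, month))] = units / prior
    return ratios


# Invented figures for two retail customers.
sales = {
    ("pharmacy_a", (2016, 1)): 100,
    ("pharmacy_a", (2017, 1)): 350,
    ("pharmacy_b", (2016, 1)): 200,
    ("pharmacy_b", (2017, 1)): 220,
}

ratios = yoy_ratios(sales)

# Apply the initial 3x threshold described in this article.
flagged = {key: r for key, r in ratios.items() if r > 3.0}
```

In this toy data, pharmacy_a's January 2017 sales are 3.5 times its January 2016 sales, so that order would be held for review, while pharmacy_b's would pass.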

The company planned to set a threshold at 3 times the previous year’s value (depicted here with the green line). This threshold was arbitrary, which is a quality to avoid from a business perspective: an arbitrarily low threshold can negatively impact operations by delaying too many acceptable orders as false positives (see article here https://www.linkedin.com/pulse/forensic-data-analytics-challenges-part-1-many-joseph-cheriathundam). That could disrupt customers’ businesses and require significantly more of the company’s own resources to perform the reviews.

To remedy the situation, we took a few extra steps, which we begin to depict in the graphic below. First, we highlighted the known issues (circles) on the chart. We quickly saw that we could raise the threshold to 4 times (yellow line) and still capture all of the known issues. This small change reduced the impact on customers who were not involved in diversion and better sized the resources the company needed, without creating likely blind spots.
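The threshold comparison can be sketched in a few lines. This is only an illustration under assumed data: the order IDs, the multiples, and the set of known issues are invented, and the helper names are hypothetical.

```python
def count_flagged(ratios, threshold):
    """Number of orders whose year-over-year multiple meets the threshold."""
    return sum(1 for r in ratios.values() if r >= threshold)


def captures_all(ratios, known_issues, threshold):
    """True if every known diversion case would still be flagged."""
    return all(ratios[order] >= threshold for order in known_issues)


# Invented year-over-year multiples for six orders.
ratios = {
    "order_1": 3.2,
    "order_2": 3.6,
    "order_3": 4.5,
    "order_4": 5.1,
    "order_5": 1.4,
    "order_6": 6.0,
}
known_issues = {"order_3", "order_4", "order_6"}  # hypothetical confirmed cases

# Try a few candidate thresholds and keep those that miss no known issue.
candidates = [3.0, 4.0, 5.0]
viable = [t for t in candidates if captures_all(ratios, known_issues, t)]
best = max(viable)

held_at_3x = count_flagged(ratios, 3.0)
held_at_best = count_flagged(ratios, best)
```

With these made-up numbers, the highest viable candidate is 4.0: it still catches all three known issues while holding three orders for review instead of the five held at 3x.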

The analysis might stop here for some, but it shouldn't. An extended review of the flagged sales led to further insights. In this example, we found that a number of the sales exceeded the threshold not because sales in that month were high, but because sales to that customer 12 months earlier were so low. By considering other metrics, we can devise an even more effective plan. The next chart depicts this step and begins to show the benefits of effective data visualizations.

We kept the y-axis the same: the comparison to the same month in the prior year. We kept the green and yellow threshold lines where they were, and we circled the known issues in red. However, we converted each bar to a blue-shaded circle so that we could see all of the sales for each customer for each month on a single page. This also allowed us to use the x-axis to present a second metric, the sales average in the region. The size of each circle represents a third metric: how much product was sold (total volume). The blue shading is a fourth metric, where a darker dot represents a colder season.

The chart then tells a more expansive story. The dots that represent the issues are different sizes, which means the magnitude of the sales volume was not critical to consider (e.g., diversion can occur among smaller and larger customers alike). The color of those dots also varied, so the issues could arise in the winter or the summer. However, the most powerful added message is that all of the known issues (red circles) are on the same side of a second threshold line that we drew from top to bottom on the chart against the x-axis. This additional threshold, set against a different metric, still identifies the anomalies that the company should consider, but it eliminates another section of false positives (everything to the left of the vertical yellow line).
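The resulting two-threshold rule is simple to express. The sketch below is an assumption-laden illustration: the `Sale` record, the `regional_metric` field name (standing in for the x-axis metric), the 80.0 vertical cutoff, and all of the data are hypothetical, since the article's charts do not give exact values.

```python
from dataclasses import dataclass


@dataclass
class Sale:
    order_id: str
    yoy_ratio: float        # y-axis: multiple of the same month last year
    regional_metric: float  # x-axis: hypothetical stand-in for the regional sales metric


def should_hold(sale, ratio_threshold, regional_threshold):
    """Hold an order only if it is above the horizontal line AND right of the vertical line."""
    return (sale.yoy_ratio >= ratio_threshold
            and sale.regional_metric >= regional_threshold)


# Invented sample records.
sales = [
    Sale("order_1", 4.8, 120.0),  # above both thresholds
    Sale("order_2", 6.5, 30.0),   # high multiple, but left of the vertical line
    Sale("order_3", 2.0, 150.0),  # right of the vertical line, but low multiple
    Sale("order_4", 5.2, 95.0),   # above both thresholds
]

# 4.0 is the yellow horizontal line from the article; 80.0 is an assumed cutoff.
held = [s.order_id for s in sales if should_hold(s, 4.0, 80.0)]
```

Here order_2 is the kind of false positive the second threshold removes: its multiple is high, but it falls to the left of the vertical line, so it is no longer held.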

These are the types of charts that I love to provide to my clients. Data visualization should not just be a pretty multi-colored picture. Instead, it should tell a story that helps drive decision-making. In this case, it helped a company create a rule set that had a defensible basis and could be easily implemented. If need be, the company could readily explain its thought process to a regulator and show that it had considered many factors. More importantly, the findings ensured that the company’s operations would not be dramatically degraded by stopped orders for valid customer purchases. Most importantly, the company did its part to stem the tide of drug diversion. A true success story of data visualization!