The Role of Data in Slowing Coronavirus

Forensic Data Analytics Challenges: Part 4 of Many - Leveraging Data Despite Known Limitations

Introduction

We are facing a pandemic, as declared by the World Health Organization. Coronavirus has hit almost every country in the world, and in the United States, all but one state has witnessed its first confirmed case. Guidance in this country from the top has been mixed, with some suggesting that everything is going to be all right and that no dramatic actions are required. On the other hand, a number of medical officials have insisted that social distancing is a must to "flatten the curve," a term used to describe a reduction in the rate of new infections. Adding to the confusion, stated death rates from the disease differ depending on whom you ask. When the information we receive is contradictory, what should we believe? When we look for truth, a good place to start is always with the data.

Death rates

Let's start by taking a closer look at the computation of death rates. We have heard consistently that the death rate of COVID-19 is greater than that of the flu. The flu's death rate is generally regarded as 0.1%, meaning that 1 out of every 1,000 reported cases is likely to lead to death. For coronavirus, we have seen wildly different ratios in the past few months. Currently, Hubei, China has settled into a 5% mortality rate (Wuhan is the capital of the Hubei province), while Iran and Italy have eclipsed China at rates of 7% and 8%, respectively. The current rate in the United States is 1.9%. Once again, these percentages might suggest that the coronavirus is almost 20 times worse than the flu.

A deeper dive leads us to question the rate calculated with today's data. The death rate associated with any disease is generally a fairly simple calculation: the number of deaths caused by the disease divided by the number of confirmed cases of the disease. In today's coronavirus calculation, the numerator is actually quite well-known and well-documented. Unfortunately, the tracking of confirmed cases has not been nearly as accurate. We have been told that many who have coronavirus never develop symptoms, and even when symptoms have surfaced, testing has not necessarily taken place. This means that the denominator in the death rate calculation is understated, and that the death rate could very well be significantly overstated (smaller denominators lead to larger values).
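To make the sensitivity concrete, here is a minimal sketch of the arithmetic; the counts are hypothetical round numbers chosen only to illustrate the effect of an understated denominator.

```python
# Illustrative sketch of the case fatality rate arithmetic described above.
# The counts below are hypothetical round numbers, not official figures.

def death_rate(deaths, confirmed_cases):
    """Deaths divided by confirmed cases, expressed as a percentage."""
    return 100.0 * deaths / confirmed_cases

deaths = 100
reported_cases = 5_000    # only the cases that were actually tested and confirmed
true_cases = 50_000       # assume 90% of infections were never captured by testing

print(f"Rate using reported cases: {death_rate(deaths, reported_cases):.1f}%")  # 2.0%
print(f"Rate using all infections: {death_rate(deaths, true_cases):.1f}%")      # 0.2%
```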

Should I give up on this data?

The frustration of imperfect data often leads researchers to seek other mechanisms for insights. To some extent, going back to the drawing board may be the right approach when seeking to reach certain conclusions. For example, it probably doesn't make sense to compare the severity of the flu to that of COVID-19 with data in which the number of confirmed cases is known to be understated. If you are really pressed to perform that comparison, it might make sense to find some sub-group for which testing has been more comprehensive. One example would be the Diamond Princess cruise ship population, with seven deaths from a pool of 696 confirmed cases, or roughly 1%. (A review similar to this is probably how Dr. Anthony Fauci, Director of the National Institute of Allergy and Infectious Diseases, arrived at his conclusion of a 1% mortality rate for COVID-19, which he stated was likely 10 times worse than the flu.)

But you can use the data to get a directional sense of relationships, especially when comparing data that is captured under similar circumstances. The chart below presents a mark for each U.S. state at the intersection of its population (along the y-axis) and the number of cases it has witnessed per million people (along the x-axis). The labeled marks are the only states in which a death attributed to COVID-19 has occurred (data as of 3/16/2020 from CSSE at JHU), and the number on each label represents the number of deaths in that state. Looking at the chart, a number of things stand out and may give us insight going forward. It should be noted that some states have populations over 20 million and some have more than 32 cases per million, but each is capped at these values for presentation on the chart.

COVID-19 Data Represents Snapshot on 3/16/2020
Underlying COVID-19 data sourced from CSSE at JHU
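
For readers who want to reproduce a chart like this, here is a minimal sketch in pandas/matplotlib. The file name and column names are hypothetical; the underlying figures would come from the CSSE at JHU daily reports joined to state population estimates.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical extract: one row per state with population, confirmed cases, and deaths.
states = pd.read_csv("state_covid_snapshot_2020-03-16.csv")
states["cases_per_million"] = states["confirmed"] / states["population"] * 1e6

# Cap both axes for presentation, as noted in the text.
x = states["cases_per_million"].clip(upper=32)
y = (states["population"] / 1e6).clip(upper=20)

fig, ax = plt.subplots()
ax.scatter(x, y)

# Label only the states that have recorded a COVID-19 death, showing the death count.
for _, row in states[states["deaths"] > 0].iterrows():
    ax.annotate(f'{row["state"]} ({row["deaths"]})',
                (min(row["cases_per_million"], 32), min(row["population"] / 1e6, 20)))

ax.set_xlabel("Confirmed cases per million residents")
ax.set_ylabel("State population (millions)")
ax.set_title("COVID-19 snapshot as of 3/16/2020")
plt.show()
```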

One thing that jumps out to me is that if you were to look only at the states that appear in the upper right-hand portion of the grid (population over 4 million and more than 8 cases per million), you will see marks for 9 states. Of those 9 states, 8 have had deaths. Additionally, the marks that are furthest from the origin are generally the states that have had the most deaths.

What have we learned?

Interestingly enough, the mark for Massachusetts is just a little bit above that of Colorado, and the mark for Illinois is above that of Georgia. This means that they have larger populations with about the same case rate. But neither Massachusetts nor Illinois has suffered a death. This might be enough for healthcare providers to look into what those states may be doing well that could be leveraged by providers in other states to improve survival for coronavirus sufferers.

Similarly, Texas has a very large population (over 20M), but also has had no deaths due to COVID-19. Interestingly, Texas has maintained a rate of infection that is below 3 per million people. This low rate could be because confirmed cases are outside of urban city centers so the spread of the virus is slower than in places like New York City. Once again, this is probably something worth looking into, to see if the Texas policy-makers may be doing something that is working to contain the spread.

Finally, a couple of states had very few confirmed cases (9, to be specific) and yet both realized a death amongst that limited caseload. This certainly could be just an anomalous circumstance, but it may be worth digging into the details to see if there are processes that could be changed in those states to improve survivability for future patients when the number of cases potentially increases.

I think it is also worth mentioning that the U.S. death rate has been hovering around 2%. However, spikes can be seen in the state of Washington (over 5%), where many of us are familiar with the elderly care facility in King County that was particularly hard hit. We also see a spike in the state of Florida (over 3%), which one may assume has to do with the larger percentage of older people in the population. These situations point to the need to be exceptionally cautious with our elderly. In the case of Washington, it also points to the possibility that these rates can be skewed when spikes occur in very specific areas. This could be a harbinger of scenarios that could occur if our healthcare system is overwhelmed.

New and Changing Patterns

It's certainly possible that the pattern described in this article is just a random coincidence. We would need much more data and much further investigation to conclude that there was a clear indication of causality. Regardless, what we never want to do as data analysts is think that we have it all figured out because we saw one pattern in a few locations at a specific point in time associated with one particular event. Patterns can and will change. In my anti-fraud work, we witness these adjustments all the time: a fraudster who is thwarted on one attempt is unlikely to repeat the next fraud in the same manner. Similarly, what we see with COVID-19 will change, and I've included some reasons below. It is up to analysts to spot the trends when they arise.

1.      Changes in culture. If a particular city has not developed a culture of caution, it certainly is susceptible to an outbreak. If such an outbreak led to a surge in one specific location, it could overwhelm local hospitals and death rates could be affected. On the other hand, regions that develop a culture of caution, including social distancing, can reduce caseloads and likely improve outcomes. Additionally, the culture has changed with the understanding that individuals can carry the virus and be contagious without ever feeling a symptom. The virus also has an incubation period of around five days, so even if you were to develop symptoms, it likely wouldn't be until almost a week after being infected. This knowledge has made people more careful whether or not they are feeling sick.

2.      Changes in regulation. We have heard of lockdowns in Wuhan and Italy. Regulations to a lesser extent have begun in a number of cities here at home. This includes closing restaurants or keeping them open only for pick-up and delivery, closing schools, increased work from home allowances by employers, just to name a few. Each of these actions should slow down the rate of spread from what it could have been.

3.      Changes in health care practices. As with other pandemics we have witnessed, medical advances will be made to combat COVID-19. In the short term, this has been seen in better tools to test for the virus and a better understanding of the mechanism of spread. In the hopefully not-too-distant future, a vaccine will obviously have an impact in reducing the rate of spread.

4.      Changes in data collected. From the beginning of this outbreak in the U.S., testing has been limited to patients who meet certain requirements. We have heard that a number of states are preparing to roll out new tests which have been developed in the past few weeks. These will allow a larger number of people to be tested and results to be obtained sooner. The former will undoubtedly have an immediate effect of driving down computed death rates, simply because the denominator of that ratio (the number of confirmed cases) will increase. The latter should aid in the containment of the virus, and earlier detection may also improve care. Anyone analyzing the subsequent data will need to make sure that they are aware of the lack of consistency in the data across the states (and across time) to avoid arriving at improper conclusions.

5.      Further findings by data analysts. One of the early findings with this disease was the increased susceptibility of older patients compared to younger ones. The data that I have secured does not contain age information for patients, but I did hear a statistic indicating that all deaths in the U.S. were among patients 40 and above. This knowledge has increased the care with which we treat the elderly and potentially increased some of the risks taken amongst the more youthful. Actions like this won't change the science of the virus, but they could change the patterns associated with patient age.

Conclusion

The business of data analytics is all about identifying patterns in data and converting them into meaningful actions. The impact of COVID-19 has been slowed by a number of instances in which data analysts have witnessed patterns related to the age of patients, periods of contagiousness and the length of incubation. There is no doubt that analytics will play an even greater role as we get over this global hump, helping to identify leading practices in health care, the effectiveness of government regulations and the viability of vaccines.

Not Throwing Away Our Shot (To Better Apportion Representation)

The United States Constitution gives Congress the responsibility for determining the number of members of the House of Representatives assigned to each state, and it references the use of a census every 10 years to make that determination. This responsibility clearly has the potential to impact the passage of bills, the approval of budgets and even the election of the President. Despite its importance, the math involved in determining the number of representatives is not specified, except to say that each state's seats "shall be apportioned...according to their respective numbers," referring to each state's population.

It would seem hard to screw this up, but the process involves politicians, so it is not surprising that, over the years, the math has tilted one direction or another. In the last census cycle, the outcome of the current method coincidentally matched the projected outcome of a fairer and more logical method. Some might say this means a change is not necessary. I believe it means the time is right to switch, because passage through Congress is less likely to be stalled for political reasons.

History Has its Eyes on You

The musical Hamilton depicts Alexander Hamilton, our nation's first Treasury Secretary, as a patriot motivated by what was best for the country. In one scene, Hamilton enters the 'room where it happens' to negotiate a deal with Thomas Jefferson and James Madison to promote the federalizing of the United States banking system. In the deal, Hamilton agrees to support the relocation of the capital from Philadelphia to Washington DC, a move that increased his own commute from New York while reducing the commute for Virginians like Jefferson, Madison, and George Washington. In exchange, the federal government assumed the states' Revolutionary War debts (owed mostly by northern states) so that the country could gain a strong financial footing at its inception. Hamilton agreed to the deal because he felt it was fair and best for the country, even though he personally did not benefit.

So it should not surprise anyone that Hamilton introduced the first apportionment plan to pass through Congress. The plan was simple and statistically unbiased. Although it passed Congress, it was vetoed by President Washington because it did not guarantee a representative from every state. In its place, an apportionment plan was developed by Jefferson. This Jeffersonian plan was very similar to Hamilton's plan; however, it introduced an adjustment to the mathematics to ensure each state was guaranteed at least one representative. Conveniently, the plan ultimately added a representative to Virginia's delegation at the expense of Delaware. It turns out that Jefferson's plan favored states with larger populations. This was the first in a number of iterations of political maneuvering in the mathematical battle for allocation.

How Does the Sausage Get Made?

Congress instituted the 'Method of Equal Proportions' as the allocation method in 1941, and it remains in use today. The method drives to a solution that minimizes the relative differences between states in the average population per district (and, equivalently, in each individual's share of a representative). If these words sound complex…wait for it…it's because they are. (If you want to talk the techie math talk, shoot me a message and we can geek out.) Simplistically put, the mindset behind this concept is that adding a representative to a state with only a few representatives provides more value than adding a representative to a state that already has many.
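
For those who do want a peek under the hood, here is a rough sketch of the priority-value formulation of the Method of Equal Proportions, using made-up populations; it is an illustration of the mechanics, not the Census Bureau's production code.

```python
import heapq
from math import sqrt

def equal_proportions(populations, seats=435):
    """Apportion seats using Equal Proportions (Huntington-Hill) priority values."""
    # Every state is guaranteed one seat up front.
    apportionment = {state: 1 for state in populations}
    # The priority for a state's next seat, when it holds n, is population / sqrt(n * (n + 1)).
    heap = [(-pop / sqrt(1 * 2), state) for state, pop in populations.items()]
    heapq.heapify(heap)
    for _ in range(seats - len(populations)):
        _, state = heapq.heappop(heap)          # state with the highest current priority
        apportionment[state] += 1
        n = apportionment[state]
        heapq.heappush(heap, (-populations[state] / sqrt(n * (n + 1)), state))
    return apportionment

# Three hypothetical states and ten seats, purely for illustration.
print(equal_proportions({"A": 7_000_000, "B": 2_500_000, "C": 500_000}, seats=10))
# {'A': 7, 'B': 2, 'C': 1} -- every state keeps at least one seat
```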

Since there are 435 seats in the House and just over 300 million people in the country based on the 2010 census, there should be a representative in Congress for roughly every 700,000 people. We'll call this our representation ratio. So it would stand to reason that a state with a population of 7 million people may have 10 representatives, while a state with 14 million people would have 20. Generally speaking, this is what happens. But in practice, no state's population after any census divides evenly by the representation ratio. Since awarding a state a fractional portion of a representative is not an option, the actual number of representatives is determined by either rounding up or rounding down. Hamilton's plan simply specified that the state with the next greatest 'leftover' after this division would be rounded up until all 435 seats were filled. This does not necessarily happen when using the Method of Equal Proportions.
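
And here, for contrast, is a sketch of the largest-remainder approach Hamilton proposed, with the same hypothetical populations as above:

```python
from math import floor

def hamilton(populations, seats=435):
    """Apportion seats by rounding up the largest fractional 'leftovers'."""
    total = sum(populations.values())
    quotas = {s: pop * seats / total for s, pop in populations.items()}
    apportionment = {s: floor(q) for s, q in quotas.items()}
    remaining = seats - sum(apportionment.values())
    # Hand the remaining seats to the states with the largest leftovers.
    by_leftover = sorted(quotas, key=lambda s: quotas[s] - apportionment[s], reverse=True)
    for s in by_leftover[:remaining]:
        apportionment[s] += 1
    return apportionment

print(hamilton({"A": 7_000_000, "B": 2_500_000, "C": 500_000}, seats=10))
# {'A': 7, 'B': 3, 'C': 0} -- note the smallest state can end up with no seat,
# the gap President Washington objected to
```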

Would it Even Make a Difference?

After the 1980 Census, the state of California was in line for an increase in its representation in the House. Its population had grown to 23.67 million people, about 10.5% of the U.S. population. Since the representation ratio was about one for every 520,000 people, California would seem to be in line for 46 of the 435 representatives in the House: its population amounted to 45 units of roughly 519,000 people each, with an additional 300,000 left to be accounted for. Montana, on the other hand, had 787,000 people, a count that would justify one representative with an additional 267,000 left to be accounted for. Since California had more 'leftover' people, it would seem that California would get priority over Montana for an additional representative. Not only was this not the case, but holding the other 48 states constant, California would actually have needed a population increase of roughly 2 million people to regain priority over Montana for the extra seat.
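
A quick back-of-the-envelope check using the figures quoted above and the Equal Proportions priority formula shows why Montana came out ahead (this is an illustration, not the Census Bureau's official computation):

```python
from math import sqrt

ca_pop, mt_pop = 23_670_000, 787_000

# Priority value for a state's n-th seat: population / sqrt((n - 1) * n)
ca_priority_46th = ca_pop / sqrt(45 * 46)   # ~520,000
mt_priority_2nd  = mt_pop / sqrt(1 * 2)     # ~556,000

print(mt_priority_2nd > ca_priority_46th)   # True: Montana gets the seat first
```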

After the 1990 Census a similar oddity led to three states gaining round-up priority over three more-deserving states. In this case the more populous northeastern states of Massachusetts, New Jersey and New York were helpless onlookers as Mississippi, Oklahoma, and Washington jumped up in the queue. I did not review particular voting counts on bills that were approved or rejected by the House in the subsequent 10 years. However, I did notice that the three former states (Mass., NJ & NY) were states that Al Gore won in 2000. Meanwhile, two of the three latter states were states won by George W. Bush. Since electoral votes are directly tied to the number of Congress members in each state, the difference in allocation methods would have changed the outcome of the 2000 election to an electoral tie, leaving the ultimate outcome not to the hanging chads of Florida, but ironically to the House of Representatives, which is tasked with breaking electoral ties.

What Comes Next?

One of the great compromises of the new-found government of the United States was to have a two-house system: one that gives each state an equal vote (the Senate, with two votes for each state) and a second that divides representation based on the population of each state. With that foundation, the Senate already grossly over-represents the residents of smaller states. The House of Representatives was designed to balance that equation and ensure that one half of Congress truly represented the people. It undoubtedly makes sense to switch the allocation method for the House, one last time, back to a plan similar to the one that Hamilton devised at the inception of this country's democracy. This could be done simply by stipulating that each state with a population below the representation ratio rounds up to one representative before any other states are considered for rounding up.

Forensic Data Analytics Challenges: Part 3 of Many - Data Visualization

Data Visualization Today: "Phenomenal Cosmic Power....Itty Bitty Living Space" (from Disney's "Aladdin")

When the Genie said to Aladdin "Phenomenal Cosmic Power…Itty Bitty Living Space," he may have been describing his life, but he may as well have been talking about visualization developments in Forensic Data Analytics. New reporting tools and techniques enable analysts to present enormous amounts of information in compact spaces. In this era of Big Data, leveraging this capability effectively is critical. And the best way to explain these advancements in reporting is to present an example.

Recently, Attorney General Jeff Sessions held a news conference to announce charges against 412 people for $1.3 Billion in health-care fraud activities. Sessions said many leads in the operation came from “very sophisticated computer programs that identify outliers.” I did not participate in those specific analytics. However, I can lean on my extensive experience supporting companies in the healthcare/life sciences industry to deliver a glimpse of what those analyses may have looked like by creating a theoretical example that combines facets of multiple related cases.

In my example, a drug distribution company became aware of instances of diversion downstream in their supply chain. By diversion, I mean that drugs were used for unintended purposes. In this case (and taking a page out of “Breaking Bad” season one) pseudoephedrine from common cold medicines was used to manufacture methamphetamines. The company sought to implement a system to identify future purchase orders that might be indicative of diversion and halt those sales until a review could be performed.

The company initially planned to look at sales of pseudoephedrine to each customer for each month. These sales are represented by the bars in the chart below. The height of each bar represents how many times greater each month’s sales were when compared to the sales to the same customer 12 months prior. The company is likely to have thousands of retail customers reviewed over many historical months so this is only a snippet of the output.

The company planned to create a threshold at 3 times the previous year's value (depicted here with the green line). This threshold was arbitrary, which is a quality best avoided from a business perspective. An arbitrarily low threshold can negatively impact operations by delaying too many acceptable orders as false positives (see the article here: https://www.linkedin.com/pulse/forensic-data-analytics-challenges-part-1-many-joseph-cheriathundam). This could impact their customers' businesses and require significantly more resources for their own company to perform the reviews.

To remedy the situation, we took a few extra steps which we begin to depict in the graphic below. First, we highlighted the known issues (circles) on the chart. We quickly saw that we could extend the threshold to 4 times (yellow line) and still capture all of the known issues. This small variation reduced the impact to customers who were not involved in diversion and more appropriately set the required resources for the company without creating likely blind spots.
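
A minimal sketch of that ratio test in pandas might look like the following; the file and column names are hypothetical stand-ins, not the company's actual data.

```python
import pandas as pd

# Hypothetical extract: one row per customer per month, in chronological order.
sales = pd.read_csv("pseudoephedrine_sales.csv")
sales = sales.sort_values(["customer_id", "month"])

# Ratio of each month's sales to the same customer's sales 12 months earlier.
sales["prior_year_units"] = sales.groupby("customer_id")["units_sold"].shift(12)
sales["yoy_ratio"] = sales["units_sold"] / sales["prior_year_units"]

flagged = sales[sales["yoy_ratio"] > 4]   # the revised 4x threshold (the yellow line)
```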

The analysis might stop here for some, but it shouldn't. An extended review of the flagged sales led to further insights. In this example, we found that a number of the sales exceeded the threshold not because sales in that month were high, but because sales 12 months earlier were so low. By considering other metrics, we can devise an even more effective plan. The next chart depicts this step and begins to show the benefits of effective data visualizations.

We kept the y-axis the same: the comparison to the same month of the prior year. We kept the green and yellow threshold lines where they were, and we circled the known issues in red. However, we converted each bar to a blue-shaded circle so that we could see all of the sales for each customer for each month on a single page. This also allowed us to utilize the x-axis to present a second metric, the sales average in the region. The size of each circle represented a third metric: how much product was sold (total volume). The blue shading became a fourth metric, where a darker dot represents a colder season.

The chart then tells a more expansive story. The dots that represent the issues come in different sizes, which means the magnitude of the sales volume was not critical to consider (i.e., diversion occurred among both smaller and larger customers). The color of the dots also varied, so the issues could arise in the winter or the summer. However, the most powerful added message is that all of the known issues (red circles) sit on the same side of a second threshold line drawn from top to bottom against the x-axis. This additional threshold against a different metric still identifies the anomalies the company should consider, but it eliminates another section of false positives (everything to the left of the vertical yellow line).
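
Continuing the sketch above, the combined rule is simply a conjunction of the two thresholds; the second column name and cutoff are again illustrative.

```python
# Flag a sale only when it clears the 4x year-over-year threshold AND sits on the
# flagged side of the second (x-axis) threshold. Names and values are illustrative.
REGION_METRIC_CUTOFF = 1.5   # hypothetical position of the vertical yellow line

flagged = sales[(sales["yoy_ratio"] > 4) &
                (sales["region_sales_average"] > REGION_METRIC_CUTOFF)]  # hypothetical column
```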

These are the types of charts that I love to provide to my clients. Data visualization should not just be a pretty multi-colored picture. Instead, it should tell a story that helps drive decision-making. In this case, it helped a company create a rule set with a defensible basis that could be easily implemented. If need be, the company could readily explain its thought process to a regulator and show that it had considered many factors. More importantly, the findings ensured that the operations of the company would not be degraded dramatically by stopped orders for valid customer purchases. Most importantly, the company did its part to stem the tide of drug diversion. A true success story of data visualization!

Forensic Data Analytic Challenges - Data Completeness

Most disputes and investigations have unique qualities even when the issues seem to be similar from matter to matter. Differences between the size, structure, quality and organization of the databases employed by companies can lead to variations in an analysis. Because of this dynamic, there is no instruction manual for analyzing data for every dispute that arises. As databases get larger and more complex, companies will become more dependent on their data analysis teams to navigate issues, relate data characteristics, identify trends, control costs, increase revenues and seek solutions when working with complete datasets.

The Seemingly Good

An example of these complexities can be constructed based on disputes in which I have been involved. In this particular example, a manufacturing company claimed that a global parts distribution company had overcharged them for a number of products over a six-year period. To arrive at a prospective damage amount, the manufacturing company reviewed a sample of transactions.

To perform the computation, the manufacturer compared the actual sales price of a part on each transaction in the sample to the price guaranteed by the contract. For any transaction where the actual sales price was higher than the contract price, a 'cost overage' was computed. They added the cost overages across all transactions in the sample and then divided by the total sales amount of the sample to determine a 'cost overage rate' for each dollar spent. The prospective damage amount was computed as this rate multiplied by the total dollars of sales during the six-year period.
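
As a sketch of that arithmetic (with hypothetical column names and a made-up population total):

```python
import pandas as pd

sample = pd.read_csv("sampled_transactions.csv")   # hypothetical sample extract

# Overage is any excess of the actual price over the contract price, floored at zero.
sample["overage"] = (sample["actual_price"] - sample["contract_price"]).clip(lower=0)
overage_rate = sample["overage"].sum() / sample["sales_amount"].sum()

total_six_year_sales = 250_000_000                 # made-up population total
prospective_damages = overage_rate * total_six_year_sales
```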

The Bad

Our subsequent detailed review of the results raised some questions about the methodology. To begin with, a number of transactions seemed to have a cost overage rate of exactly eight percent. When these transactions were reviewed further, there seemed to be a relationship between the listed contract price and the destination country. It turns out that the contract allowed for a surcharge to be added on 'international' sales. The amount of this surcharge…you guessed it, eight percent. By reducing the listed sales price on these international transactions to account for the surcharge, the prospective damage amount was significantly reduced.

And the Ugly

The second key finding was that the transactions with identified cost overages in the sample were much more prevalent in the first two years of the data period. As this characteristic was reviewed further, the trail led to a new system which had been implemented at precisely the point of demarcation we were witnessing on the timeline of the matter. Simply put, the new technology system employed by the company had improved the accuracy of pricing. This phenomenon clearly showed that extrapolating the cost overage rate to the entire six-year population was inappropriate.

Instead, we calculated two distinct cost overage rates: the first for the initial two-year period under the old system and a second for the four-year period under the new system. Amazingly, because so many of the sales occurred in the later years and because the pricing errors in the final four years were extremely low, we reduced the prospective damage amount by more than 90%.
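
Continuing the sketch above, the stratified version splits the sample at the system cut-over before computing any rates; the date column and cut-over date are, again, hypothetical.

```python
# Assign each sampled transaction to a pricing-system era (hypothetical cut-over date).
sample["era"] = sample["order_date"].ge("2012-01-01").map({True: "new system",
                                                           False: "old system"})

era_rates = {}
for era, grp in sample.groupby("era"):
    era_rates[era] = grp["overage"].sum() / grp["sales_amount"].sum()

# Each era's rate is then applied only to that era's share of the six-year sales,
# rather than extrapolating a single blended rate across the whole period.
```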

How Did Things Go So Wrong?

The initial data request from the plaintiff led directly to the flawed data analysis. To begin with, the country code associated with each sale was available in the defendant’s database but was not part of the data request. The inclusion of this field would not have guaranteed that the plaintiff would have identified the trend and changed their analysis, but the omission sealed their fate. Secondly, the plaintiff chose to focus its analysis on a sample of the transactions instead of the full population. The fact that there were noticeable omissions of important pricing factors played a key role in providing leverage to the defendant during settlement negotiations.

What Can Be Done to Make Things Go Right?

What can a company do to position themselves strategically for potential disputes? There are a number of things that organizations can consider doing to improve the effectiveness of their data analytics even before data are processed.

  • Before a dispute begins, perform a review of your information systems to determine whether the necessary data are being captured, maintained, and archived to support records retention requirements, compliance requirements and business operations. This is not something that you want to start after you are already in the line of fire.
  • Consider your data analytic team’s competency and comfort with complex database structures, data relationships, and database sizes.
  • Develop internal staff members who are experts with your systems. Obviously, this is valuable if you proceed with an internal data analytic team. However, even if you engage an external team, your internal staff will be invaluable in extracting data and answering questions -- saving time and money by minimizing confusion in interpreting data.
  • Although there are situations in which using a sample might make sense (e.g., data not available or manual entry required), it should only be used as a last resort. By its very nature, sampling leads to a layer of assumptions that often can be obviated in this era when computing space is cheap and processing is powerful.

Forensic data analytics may not be well understood, but there is no question that databases have a story to tell. The challenge is that they require someone who is familiar with the language of 1) the company, 2) the industry, 3) the regulations/matters, and 4) the tables, rows, columns, code and schemas to understand, relate and translate the story. If your company can identify a quality data analytic team to serve as the interpreter for your data, you will be better positioned to handle potential disputes that may arise, a condition that can save millions and ultimately make your day.

Forensic Data Analytic Challenges - False Positives in Anti-Fraud Analytics

The foundation of electronic fraud detection has long been the search for outliers – data points that don’t fit expected patterns. However, the evolution of anti-fraud analytics has accelerated in recent years to keep up with major advancements in technology. Many testing strategies now leverage more comprehensive data as companies capture more detailed financial data across greater durations of time. Additionally, the dramatic increase in processing speed has enabled a proliferation in the number and frequency of tests that can be performed.

With the trend of larger sets of data and increased testing, analysts often flag outliers that can reasonably be explained by the actual course of business. This phenomenon of "false positives" creates one of the biggest challenges when utilizing analytics to identify fraud. Specifically, false positives lead to unnecessary costs for a company as it performs reviews and investigations to determine the business validity of analytics-driven alerts. For this reason, it is vital that anti-fraud teams seek ways to limit these alerts.

If It Looks Like Fraud, and Smells Like Fraud, It May Still Not Be Fraud

Recently I read an article touting Benford’s Law as a potential ‘holy grail’ of tests in the anti-fraud space. For those not familiar, Benford’s Law states that in a set of naturally occurring numbers, the leading digit is distributed in a specific, non-uniform way. To put it in layman’s terms, in many sets of numbers, one should expect that the first digit of these numbers will more often be 1s and 2s and less frequently 8s and 9s. (If you want to talk about Benford’s in techie talk, shoot me a message and we can geek out.) I can say that this concept bears out in many circumstances relating to expenses. In fact, we regularly implement Benford’s Law as a component of our test population when we analyze client expense data. However, when we arrive at results, we proceed with caution.
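
As an illustration of how such a test might be wired up, here is a minimal sketch on a hypothetical expense extract; Benford's expected share for a leading digit d is log10(1 + 1/d).

```python
import math
import pandas as pd

expenses = pd.read_csv("expense_lines.csv")        # hypothetical extract
amounts = expenses["amount"].abs()
amounts = amounts[amounts > 0]

# First significant digit of each amount (e.g., 64.99 -> 6, 0.056 -> 5).
first_digit = amounts.astype(str).str.lstrip("0.").str[0].astype(int)
observed = first_digit.value_counts(normalize=True).sort_index()

# Benford's expected share for leading digit d is log10(1 + 1/d).
expected = pd.Series({d: math.log10(1 + 1 / d) for d in range(1, 10)})

comparison = pd.DataFrame({"observed": observed, "expected": expected})
print(comparison)   # digits whose observed share far exceeds the expected share merit review
```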

I recall an anti-fraud project I worked on many years ago. In an unedited application of Benford’s Law, our test flagged a number of transactions as potential fraud. The initial results from this test were striking: a clear spike for expense amounts with ‘6’ as the leading digit. But when we reviewed some of the output, we found that the company’s cell phone packages fell neatly into the range of $60.00-$69.99. Simply put, the transactions that were being flagged were not really instances of fraud.  

Hey, Benford, What Gives?

So what happened? Why did the results of the analysis improperly flag these transactions? To answer this question, we must go back to the definition. Benford’s Law depends on a ‘naturally’ occurring set of numbers. In our example, the company had negotiated a static monthly price for mobile service. Such a negotiated price leads to expense amounts which are not ‘naturally’ occurring.  Since each employee owned a cell phone and also expensed their charges each month, the number of expense records with an amount in this range was large enough to skew the outcomes.

The key point to note is that false positives like this one require time for review. They can also obscure our ability to spot anomalies which truly represent fraudulent activities.

No Worries, Just Don’t Perform the Test…Right?

What we don’t want to do is abandon these types of tests altogether since applying them can often provide value. We often derive better strategies by complementing quantitative data with qualitative knowledge to reduce the occurrence of false positives. In other words, consider the organization whose data you are analyzing when developing your test strategy. I’ve included a few suggestions below:

1.       Cleanse Data. In our case example, we did not dispense with the test altogether, but we did pull out the cell phone expense records (and other subsets like them). The subsequent findings provided a more actionable set of results. Specifically, testing the cleansed data effectively spotted instances of employees who manipulated or contrived expense records to avoid thresholds that were defined by the company to protect against fraud. For example, we may witness an excessive number of expenses in the range of $90.00-$99.99 if the company has a pre-approval requirement for expenses over $100. This is similar to a ‘structuring’ example in the world of Anti-Money Laundering.

2.       Add Customized Tests. Some might say that the removal of a subset of data left a void in our testing. In actuality, our understanding of the context of the data provided an opportunity. In the case of the cell phone records, we designed customized tests for the filtered population. We added a test to flag cell phone expenses which were greater than (or less than) a logical value. We also tested for the frequency of such transactions for each employee. These tests provided some of the “low-hanging fruit” amongst our actionable results.

3.       Prioritize Test Results. Sometimes the volume of data is extremely large and the rules of the company are extensive. Almost every test leads to numerous alerts, including many false positives. In these situations, companies and their data analytic teams often choose to run a wide array of tests and only review transactions with a 'risk rating' that meets a prescribed threshold. These risk ratings are based on the number and type of tests that each transaction, employee, and/or vendor fails. Companies can develop protocols to perform transactional reviews when risk scores exceed the threshold.
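
To make the third idea concrete, here is a minimal sketch of how a simple risk score might be assembled from a handful of the tests discussed above; the tests, weights, column names, and cutoff are all illustrative rather than an actual client rule set.

```python
import pandas as pd

expenses = pd.read_csv("expense_lines.csv")        # hypothetical extract

tests = {
    # Amounts contrived to sit just under a $100 pre-approval threshold (the
    # 'structuring' pattern described in item 1).
    "just_under_threshold": expenses["amount"].between(90.00, 99.99),
    # Cell phone charges outside the logical monthly range (item 2).
    "cell_phone_out_of_range": (expenses["category"] == "cell phone")
                               & ~expenses["amount"].between(60.00, 69.99),
    # A third illustrative flag: expenses submitted on a weekend.
    "weekend_submission": pd.to_datetime(expenses["submit_date"]).dt.dayofweek >= 5,
}

weights = {"just_under_threshold": 3, "cell_phone_out_of_range": 2, "weekend_submission": 1}
expenses["risk_score"] = sum(flag.astype(int) * weights[name] for name, flag in tests.items())

# Only transactions at or above the cutoff are routed for manual review.
for_review = expenses[expenses["risk_score"] >= 3]
```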

Conclusion

One of the many ways that businesses can use analytics to improve their financial health is fraud detection. The Association of Certified Fraud Examiners estimates that fraud costs organizations an average of 5% of revenues. There are many sophisticated techniques now used to detect fraud, and the current enthusiasm for business analytics is well placed. When used effectively, analytics can take large data populations and efficiently isolate transactions that are worthy of review. This acceleration enables corporations to manage compliance more comprehensively at a fraction of the cost.

To make effective use of the unprecedented volumes of data and computational systems available, every testing strategy needs to ensure that communication and contextualization are components of the process. Analysts need to understand client operations and design tests to fit not only the data, but also the actual business that the data reflects. Doing so will lead to more meaningful and actionable results, a concept that is increasingly important with the higher hurdle of false positives.

 

Antiquated NCAA Politics

If political pundits had determined who would run for the Republican presidential nomination in 2016, Donald Trump would not have even made the field. No one thought the business mogul had a shot at winning the nomination when he entered the primary. With good reason: a President has been elected in the United States every four years since 1789, and not once has one been elected who had not served in the government and/or the military.

In the world of politics, the apparent anti-establishment sentiment appears to have changed the manner in which elections are won. Political experience is not the driving force it was in prior elections. Large donations by a few have been overwhelmed by the modest donations of many. The endorsements of political leaders have not equated to victories. Negative advertising by Political Action Committees has been rendered oddly ineffective.

I know there are many in the reading audience who have passionate views on this topic, but my point is not one of politics. Instead I am using a very familiar topic to introduce a concept that has become progressively more important in this era of increased data analytic activity. My point is that using historical data to predict future results can be misleading if factors change. This misdirection can take many forms, but I'm going to use the NCAA Tournament to shed some light on this concern. Specifically, the fact that the NCAA selection committee uses antiquated concepts to fill out the basketball tournament field is leading to flaws that are incredibly costly to a class of schools that are consistently being overlooked.

How are Teams Invited to the NCAA Tournament?

The naming of the 68 teams that are invited to the NCAA tournament is part objective and part subjective. College basketball teams do not make the NCAA tournament because they ask to enter the fray. There are thirty-two conferences that compete in the highest grouping of NCAA men's basketball. Thirty-one of those conferences have a 3, 4 or 5-day conference tournament, with the winning teams earning automatic bids to the NCAA tournament. The Ivy League does not currently have a conference tournament; instead, it gives its automatic bid to its regular-season champion.

The conference tournament format gives hope to almost every school, even after mediocre to horrendous seasons. Winning a few games over one extended weekend can lead to an invitation to the Big Dance. The problem is that the guaranteed opportunity these conference tournaments offer to so many schools often comes at the expense of schools that have excelled over the course of the entire regular season. With 32 teams getting automatic bids, only 36 spots remain for teams that don't win their conference's automatic bid. These 36 "at-large" bids are often not enough for most schools from small conferences to receive an invitation to the NCAA tournament if they don't win their conference tournament.

Every year, the NCAA selection committee determines which schools make the NCAA tournament with the limited at-large bids that are available. The challenge is that they must make these decisions with very little tangible data to compare teams across conferences. There are very few games in the regular season in which teams from different conferences play each other. Even if they do, almost all of those games occur in the first two months of the season. By the time selections are made, teams have often improved or regressed dramatically.

What are the Statistics for At-Large Bids?

This year 26 at-large bids went to teams from the five “Power Conferences” (ACC, Big 10, Big 12, SEC, Pac-12) while only 10 went to teams from any of the other 27 conferences. In each of these five power conferences, at least one bid went to a team that lost more than 12 regular season games and/or lost as many conference games as they won. The NCAA selection committee was not this forgiving to any other team from any other conference. For years, the committee has had a tendency to select middling teams from the Power Conferences over top teams from smaller conferences. This was likely because the teams from these smaller conferences historically have had very little chance of making a run against multiple teams from the larger schools in the NCAA tournament.

Why Do They Do It This Way?

The argument from many college basketball experts is that the purpose of the NCAA tournament is to determine the best team in the country. Based on this premise, the NCAA should put the best teams in the tournament. These same experts suggest that the best teams generally come from the larger conferences because that’s where you will find the better players.

What’s Wrong with their Argument?

The first flaw in the argument is the suggestion that the purpose of the NCAA tournament is only to determine THE best team in the country. The tournament also gives us a sense of how all of the top teams and conferences stack up when compared to others. What we have learned is that schools don’t have to win the entire tournament to benefit from an appearance. Each victory in the tournament provides measurable value to the universities, the coaches and the players. It has led to progression for coaches like Brad Stevens (formerly at Butler now coaching the Boston Celtics), Shaka Smart (formerly VCU now at Texas) and Jim Larranaga (formerly George Mason now at Miami). Players like Doug McDermott (Creighton) and C.J. McCollum (Lehigh) have ridden NCAA tournament success to high draft status in the NBA.

Most importantly, universities have benefitted from each victory. Recent tournament successes have translated to increased applications (350% increase for George Mason), increased tuition revenues ($3.4M increase for VCU via out-of-state acceptances for 2012-13 school year) and free national media attention (valued at $1.2B for Butler over the span of their two extended tourney runs). These factors lead to a cyclical benefit as with Gonzaga which has received 18 consecutive invitations to the NCAA tournament with one or more tourney wins in all but three of those years.

The experts' argument is also flawed because of its presumption that the best teams are the ones with the best players and those players generally go to schools from larger conferences. One does not need to look further than recent results to understand the flaw in this logic. Butler, VCU, and George Mason have all reached the Final Four of recent tournaments. Teams like Gonzaga, Wichita State and Northern Iowa performed well this year and in prior years. Either these small schools have some of the best players or the best players are not always on the best teams.

Interestingly enough, the National Invitational Tournament (“NIT”), which invites the ‘next best’ 32 teams, had a pair of semifinal match-ups which only included schools from non-power conferences: George Washington, BYU, San Diego State and Valparaiso. Conversely, only two of the nine schools invited from power conferences advanced to the quarterfinals of the NIT and that was only because both of those teams (Florida and Georgia Tech) played other power conference schools (Ohio State and South Carolina) in the prior round. Someone had to win.

What Factors have Changed?

As I warned at the top of this article, we must proceed with caution when using historical data to predict future results because factors may have changed. As in politics, college basketball has seen dramatic changes in the past decade.

1)      The talent disparity has diminished - The best players may still not go to the schools from smaller conferences, but the disparity has narrowed between the top players who play in smaller conferences and those who play for the middle layer of teams in the power conferences. This is a function of a number of reasons including the expansion of talent at the high school levels.

2)      Many college players are leaving school earlier - The most talented players almost universally play for the larger schools. This is still the case, but many are leaving school to enter the professional draft after fewer years. This leads to inexperience and lack of cohesion at many of the power conference schools.

3)      Smaller schools are making larger investments in their basketball programs - As universities witness the gains achieved by schools with improving basketball programs (both on the court and in the classrooms) greater investments have been made. Better athletic facilities and changing affiliations to stronger basketball conferences are just a couple of examples. The Ivy League has even decided to add a conference tournament which will increase revenues for all schools in the conference.

So What Should be Done?

The idea that middling teams from power conferences are more worthy of at-large bids to the NCAA tournament over dominant regular season champions from smaller conferences is antiquated madness. Teams like Pittsburgh, Texas Tech, Southern California and Oregon State each went 9-9 in their conferences, but still managed to get at-large bids. None of them won a single game in the tournament. (Before I hear from folks who point to Syracuse’s 9-9 record and their tourney success, I will note that their conference record was 9-6 when their coach Jim Boeheim was not serving his suspension in the middle of the season.)

Conversely, teams like St. Mary's, San Diego State, Valparaiso and UAB each won their conference regular-season titles in convincing fashion, with a 63-9 combined conference record. None made the NCAA tournament, because they lost in their conference tournaments and did not receive at-large bids. In each case, they lost to a team in their conference tournament that they had beaten once or twice during the regular season. It is shameful to hand off the NCAA tournament opportunity earned by these teams in favor of less-deserving teams that have proven repeatedly over the course of the season that they are NOT among the best teams in the NCAA.

 

How Bad Math Can Lead to a Potential Million-Dollar GSA Funding Error

First there is a Vendor who sells a Product

Ms. Vendor has a new company that sells entire pitchers of lemonade at her stand. She typically sells this lemonade for $15 for each pitcher. A number of parents in one neighborhood love her lemonade, but think that the price is too high. Mr. GSA is a parent in the neighborhood. He wants to negotiate on behalf of the neighborhood with the goal of creating value and efficiency for his neighbors.

GSA Negotiates with Vendor

Mr. GSA tells Ms. Vendor that there are a lot of kids in his neighborhood; in fact, there are more than in any other neighborhood. A lot of those kids will buy lemonade from Ms. Vendor as long as she agrees to sell it to them at her absolute lowest price. Ms. Vendor agrees to do so, knowing that her lowest price to any of her customers is $10 per pitcher.

What is the IFF?

The parents of the neighborhood really appreciate that Mr. GSA coordinated these efforts and negotiated on their behalf. Because of this, the parents agree to pay an Icey Fluid Fee (IFF) of 10% that would be added to the price of each pitcher. This fee would be collected by Ms. Vendor and remitted to Mr. GSA at the end of every quarter.

The Vendor Sells the Product

Ms. Vendor begins selling pitchers of lemonade for $11 per pitcher; $10 for the cost of her pitchers plus $1 for the IFF. The agreement made between Mr. GSA and Ms. Vendor is a hit! Over the course of three months, she was able to sell 100 pitchers in Mr. GSA’s neighborhood, collecting $1,100.

The IFF Calculation

Appreciatively, she calculates the amount of IFF that she needs to pay to Mr. GSA. She multiplies her total collections from Mr. GSA's customers by the IFF rate. Specifically, Ms. Vendor takes her collections of $1,100 and multiplies them by the 10% IFF rate to determine that she owes $110 to Mr. GSA. But Ms. Vendor is confused, because she sold 100 pitchers and collected $1 of IFF on each pitcher. Why isn't her IFF payment $100?

What’s Wrong with the Math?

Ms. Vendor’s common sense is correct. The key point is that when Ms. Vendor collects money from the customer, she is collecting two different streams of money: 1) the revenues from the sale of the product, and 2) the contributions for the IFF. For this particular example, she collected $1,000 of sales revenue and $100 of IFF contributions. The $100 of IFF contributions are 10% of the $1,000 of sales revenue.

The problem with her original calculation is that she was calculating 10% of all of the money she collected, which is $1,100. This included the $100 of IFF contributions she had collected. By multiplying $1,100 by 10%, Ms. Vendor was compounding her IFF payment, paying a fee on both the sales revenue portion of her collections and the IFF contributions she also collected. This subtle error led to an extra $10 in calculated IFF (i.e., 10% of the $100). It is no coincidence that this is also the difference between Ms. Vendor's original IFF calculation of $110 and her common-sense assessment of $100.

The General Services Administration and the Industrial Funding Fee

The General Services Administration (“GSA”) has facilitated efficiencies and negotiated discounts in the government procurement process for many years. Currently all schedule sales have an administrative fee built into the pricing of all products and services offered under GSA Schedule programs. Vendors collect this Industrial Funding Fee (“IFF”) and remit it to the GSA on a quarterly basis.  The rate has changed over time and isn’t always the same for each GSA Schedule or Special Item Number (SIN). However, for most contracts the IFF is currently 0.75%.

This means that if a vendor collects money from GSA-Schedule sales of products/services to government customers, the amount collected can be separated into two distinct streams: 1) sales revenues and 2) IFF contributions. In the example of a vendor collecting $1,007,500 from GSA-Schedule sales to government customers, the company has received $1 Million in sales revenues and $7,500 (=$1M * 0.75%) in IFF contributions.

In my role as a Forensic Data Analyst, I often review GSA contracts for government contractors and analyze their detailed historical financial data and actual IFF payments. Surprisingly, I have noticed that many vendors are computing their IFF as though the fee was not already included in the contract price. In the example above, they would calculate IFF as $1,007,500*0.75%=$7,556.25 which is an extra $56.25 on every $1 million in actual revenues, or 0.005625%. In effect, they are falling prey to the same error that Ms. Vendor noticed in the lemonade example above.
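
Using the figures from this example, the two computations differ as follows:

```python
IFF_RATE = 0.0075                         # rate for most GSA Schedule contracts
collections = 1_007_500.00                # total collected from GSA-Schedule customers

incorrect_iff = collections * IFF_RATE            # 7,556.25 -- a fee paid on the fee itself
schedule_sales = collections / (1 + IFF_RATE)     # 1,000,000.00 -- sales exclusive of IFF
correct_iff = schedule_sales * IFF_RATE           # 7,500.00

print(round(incorrect_iff - correct_iff, 2))      # 56.25 overpaid on ~$1M of collections
```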

You might say that this amount of money is not a lot and you may be correct on the surface. The companies that are overpaying IFF, may not even notice the “Cherry” that they pay on top of their quarterly IFF payments. However, it is interesting to note that total annual GSA sales are currently over $50 billion annually. This sales volume suggests that this particular Cherry could be costing Government Contractors close to $3 million a year.

Government contractors who have GSA schedule contracts should review their policies and procedures to determine if they are computing their IFF payments accurately. GSA states that vendors should “multiply total Federal Supply Schedule sales reported by the IFF rate in effect” (see “72A Quarterly Reporting System” https://72a.gsa.gov/ifffaq.cfm#12 ), but clearly care should be taken to determine true Schedule Sales, exclusive of IFF, before multiplying by the IFF rate in effect.

If you think you overpaid IFF in the past, but would like a neutral eye to review, please reach out. I would be happy to help. If you want to correct for historical overpayments, the GSA suggests that contractors direct questions regarding IFF calculations to their Administrative Contracting Officers. For contractors that choose not to correct historical IFF overpayments, my guidance would still be to ensure that they are calculating their quarterly payments properly going forward. It’s a simple fix that will save money.

Apple vs. FBI and the Presidential Candidates

It seemed appropriate to follow up on my previous post since the first substantive question of the most recent Republican and Democratic town halls pertained to phone metadata in anti-terrorism. If you have not heard, the FBI has obtained a court order against Apple. The order requires the business giant to "bypass or disable the auto-erase function" for one of the San Bernardino terrorist’s phones. Apple CEO Tim Cook responded in a message to their customers saying that they “oppose the order, which has implications far beyond the legal case at hand.”

The recent passing of the USA Freedom Act enables the government, with a warrant, to request phone metadata that companies collect. The Apple statement suggests that the government has already taken this step and that Apple has “done everything that is both within our power and within the law to help them.” Apple is likely referring to providing data that is already in their possession on Apple’s cloud servers. It is likely that the relevant phone company has also provided what they have in their possession.

Then What's the Problem?

Unfortunately, the data that these companies keep does not include all the data that may be on the phone of the terrorist, Syed Rizwan Farook. As one might expect, the FBI wants access to the additional data. And why not? The additional information could help stop other terrorist activities, it was used by a known terrorist who is no longer alive, and the phone was actually owned by Farook's employer, San Bernardino County, not the individual.

The hiccup is that the phone is password protected. "No biggie," you might say. Give it to a techie with some good software and they should be able to 'brute force' their way into the phone: just write some code that attempts every possible passcode until one works. In many situations this is exactly how it's done, but not in this case. The encrypted iPhone at the heart of this issue has a passcode lock feature with an auto-erase option after 10 incorrect entries. The FBI does not currently have the capability to crack the code without risking erasing all the data on the phone.

So Just Get Apple to Open Up the One Phone

But the limitation does not stop there. Apple, too, does not currently have the capability to access the data on the phone. According to an article in Saturday's Washington Post, Apple has developed and marketed its products with a new emphasis on privacy in the last few years, a strategy that has culminated in the security offered in recent operating systems. Interestingly, this new emphasis follows on the heels of the Snowden leaks that led to the dismantling of the National Security Agency (NSA) bulk phone record collection program.

Apple intentionally did not develop a solution because they fear that doing so would create a “backdoor” that could be used repeatedly on any other iPhone. Apple suggests that “while the government may argue that its use would be limited to this case, there is no way to guarantee such control.”

Some technical and legal experts have clarified that the difference between this FBI request and others is that the government wants more than just data. In effect, they want a private company to work on behalf of the government to develop a product, in this case a new operating system. Apple's public statement stresses that this action will threaten the privacy of all their customers. However, we can't ignore that it will also produce a vulnerability in the security of Apple's own operating system and, in doing so, weaken the company's own marketing strategy.

What Do the Candidates Think

Both of the Democratic Presidential Candidates were asked during Thursday’s Town Hall what they thought should be done. Both former Secretary of State Hillary Clinton and Senator Bernie Sanders admitted that the question was not an easy one to answer and proved it by not giving a specific solution. Instead, both called for the technology firms and government officials to come together and find common ground.

The three Republican candidates who spoke during Wednesday night’s town hall (Senator Marco Rubio, Senator Ted Cruz, and Dr. Ben Carson) were asked the same question. Rubio and Carson answered similarly to the Democrats, while Cruz suggested that Apple should comply, stressing that it was only one phone and setting aside Apple’s concerns about control. Donald Trump and Governor John Kasich were not asked the question during their appearance on Thursday night, but according to U.S. News & World Report, both have criticized Apple’s opposition. Trump has gone as far as to tweet that we should “Boycott all Apple products until such time as Apple gives cellphone info to authorities regarding radical Islamic terrorist couple from Cal."

As I stated in my prior post, my goal is not to tell you whom to vote for, but rather to instigate a discussion that increases awareness and enables individuals to make a more informed election decision.

 

The Role of Data Analytics in Phone Surveillance

With the primaries underway, a friend of mine began his research on the candidates to prepare for casting his own vote. Impressively, he reviewed a number of the prior debates during his self-education process. He still has not reached a conclusion on his candidate of choice, but he did reach out to ask me a Data Analytics question that came up during his research. Given the seeming omnipresence of data analytics, the breadth of the debate topics, and the sometimes cryptic nature of 30-second political responses, I am not surprised that he had a question for me.

Specifically, he wanted to know how much data is the right amount to capture and analyze when it comes to phone surveillance for anti-terrorism. The two programs discussed during the debates were:

1) The controversial National Security Agency (“NSA”) bulk phone record collection program or

2) The USA Freedom Act, which became law this past summer

The Two Programs

Both the NSA bulk phone record collection program and the USA Freedom Act are designed to provide phone metadata to United States government agencies that are tasked with protecting the security of this country. In the current program (Freedom Act), the government can collect relevant and available phone metadata for individuals when a warrant is obtained. Previously, under the much-criticized NSA bulk program, data was captured for every individual and stored indefinitely.

In the case of the Freedom Act, analysts query a limited set of data reactively, after a warrant has been obtained. In the case of the NSA program, analysis can be performed proactively and comprehensively across all phone data. So the key question is whether extracting more data and analyzing it earlier provides enough added anti-terrorism value to offset concerns about privacy.

How Is Data Analysis Used

To help us make a judgment, we should start by noting that data analytic activity is not unique to phone surveillance. Data Analytics practitioners use historical data to identify ‘red flags’ and ‘predict’ future activities in many other scenarios: suspicious activity in anti-money laundering; anomalous transactions indicative of business fraud; general ledger oddities suggestive of corrupt practices in foreign businesses. It might help if we carry forward the parallel of financial institutions reviewing bank transactional data to help identify potential money laundering.

In this anti-money-laundering scenario, banks are required to monitor all accounts and report suspicious activity to the U.S. government. Data analysts start with what may seem like a blank slate, but looks can be deceiving. Experienced data analysts are familiar with a number of money-laundering schemes and are able to write code to identify these schemes in the data. Do cash transactions exceed certain thresholds? Are deposits split to avoid these thresholds? Are there transactions to/from parties on watch lists? Are the transactions to/from entities in high-risk nations? Are the transactions consistent with the institution’s knowledge of the customer’s expected activity?
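As a rough sketch of what those checks might look like in code, the rules below mirror the questions above; the thresholds, lists, and field names are illustrative assumptions, not any institution’s actual monitoring rules.

```python
# Illustrative rule-based AML checks mirroring the questions above.
# Thresholds, lists, and field names are assumptions, not real monitoring rules.
CASH_REPORTING_THRESHOLD = 10_000
HIGH_RISK_COUNTRIES = {"Country A", "Country B"}  # placeholder jurisdictions
WATCH_LIST = {"Sanctioned Entity"}                # placeholder watch list

def flag_transaction(txn: dict) -> list:
    """Return the red flags raised by a single transaction record."""
    flags = []
    if txn["type"] == "cash" and txn["amount"] >= CASH_REPORTING_THRESHOLD:
        flags.append("cash transaction over reporting threshold")
    if txn["type"] == "cash" and 0.9 * CASH_REPORTING_THRESHOLD <= txn["amount"] < CASH_REPORTING_THRESHOLD:
        flags.append("deposit just under threshold (possible structuring)")
    if txn["counterparty"] in WATCH_LIST:
        flags.append("counterparty on watch list")
    if txn["country"] in HIGH_RISK_COUNTRIES:
        flags.append("transaction involves high-risk nation")
    if txn["amount"] > 5 * txn.get("expected_typical_amount", float("inf")):
        flags.append("inconsistent with expected customer activity")
    return flags
```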

In the same way, data analysts could presumably review phone data associated with individuals involved in prior attacks. Armed with their findings, analysts could write code to identify patterns in available phone data: locations, durations, frequencies, calls to high-risk phone numbers, and keywords in actual discussions. In this era, technology has advanced such that the volume of data should not be a limiting factor, so ALL tests can be run on ALL data in a reasonable and actionable period of time.
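Along the same lines, a hedged sketch of those phone-metadata tests might look like the following; the record fields, the watch-listed numbers, and the frequency threshold are all illustrative assumptions.

```python
# Illustrative pattern tests over call-record metadata, parallel to the AML
# sketch above. Fields, numbers, and thresholds are assumptions for discussion.
from collections import defaultdict

HIGH_RISK_NUMBERS = {"+00-000-000-0000"}  # placeholder watch-listed numbers
FREQUENT_CALLER_THRESHOLD = 100           # calls per period considered unusual

def phone_red_flags(call_records):
    """call_records: iterable of dicts with caller, callee, duration_sec."""
    calls_per_caller = defaultdict(int)
    flags = []
    for rec in call_records:
        calls_per_caller[rec["caller"]] += 1
        if rec["callee"] in HIGH_RISK_NUMBERS:
            flags.append((rec["caller"], "contact with watch-listed number"))
    for caller, count in calls_per_caller.items():
        if count > FREQUENT_CALLER_THRESHOLD:
            flags.append((caller, "unusually high call frequency"))
    return flags
```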

So What’s the Problem

With the completeness of data and the comprehensiveness of testing, the good news is that all anomalies or ‘red flags’ would be highlighted. Unfortunately, the bad news is also that all red flags would be highlighted. Whenever a comprehensive approach is used in data analytics, managing ‘false positives’ must be a real consideration. False positives in this AML discussion would be transactions that fail certain tests but are not improper. In today’s business of data analytics, false positives are a constant challenge because practitioners are pushing limits to optimally harness the power of technology and the potential of ever-increasing data. So the million-dollar question is whether there is a way to handle these analytical speed bumps.

So What’s the Solution

There are a number of ways that data analysts lessen the impact of false-positives. Each mechanism generally attempts to differentiate the risks associated with each transaction. One of the most common methodologies is applying a risk scoring system. We can use a corporate anti-fraud testing example to provide some color to this discussion.

In this methodology, hundreds of anti-fraud tests might be run and each test is given a risk score. The tests that are most highly correlated with actual fraud activity are given a higher score and vice-versa. A payment to a company with a tax identification number that is the same as an employee’s social security number would have a high score. A payment with a round-dollar amount would contribute a low score while a weekend transaction would likely be somewhere in the middle.

Each of the company’s financial transactions is run against every test and accumulates a cumulative risk score based on the tests it fails. At the conclusion of all testing, false positives can be managed by setting thresholds to eliminate groups of transactions that do not reach a sensible cumulative risk score set by subject matter experts (“SMEs”). Additionally, the transactions with the highest scores are the ones prioritized for review first.
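A minimal sketch of that scoring approach, using the corporate anti-fraud example above, might look like this; the tests, weights, and review threshold are illustrative assumptions rather than a vetted scoring model.

```python
# Illustrative risk-scoring sketch for the anti-fraud example above.
# Tests, weights, and the review threshold are assumptions, not a vetted model.
EMPLOYEE_SSNS = {"000-00-0000"}  # placeholder set of employee SSNs

# Each test: (description, weight, predicate that is True when the test fails)
TESTS = [
    ("vendor TIN matches an employee SSN", 90,
     lambda t: t["vendor_tin"] in EMPLOYEE_SSNS),
    ("weekend transaction", 40,
     lambda t: t["weekday"] in ("Sat", "Sun")),
    ("round-dollar amount", 10,
     lambda t: t["amount"] % 1 == 0),
]

REVIEW_THRESHOLD = 50  # in practice, set by subject matter experts

def risk_score(txn: dict) -> int:
    """Accumulate the weights of every test the transaction fails."""
    return sum(weight for _, weight, fails in TESTS if fails(txn))

def prioritize(transactions):
    """Drop low-scoring transactions and review the highest scores first."""
    scored = [(risk_score(t), t) for t in transactions]
    return sorted((pair for pair in scored if pair[0] >= REVIEW_THRESHOLD),
                  key=lambda pair: pair[0], reverse=True)
```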

So Is More Data Better

In my mind, Data Analysts could effectively analyze large volumes of phone metadata in an anti-terrorism setting by implementing a similar procedure. If these analysts have access to all phone data and are equipped with the technical skills and expertise to assimilate and test these data, they can certainly identify more red flags. But as we have learned, more is only better if it is possible to prioritize those red flags while minimizing the impact of false positives.

To do this, Data Analysts must leverage anti-terrorism SMEs who can make sense of the results and overcome the noise of false positives. In a phone surveillance setting, the SMEs would have knowledge of communication patterns associated with terrorist activities, access to risk indices from other anti-terror programs, and an understanding of the actions that result from the analysis and the findings those actions produce. In an ideal scenario, the analysts themselves would have the requisite subject matter expertise to bridge any gaps between the technical testing and the real-world application of results.

And the Winner Is

There is no question that using all data eliminates the possibility of having blind spots in the analysis. Collecting data beforehand also enables proactive testing which means that there isn’t a requirement for a terrorist plot to be discovered or a terrorist act to be committed before pursuing and developing leads. Additionally, there is potential for data analytic methodologies to enable countries to prioritize their counter-terrorism activities. This would be especially useful in countries that are resource constrained when it comes to tracking the terror subjects currently on their watch lists.

However, any expectations engendered by a utopian view of Data Analytics must be tempered by the limitations of reality. As such, determining the right phone surveillance program depends greatly on each individual’s answers to a series of questions, including but not limited to the ones below.

1) What is your individual tolerance for the tracking of personal data?

2) How all-encompassing do you want the country’s anti-terrorist activities to be?

3) Would the government limit the use of this phone data to anti-terrorist activities?

4) Would the government be able to bring together the collection of SMEs necessary to take advantage of the value offered from more comprehensive data?

5) Would the teams work collegially and continuously to build a more robust analytic solution?

Fortunately, my goal is not to give a definitive answer, but rather to instigate a discussion that increases awareness and enables individuals to make a more informed election decision. I would love to hear the thoughts of others to help complete the picture.

 

The Dawn of Analytic Matters, LLC - Bridging the Gap

"Excited" is an understatement of epic proportions, but it most simplistically describes my emotions at this time. I have spent a number of years building skills, experiences and relationships while working with leading consulting services firms that have put me in a position to take this next step. I have been incredibly fortunate to work with great people on great projects, providing great service to great clients.

So why take this next step? Many of my friends, clients and colleagues ask this question, and the answer is difficult to package into a simple one-liner. What I know is that I believe in the power of Data Analytics. I also believe that clients often receive only a shell of the benefits that Data Analytics can provide. This is because the individuals who work with the data frequently don't understand the client's needs, while the consultants who understand the client's needs rarely work with the data directly. I have built a career bridging this gap: working directly with the data while staying connected to the client's business, the legal issues, and the corporate challenges.

So if you are looking for a one-liner, I'm seeking to optimize my ability to provide great service to my clients. I want to work directly with my clients, understand their challenges, know what keeps them up at night and tailor analyses to meet their specific needs. I know that when I can create this dynamic, the clients that I have had the pleasure and privilege of working with in the past have always benefited. Analytic Matters provides me with the platform to ensure this dynamic and enables me to do what I do best: Bridge the Gap.