The Role of Data Analytics in Phone Surveillance

With the primaries underway, a friend of mine began researching the candidates to prepare for casting his own vote. Impressively, he reviewed a number of the prior debates during his self-education process. He still has not reached a conclusion on his candidate of choice, but he did reach out to ask me a Data Analytics question that came up during his research. Given the seeming omnipresence of data analytics, the breadth of the debate topics, and the sometimes cryptic nature of 30-second political responses, I am not surprised that he had a question for me.

Specifically, he wanted to know how much data is the right amount to capture and analyze when it comes to phone surveillance for anti-terrorism. The two programs discussed during the debates were:

1) The controversial National Security Agency (“NSA”) bulk phone record collection program or

2) The USA Freedom Act which became law this past summer

The Two Programs

Both the NSA bulk phone record collection program and the USA Freedom Act are designed to provide phone metadata to United States government agencies that are tasked with protecting the security of this country. In the current program (Freedom Act), the government can collect relevant and available phone metadata for individuals when a warrant is obtained. Previously, under the much-criticized NSA bulk program, data was captured for every individual and stored indefinitely.

In the case of the Freedom Act, analysts query a limited set of data reactively, after a warrant is justified. In the case of the NSA program, analysis can be performed proactively and comprehensively across all phone data. So the key question is whether the added value of extracting more data and analyzing it earlier provides enough anti-terrorism value to offset concerns about privacy.

How is Data Analysis Used

To help us make a judgment, we should start by noting that data analytic activity is not unique to phone surveillance. Data Analytic practitioners use historical data to identify ‘red flags’ and ‘predict’ future activities in many other scenarios: suspicious activity in anti-money laundering, anomalous transactions indicative of business fraud, and general ledger oddities suggestive of corrupt practices in foreign business. It might help if we carry forward the parallel of financial institutions reviewing bank transactional data to help identify potential money-laundering.

In this anti-money-laundering scenario, banks are required to monitor all accounts and report suspicious activity to the U.S. government. Data analysts start with what may seem like a blank slate, but looks can be deceiving. Experienced data analysts are familiar with a number of money-laundering schemes and are able to write code to identify these schemes in the data. Do cash transactions exceed certain thresholds? Are deposits split to avoid these thresholds? Are there transactions to/from parties on watch lists? Are the transactions to/from entities in high-risk nations? Are the transactions consistent with the institution’s knowledge of the customer’s expected activity?
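The questions above can be expressed as simple rule tests over transaction records. Here is a minimal sketch in Python; the threshold, watch list, country list, and field names are all hypothetical placeholders, not actual regulatory values or any bank’s real schema:

```python
# Illustrative AML rule checks. All thresholds and lists are made up.
CASH_THRESHOLD = 10_000                 # hypothetical reporting threshold
WATCH_LIST = {"ACME Shell Co"}          # hypothetical watch-list parties
HIGH_RISK_COUNTRIES = {"Country X"}     # hypothetical high-risk jurisdictions

def flag_transaction(txn):
    """Return the names of the AML rules this transaction trips."""
    flags = []
    if txn["type"] == "cash" and txn["amount"] >= CASH_THRESHOLD:
        flags.append("cash_over_threshold")
    # A single deposit just under the limit is one simple structuring indicator;
    # real structuring detection would look at patterns across many deposits.
    if txn["type"] == "cash" and 0.9 * CASH_THRESHOLD <= txn["amount"] < CASH_THRESHOLD:
        flags.append("possible_structuring")
    if txn["counterparty"] in WATCH_LIST:
        flags.append("watch_list_party")
    if txn["country"] in HIGH_RISK_COUNTRIES:
        flags.append("high_risk_country")
    return flags
```

A transaction can trip several rules at once, which is exactly what makes comprehensive testing noisy, as discussed below.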

In the same way, data analysts could presumably review phone data associated with individuals involved in prior attacks. Armed with their findings, analysts could write code to identify patterns in available phone data: locations, durations, frequencies, calls to high-risk phone numbers, and key words in actual discussions. Technology has advanced to the point where the volume of data should not be a limiting factor, so ALL tests can be run on ALL data in a reasonable and actionable period of time.
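As an illustration only (the actual government tooling is not public), the kinds of metadata tests just described could be written as rules over call records. Every number, threshold, and field name here is a hypothetical stand-in:

```python
# Illustrative pattern tests on phone metadata. All values are made up.
HIGH_RISK_NUMBERS = {"+0-555-0100"}     # hypothetical high-risk numbers

def flag_call(call):
    """Return the pattern tests a single call record fails."""
    flags = []
    if call["callee"] in HIGH_RISK_NUMBERS:
        flags.append("high_risk_number")
    if call["duration_sec"] < 10:       # bursts of very short calls
        flags.append("very_short_call")
    return flags

def flag_subscriber(calls):
    """Aggregate per-call flags for one subscriber, plus a frequency test."""
    flags = [f for call in calls for f in flag_call(call)]
    if len(calls) > 50:                 # hypothetical daily-frequency threshold
        flags.append("high_frequency")
    return flags
```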

So What’s the Problem

With the completeness of data and the comprehensiveness of testing, the good news is that all anomalies or ‘red-flags’ would be highlighted. Unfortunately, the bad news is also that all red-flags would be highlighted. Whenever a comprehensive approach is used in data analytics, managing ‘false-positives’ must be a real consideration. False-positives in this AML discussion would be transactions that fail certain tests but are not improper. In today’s business of data analytics, false-positives are a constant challenge because practitioners are pushing limits to optimally harness the power of technology and the potential of ever-increasing data. So the million-dollar question is whether there is a way to handle these analytical speed bumps.

So What’s the Solution

There are a number of ways that data analysts lessen the impact of false-positives. Each mechanism generally attempts to differentiate the risks associated with each transaction. One of the most common methodologies is applying a risk scoring system. We can use a corporate anti-fraud testing example to provide some color to this discussion.

In this methodology, hundreds of anti-fraud tests might be run, and each test is given a risk score. The tests that are most highly correlated with actual fraud activity are given a higher score, and vice versa. A payment to a company with a tax identification number that is the same as an employee’s social security number would have a high score. A payment with a round-dollar amount would contribute a low score, while a weekend transaction would likely be somewhere in the middle.

The data for each of the company’s financial transactions is run against each test, and each transaction accumulates its own cumulative risk score based on its failed tests. At the conclusion of all testing, false-positives can be managed by setting thresholds to eliminate groups of transactions that do not reach a sensible cumulative risk score set by subject matter experts (“SMEs”). Additionally, the transactions with the highest scores are prioritized for review first.
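The scoring-and-threshold methodology can be sketched in a few lines. The test names, weights, and threshold below are hypothetical stand-ins for values that SMEs would actually set:

```python
# Illustrative cumulative risk scoring. Weights and threshold are made up.
TEST_WEIGHTS = {
    "tin_matches_employee_ssn": 90,  # highly correlated with actual fraud
    "weekend_transaction": 40,       # somewhere in the middle
    "round_dollar_amount": 10,       # weak indicator on its own
}

def score_transaction(failed_tests):
    """Sum the weights of every test the transaction failed."""
    return sum(TEST_WEIGHTS[t] for t in failed_tests)

def prioritize(transactions, threshold):
    """Drop transactions below the SME-set threshold; rank the rest
    so the highest cumulative scores are reviewed first."""
    scored = [(score_transaction(t["failed_tests"]), t["id"]) for t in transactions]
    kept = [(score, txn_id) for score, txn_id in scored if score >= threshold]
    return sorted(kept, reverse=True)
```

The threshold discards the low-score false-positives wholesale, while the sort puts the riskiest items at the top of the review queue.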

So is More Data Better

In my mind, Data Analysts could effectively analyze large volumes of phone metadata in an anti-terrorism setting by implementing a similar procedure. If these analysts have access to all phone data and are equipped with the technical skills and expertise to assimilate and test these data, they can certainly identify more red flags. But as we have learned, more is only better if it is possible to prioritize those red flags while minimizing the impact of false positives.

To do this, Data Analysts must leverage anti-terrorism SMEs who can make sense of the results and overcome the noise of false-positives. In a phone surveillance setting, the SMEs would have knowledge of communications patterns associated with terrorist activities, access to risk indices from other anti-terror programs, and an understanding of the actions taken on prior leads and the findings from those actions. In an ideal scenario, the analysts themselves would have the requisite subject matter expertise to bridge any gaps between the technical testing and the real-world application of results.

And the Winner Is

There is no question that using all data eliminates the possibility of having blind spots in the analysis. Collecting data beforehand also enables proactive testing which means that there isn’t a requirement for a terrorist plot to be discovered or a terrorist act to be committed before pursuing and developing leads. Additionally, there is potential for data analytic methodologies to enable countries to prioritize their counter-terrorism activities. This would be especially useful in countries that are resource constrained when it comes to tracking the terror subjects currently on their watch lists.

However, any expectations engendered by a Utopian view of Data Analytics must be measured by the limitations of reality. As such, determining the right phone surveillance program depends greatly on each individual’s opinions on a series of questions, including but not limited to the ones below.

1) What is your individual tolerance for the tracking of personal data?

2) How all-encompassing do you want the country’s anti-terrorist activities to be?

3) Would the government limit the use of this phone data to anti-terrorist activities?

4) Would the government be able to bring together the collection of SMEs necessary to take advantage of the value offered from more comprehensive data?

5) Would the teams work collegially and continuously to build a more robust analytic solution?

Fortunately, my goal is not to give a definitive answer, but rather to instigate a discussion that increases awareness and enables individuals to make a more informed election decision. I would love to hear the thoughts of others to help complete the picture.