Cross Posted from Global Privacy Watch

The White House released a set of reports this month on Big Data and its privacy implications. While a number of folks have been discussing the President’s Council of Advisors on Science & Technology (“PCAST”) report, I would offer that the Office of Science and Technology Policy (“OSTP”) report needs to be read in conjunction with it. The two do different things: one is a report on the technical state of affairs, and the other is more of a policy direction piece driven by the technologically oriented findings.

Various points of view have been put forth as to the relative merits of each report, but an important element seems to be missing from both. Both reports discuss the need for policy decisions to be based on context and on desired outcomes. Unfortunately, neither gives a good taxonomy of the informatics ecosystem that would allow a clear path forward on “context” and “desired outcomes”. What I mean is best summed up by a comment in the PCAST report: “In this report, PCAST usually does not distinguish between ‘data’ and ‘information’.” “Data” and “Information” are very different things, and one really can’t have a coherent policy discussion unless the distinction between the two is recognized and managed.

Informatics Ecosystem

The importance of having a clear taxonomy around the informatics lifecycle cannot be overstated. In fact, the challenges of most privacy system implementations stem from the lack of one. For example, attempting to classify “personal information” is not an easy thing. Is a first and last name combined with a ZIP code personal information? If the name is John Smith and the ZIP is 11004, likely not. However, if the name is John Tomaszewski and the ZIP is 77002, it absolutely is personally identifiable, because there is only one of me. Consequently, we need a better way of describing the different elements of the taxonomy.
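The name-and-ZIP point can be made concrete with a small sketch. This is only an illustration, not a method drawn from either report: the records are invented toy data, and the helper names `combination_counts` and `is_uniquely_identifying` are hypothetical. The idea is simply that the same fields are identifying for one person and not for another, depending on how many people share the combination.

```python
from collections import Counter

# Hypothetical toy records, invented purely for illustration.
RECORDS = [
    {"first": "John", "last": "Smith", "zip": "11004"},
    {"first": "John", "last": "Smith", "zip": "11004"},
    {"first": "John", "last": "Smith", "zip": "11004"},
    {"first": "John", "last": "Tomaszewski", "zip": "77002"},
]

FIELDS = ("first", "last", "zip")


def combination_counts(records, fields):
    """Count how many records share each combination of the given fields."""
    return Counter(tuple(r[f] for f in fields) for r in records)


def is_uniquely_identifying(record, records, fields):
    """True when no other record shares this record's combination of fields."""
    key = tuple(record[f] for f in fields)
    return combination_counts(records, fields)[key] == 1


for rec in RECORDS:
    unique = is_uniquely_identifying(rec, RECORDS, FIELDS)
    print(rec["last"], rec["zip"], "identifying" if unique else "not identifying")
```

In this toy set, the shared combination is not identifying while the one-of-a-kind combination is, even though both records hold exactly the same kinds of data.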

  • Data

Often, we hear Data and Information used interchangeably. They most certainly are not interchangeable. Data, by itself, is a representation, or token, of a fact. For example, 77002 is data. It is a ZIP code. By itself, data isn’t very useful. You can’t act on raw data. This is the foundational state for the taxonomy. It is also rather rare in the real world.

  • Information  

Information is the next transformative state of Data. It is Data used within a context. The context, or “metadata”, is what gives value to the Data. To go back to the name and ZIP example, the context that the last name is Polish and the ZIP is in Houston transforms two simple data points into Information. You now have the identity of a unique individual.

  • Knowledge

Knowledge is the next transformative state of Information: a pattern emerges. Not only is Knowledge actionable, it can also be used to evaluate and identify past patterns. Beyond mere action, Knowledge provides the capability of Understanding.

  • Wisdom

The final transformative state in this taxonomy is Wisdom (you can call it whatever you want, but this seems to fit). Wisdom is enough Knowledge to start predicting future states.

Each of the states gets triggered by a critical mass of the prior state being fused together. This continued fusion of Data with more and more Data is what makes Big Data useful – you can finally get to Wisdom.

The challenge with the two White House reports is that they discuss the risks associated with Big Data without describing which level in the taxonomy they are concerned with. Each level of the taxonomy has a greater and greater potential for impact (both good and bad). Consequently, if you are looking for context-based, outcome-driven policy, you need to know which layer of the taxonomy you are in. Neither report does this in an effective manner. As a result, whether you think the reports are a good thing or “too little, too late”, the policy conversation will remain deficient until those at the table start using the same structure.

John Tomaszewski

John Tomaszewski specializes in emerging technology and its application to business. His primary focus has been developing trust models to enable new and disruptive technologies and businesses to thrive. In the “Information Age”, management needs to have good advice and counsel on how to protect the capital asset which heretofore has been left to the IT specialists – its data.

John’s expertise in understanding a company’s data protection and management needs provides a specialized point of view that allows for holistic solutions. A good answer should always solve at least three problems.

John has co-authored several information security and privacy publications, including the PKI Assessment Guidelines and Privacy, Security and Information Management: An Overview, and has published scholarly works of his own on the topic. He has also provided input to the drafting of various security and privacy laws around the world, including the APEC Cross-Border Privacy Rules system. He is a frequent speaker globally on the topics of cloud computing, Self Regulatory Organizations (“SROs”), cross-border privacy schemes, and secure e-commerce.