The Data = Oil Fallacy

News ¦ June 9th, 2017, 11:00 pm

Or: 2 Reasons Why Buzzfeed Style Analogies Lost You $2,000 Last Year.

Why the data = oil discussion matters

Data has been dubbed the ‘new oil’ with inflationary frequency, and headlines such as “The world’s most valuable resource is no longer oil, but data” by The Economist are ubiquitous. It has long been a favorite pastime of Snapchat-first Millennial news outlets and high-capacity hobby pundits to distinguish themselves by concocting counterintuitive theorems that liken any new development to a seemingly unrelated blockbuster topic. But in the case of data and oil, these folks are actually not that misguided in pointing out ostensible parallels between the two. In fact, from a resource perspective, the online world resembles the year 1885 more than anything. As in that era, we very recently discovered that the new land we are charting, the internet, contains very valuable resources below its surface, namely data. Like oil, data needs special tools to extract it. It also needs refinement from its raw state before it can be turned into a directly usable product.

But before we dive in, let’s quickly clarify an important concern:

Why the hell should I care?

Valid question. The short and simple answer is this: because you could have made an average of $1,600 last year by doing practically nothing — but you didn’t. That’s because someone else made money with your data instead: data brokers.

The question is how it got to the point where companies could extract such fortunes from virtually every American. As it turns out, the notion of dubbing data the new oil stands at the center of the process that legitimizes this industry. While data has a lot in common with oil, treating the one as a direct online counterpart of the other is not just factually wrong but also dangerous, as it institutionalizes a terminology that makes it acceptable to treat data as a public good. Let’s start by taking a look at how precisely data differs from oil.

Difference #1: Data is not a generic resource

First and foremost, data is not the same type of commodity as oil since it is not generic. Oil is oil — whether it’s sourced in Texas or Baku. But data is inherently heterogeneous; data from Facebook is vastly different than data sourced from Amazon.

Furthermore, the million different types of data are, independent of their refinement process, not equal in value. This holds true not just between online platforms but also within them. For instance, data that is private on Facebook, such as Likes and Posts, stands to offer more valuable insights than data that is publicly available, such as demographic information like Hometown and Languages.

Difference #2: Data is a personal resource

The second difference between data and oil revolves around the origin of the two. Oil is a natural resource, whereas data is a personal resource. This difference becomes apparent when we take a look at the level of resources contained in a property at the time of its acquisition. Oil is the result of a mysterious process spanning millions of years and involving dead organic material, heat, pressure, and time, and it naturally occurs in certain properties: mostly deserts, arctic areas, river deltas, and continental margins offshore. When a person or company acquires a plot of land, oil may or may not be contained in it. Either way, the local existence of oil always predates ownership of property rights, or mineral rights to be more specific. Therefore the claim to the land and its minerals is attributed to the person purchasing it, or the respective government owning it.

Resource Occurrence at the Moment of Purchase

In our online world, properties are not strips of land but virtual domains, such as websites and apps. Contrary to oil, data is not a naturally occurring resource on these domains. When Mark Zuckerberg purchased the domain that would become Facebook, there were no resources: there was no data. This data was only created when users signed up for the service and were allotted a piece of online land in the form of an account, with Facebook ultimately acting like a landlord leasing out the property. When users started to cultivate the virtual piece of land by filling out profile information, uploading pictures, and interacting with their friends, data was created and the property became rich in resources.

Contrary to oil, the local existence of data postdates ownership of property rights to a piece of online land. This allows us to clearly identify the source of this resource, and therefore also to clearly attribute ownership of the resource. In most cases, these virtual landlords require certain rights to the resources created by their tenants to make their job worthwhile: users of online platforms give a worldwide, non-exclusive, royalty-free license to the virtual landlord, be it Twitter or Facebook. Nonetheless, users still own the virtual mineral rights to the data they create. Rules such as the recent Data Portability Law established in the European Union further substantiate users’ ownership of these data mineral rights.

The usage rights associated with the licenses you grant to Facebook and Twitter are far-reaching and allow these companies to utilize your data in a lot of different ways, from placing your photos in ads (Instagram) to using your ideas (LinkedIn), effectively turning their user bases into a kind of crowdsourced R&D department. And while the results might be creepy at times, it’s still an explicitly formed contractual agreement that leaves both the user and the ‘landlord’ better off than before. Users get to use great services and landlords make money by showing targeted ads. Fair game.

Why calling data the new oil is dangerous

Unfortunately, the currently prevailing narrative of data being virtual oil gives many firms license to ignore these two important differences and treat data as if it were a generic and naturally occurring resource. Outside of the expressly formed contract with virtual landlords such as Facebook, third-party companies also use your data. Whenever you declare a part of your online property ‘publicly available’, you lose control over who enters your land and what they can do on it. This leads to thousands of firms coming onto your property and setting up their virtual oil derricks to extract the resources you created on that piece of land. This happens even though there was never any explicit consent from your side. Companies called data brokers are at the heart of this ethically challenged business model.

Credit: The New Yorker

Data brokers have been heavily scrutinized by the Federal Trade Commission and the Senate Committee on Commerce, Science, and Transportation under the leadership of Senator Jay Rockefeller IV. These companies are extracting and monetizing an asset that they do not own. Some might call that stealing. “These companies are only picking up scraps; there can’t be a lot of money in this,” you might think to yourself. But this is far from the truth; data brokers do not operate in the realm of petty larceny. So just how big is the data brokerage industry?

The last confirmed number that we have is a figure presented by the Senate Committee on Commerce, Science, and Transportation in Senate Hearing 113–693, entitled “What information do data brokers have on consumers, and how do they use it?” Here’s an excerpt of former Senator Jay Rockefeller IV talking about the size of the data brokerage market:

Yes, you heard that correctly. The estimated size of the data brokerage market in 2012 was $156 billion! That is more than four times the revenue generated by the entire casino industry in the US ($37 billion) and higher than the GDP of roughly 70% of the world’s countries.

Finding the dominant strategy

A natural reaction right now would be to simply declare all of our online property ‘private’ in an effort to eradicate data extraction by any means. As it turns out, this would not alleviate the problem, but likely worsen it. The market for data lies at the heart of enabling companies to create products and services that matter. It allows them to advertise things that are relevant to us. 81.3% of marketers describe data as important to their marketing efforts and 59.3% call it “critical” to the success of their campaigns.

To stick with the oil analogy, data can be turned into kerosene that illuminates the market landscape for companies so they can navigate toward producing things that we like. Data helps keep prices down, since companies can reach the people their product matters most to without wasting the majority of their ad budget on people who have no interest in it and littering your Facebook feed with irrelevant ads. Those savings are partly passed on to consumers and partly spent on R&D to continuously improve products and services.

The market for data is of utmost importance to the functioning of our modern and rapidly evolving economy. If people tried to shield their data from the companies that effectively use it, the innovation cycles we have gotten so used to would slow down drastically. Because the market is so important, there will always be a business in trying to gather whatever data is available. Trying to reduce data availability would likely only make the methods by which data is extracted more intrusive and detrimental. Think of the war on drugs and how drug regulations have failed to have their intended positive effect on the market for drugs. Similar dynamics can be expected in the market for data.

So if simply shielding our data from data brokers is not a feasible option, how do we get those companies to stop wrongfully extracting our data right from beneath our feet? There are two ways in business to compete with rivals: offering products that have lower prices or better quality. Lower prices would introduce a paradigm where people offer the same type of data that brokers are currently offering, but at a fraction of the cost. The economics of this model imply that the payoff per user would be marginal, and the only tangible value proposition left for users would be to put the companies dispossessing them out of business.

Higher quality, on the other hand, means that companies can leverage the new data that is made available to them to produce even better products, innovate faster, and price them even more competitively by being able to reach target customers more effectively. For consumers, offering higher quality data also brings with it the ability to charge higher prices. This model would imply that next to claiming ownership of their data and putting data brokers out of business, individual users would have another highly appealing value proposition: a new revenue source. Now the question arises: what would happen if every person in the US all of a sudden decided to become his or her own data broker and sell their data themselves?

The back-of-the-envelope math

As mentioned above, the last confirmed number for the size of the data brokerage market is $156 billion in the US in 2012. How much of that is generated solely by selling people’s personal data is not entirely clear. But if we dive deeper into the annual report of one of these data brokers, Acxiom, we find that 79% of their earnings come from revenues collectively titled “Marketing Database Services,” which the company defines as “solutions that unify consumer data across an enterprise, enabling clients to execute relevant, people-based marketing and activate data across the marketing ecosystem.” Let’s, for the ease of an approximation, assume that this share is roughly equivalent to the cash flow data brokers generate by selling people’s data, without any additional service markup. That would leave us with a serviceable addressable market (SAM) of roughly $123 billion for the year 2012.
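The approximation above is a single multiplication, sketched here for anyone who wants to swap in their own numbers. The two inputs are the figures quoted in the text; the function and variable names are purely illustrative, and applying Acxiom’s 79% share to the whole industry is the simplifying assumption described above, not an established fact.

```python
# Figures quoted in the text above.
MARKET_SIZE_2012_USD = 156e9   # US data brokerage market, 2012
PERSONAL_DATA_SHARE = 0.79     # Acxiom's "Marketing Database Services" share, used as a proxy


def serviceable_market(total_market: float, share: float) -> float:
    """Return the part of the market attributed to selling personal data."""
    return total_market * share


sam_2012 = serviceable_market(MARKET_SIZE_2012_USD, PERSONAL_DATA_SHARE)
print(f"Serviceable market 2012: ${sam_2012 / 1e9:.0f} billion")
```

Changing the share (for example, to reflect a different broker’s revenue mix) scales every per-person figure that follows by the same proportion.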

Next, let’s take a look at how much data each age group creates. Since we’re only considering people who can enter into a legal contract by themselves, we’ll leave out the group under 18. As shown below, people in the bracket of 18–29 create about 1/3 of all online data.

Breaking down the SAM figure mentioned above by the respective data creation per age bracket, we find that Americans between the ages of 18–29 could have made almost $1,000 in 2012.

A very important aspect to appreciate is that data is an exponentially growing asset class. 90% of all data has been created in the past two years. If we do a simple forecast of the data brokerage market based on a weighted growth factor collated from research presented by EMC and the growth rates of publicly listed data brokers (Acxiom, Equifax, Experian, and Alliance Data Systems), we find that this market stands to grow about 27% per year.

Running the numbers on this scenario, it turns out that the average American would make roughly $1,600 per year. That is a 2.8% increase to the median household income. Due to their relatively higher internet consumption, millennials between 18 and 29 would make $2,600 per year if they took data brokerage into their own hands. In the year 2022, this number will have grown to an average of $6,700 per American internet user per year, with millennials standing to make around $10,900 per person.
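The 2022 figures are consistent with simple compounding at the 27% annual growth rate. The sketch below assumes the $1,600 and $2,600 figures refer to 2016 (the “last year” of an article published in 2017), which the text does not state explicitly; the function name is illustrative.

```python
ANNUAL_GROWTH = 0.27   # growth rate quoted in the text
BASE_YEAR = 2016       # assumption: "last year" relative to the 2017 publication date
TARGET_YEAR = 2022


def project(value: float, rate: float, years: int) -> float:
    """Compound a yearly payout forward by `years` at a fixed growth `rate`."""
    return value * (1 + rate) ** years


years = TARGET_YEAR - BASE_YEAR
average_2022 = project(1_600, ANNUAL_GROWTH, years)
millennial_2022 = project(2_600, ANNUAL_GROWTH, years)

# Round to the nearest hundred dollars, as the text does.
print(f"Average user, 2022: ${round(average_2022, -2):,.0f}")
print(f"Millennial, 2022:   ${round(millennial_2022, -2):,.0f}")
```

Under these assumptions the projection lands on roughly $6,700 and $10,900, matching the figures above. A different base year or growth rate shifts the result considerably, since the growth compounds.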

In Conclusion

Data brokerage might be one of the unsexiest industries on earth. Its concept is inherently abstract since we can see neither the entirety of the data we create, nor the mechanisms firms use to extract our data, nor the products into which it is turned. But data brokerage is one of the most pressing socioeconomic issues of our time and requires our immediate attention. By leaving this industry to operate in the shadows, we are compromising our digital identity day by day. And by thinking of data as a public rather than private good, as manifested through incorrect analogies, we give precisely this industry the legitimacy it needs to prosper.

Data brokerage is certainly not an issue that can be fixed overnight. It will be an incremental process where everyone has to pitch in to make this change happen. But it is an issue that can be resolved if we take it into our own hands.
