Competing with Big Data

The recent process of “datafication,” also called “the rise of big data” (Mayer-Schönberger and Cukier, 2013), is explained by two simultaneous technological developments: first, the increasing availability of data, owing to the fact that more and more economic and social transactions are aided by information and communication technologies, which easily and inexpensively store the information such transactions produce or transmit; second, the increasing ability of firms and governments to analyze these novel big data sets. Einav and Levin (2014:3) write: “But what exactly is new about [big data]? The short answer is that data is now available faster, has greater coverage and scope, and includes new types of observations and measurements that previously were not available.”

In “Competing with Big Data” (working paper expected late summer), Christoph Schottmüller (University of Copenhagen) and I attempt to better understand what we call data-driven markets and to study the associated competitive behavior and outcomes. In doing so, we focus on the consequences of datafication in the economic sphere and largely set aside its effects on the social, legal, and political domains.

Both policy makers and researchers working on digital markets have repeatedly underlined that the most relevant dimension of competition in such markets, e.g. search engines or online services, is not the quantity a firm produces or the price it charges but the effort it invests in innovation. Moreover, understanding the long-term effects of competition in such markets (and predicting their likely development) requires dynamic models, which study the behavior of firms over time. Therefore, we construct and analyze a dynamic model in which competing firms choose their innovation efforts repeatedly. The key feature of the model is that it incorporates indirect network effects that arise on the supply side of a market, via decreasing marginal costs of innovation, but that are driven by user demand.

Indirect Network Effects

Demand for the services of one provider generates data about its users’ preferences or characteristics (henceforth: user information), as a natural and machine-generated by-product of using the service. This user information, which Zuboff (2016) calls “behavioral surplus,” is private information of the provider who collected it and can be used to innovate by adapting the product better to users’ preferences, thereby increasing perceived quality in the future. Thus, higher initial demand reduces the marginal cost of innovation: it makes it cheaper to produce one additional unit of product or service quality, as perceived by users.
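The link between demand and the marginal cost of innovation can be written down as a small sketch. The linear functional form, the function name marginal_innovation_cost, and the parameters base_cost and data_effect are assumptions chosen for illustration only; they are not taken from the working paper.

```python
def marginal_innovation_cost(demand_share: float,
                             base_cost: float = 1.0,
                             data_effect: float = 0.8) -> float:
    """Hypothetical cost of producing one extra unit of perceived quality.

    Assumption: the cost falls linearly in the firm's demand share,
    because more demand means more user information to learn from.
    """
    if not 0.0 <= demand_share <= 1.0:
        raise ValueError("demand_share must lie in [0, 1]")
    if not 0.0 <= data_effect < 1.0:
        raise ValueError("data_effect must lie in [0, 1)")
    return base_cost * (1.0 - data_effect * demand_share)


# A firm serving 90% of users innovates more cheaply than one serving 10%.
small_firm = marginal_innovation_cost(0.1)
large_firm = marginal_innovation_cost(0.9)
```

Any other decreasing function of demand would make the same qualitative point; the linear form is simply the easiest to read.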

To illustrate the indirect network effects, think of a search engine. If a user places a query and then clicks on the third link she is shown, the search engine (and only this one search engine) knows which links the user could choose from and which one she preferred, so her click reveals her preferences. This information can be used by the search engine’s algorithm the next time a user enters a similar search term, thereby improving this search engine’s quality, as perceived by users, over competitors who do not have user information.

Data-driven Markets

We define markets that are subject to these indirect network effects fueled by user information as data-driven markets. We show that such markets tip under very mild conditions, moving towards monopoly. That is, we expect one firm to become dominant and the other firms to serve small niches or to exit the market. We also identify a strong first-mover advantage: the first firm with a business model that makes use of user information is likely to win the entire market over time.
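The tipping logic can be made concrete with a toy simulation. This is a minimal sketch under strong assumptions and not the model from the working paper: two firms spend the same fixed budget on innovation each period, the marginal cost of quality falls linearly in a firm's current demand share, and users split between the firms according to a logistic function of the quality gap. The function names and all parameter values are hypothetical.

```python
import math


def logistic(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def simulate_shares(initial_share_a: float,
                    periods: int = 50,
                    budget: float = 1.0,
                    base_cost: float = 1.0,
                    data_effect: float = 0.8,
                    sensitivity: float = 1.0) -> list[float]:
    """Return firm A's demand share in every period (toy dynamics)."""
    share_a = initial_share_a
    quality_a = quality_b = 0.0
    history = [share_a]
    for _ in range(periods):
        # More demand means more user information and cheaper innovation.
        cost_a = base_cost * (1.0 - data_effect * share_a)
        cost_b = base_cost * (1.0 - data_effect * (1.0 - share_a))
        # The same budget buys more quality for the firm with lower cost.
        quality_a += budget / cost_a
        quality_b += budget / cost_b
        # Users move towards the firm with higher perceived quality.
        share_a = logistic(sensitivity * (quality_a - quality_b))
        history.append(share_a)
    return history


head_start = simulate_shares(0.55)   # small initial lead for firm A
symmetric = simulate_shares(0.50)    # no initial lead
```

In this sketch a head start of five percentage points is enough for firm A's share to drift towards one, while perfectly symmetric firms stay at an even split. The fixed budget is a simplification: in the paper firms choose their innovation efforts, which this sketch does not model.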

Shallow Incentives to Innovate

An important result of the model concerns the innovation incentives of the competing firms once the market has tipped, that is, when one firm is dominant and the other firm(s) have only small (and shrinking) market shares: these incentives are close to nil.

The reason is that a firm with very little demand that considers investing heavily in research, even if it has the idea for a radical innovation, faces a dominant firm with a much lower marginal cost of innovation. Hence, the dominant firm would only have to wait for the small firm to invest and could then invest itself, delivering high quality at much lower cost and ruining the profitability of the smaller firm’s investment. This is why the small firm, foreseeing the threat posed by a dominant firm that owns a lot of user information, will not invest further. The dominant firm, on the other hand, understands that the small firm has no incentive to innovate, which is why the dominant firm itself, serving a large share of the market, is best off saving all investment expenses and not innovating further either.

Connected Markets and the Domino Effect

Going a step further, we study under which circumstances a dominant position in one data-driven market could be used to gain a dominant position in another market that is (initially) not data-driven. We show that if market entry costs are not too high, a firm that manages to find a “data-driven” business model can dominate any market in the long term. If the data on user preferences or characteristics in one market have some value in the innovation process in another market, we define the markets to be connected.

Consequently, if technology firms realize that user information constitutes a key input into the production of quality in data-driven markets, they need to identify connected markets, where these data can be used as well. In those follow-up markets, the same results as in the initial markets apply, suggesting a domino effect: a first mover in market A can leverage its dominant position, which comes with an advantage in user information, to gain a first-mover advantage in market B and make that market tip, too.

This result of our model suggests a race. On the one hand, technology firms with large stocks of existing data on user preferences and characteristics will be looking to identify data-driven business models that utilize these data stocks in other industries. On the other hand, traditional companies will be trying to increase data-independent product quality in order to make it prohibitively costly for those data-driven firms to enter their markets. In addition, they will try to collect as much user information as possible themselves (and preemptively) in order to avoid losing the entire market once some firm identifies a data-driven business model and actually enters their market.

So what? – A Theory of Harm for Data-driven Markets

Finally, we study the normative implications of our results. Because a tipped market provides no incentives for firms of any size to innovate further, market tipping harms consumers in this market (this is, in legal jargon, our “theory of harm” in data-driven markets). It also deters market entry by new firms, even if they bring a revolutionary technology.

Therefore, we use our model to analyze the effects of a specific market intervention that was recently proposed, one that would work through regulation rather than competition law. Based on Argenton and Prüfer (2012), we study: what if firms with data-driven business models were legally required to share their (anonymized) data about user preferences or characteristics with their competitors?

Mandatory Data-sharing?!

Contrary to the claims of some commentators on that earlier paper, we show that a first mover’s incentives to innovate further do not decline after such forced sharing of user information, even in a dynamic model. Instead, we show that with data sharing (voluntary or not), data-driven markets do not tip. That is, the level of competition, and hence the level of innovation, in these markets remains high, which benefits consumers.

The intuition is that with mandatory data sharing, both competitors face the same cost structure; a firm with initially higher demand does not have a comparative advantage in producing quality. As a result, the sharing of user information avoids the negative consequences for innovation that are specific to data-driven markets.
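The cost-equalizing effect of data sharing can be sketched in a few lines. As before, the linear cost function, the function name marginal_costs, and the parameter values are assumptions for illustration, not the paper's specification. Without sharing, each firm learns only from its own users; with sharing, every firm learns from the pooled user information of the whole market.

```python
def marginal_costs(shares: dict[str, float],
                   pooled: bool,
                   base_cost: float = 1.0,
                   data_effect: float = 0.8) -> dict[str, float]:
    """Hypothetical marginal cost of innovation per firm.

    shares maps each firm to its demand share. If pooled is True,
    every firm can use the user information generated by all firms.
    """
    total_data = sum(shares.values())
    costs = {}
    for firm, share in shares.items():
        accessible_data = total_data if pooled else share
        costs[firm] = base_cost * (1.0 - data_effect * accessible_data)
    return costs


shares = {"dominant": 0.9, "entrant": 0.1}
private = marginal_costs(shares, pooled=False)  # each firm keeps its data
shared = marginal_costs(shares, pooled=True)    # mandatory data sharing
```

With private data the dominant firm innovates far more cheaply than the entrant; with pooled data both face the same marginal cost, so the head start in demand no longer translates into a cost advantage. The sketch says nothing about how much firms then choose to invest, which is the question the dynamic model addresses.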

As a caveat, these markets can still be dominated by one or a few firms, just as any other market. But in that case, we could be more confident that the source of dominance is a fundamentally superior sales proposition and not a windfall innovation cost-reduction from earlier success in the market.

[This post was originally written for the blog of the Data Science Center Tilburg.]


  • Argenton, Cédric and Jens Prüfer. 2012. “Search Engine Competition with Network Externalities,” Journal of Competition Law & Economics 8(1): 73-105.
  • Einav, Liran and Jonathan Levin. 2014. “The Data Revolution and Economic Analysis,” in: Josh Lerner and Scott Stern (eds.), Innovation Policy and the Economy 14(1), NBER Books, National Bureau of Economic Research: 1-24.
  • Mayer-Schönberger, Viktor and Kenneth Cukier. 2013. “The Rise of Big Data,” Foreign Affairs, May/June Issue.
  • Zuboff, Shoshana. 2016. “The Secrets of Surveillance Capitalism,” Frankfurter Allgemeine Zeitung, March 5, 2016.