Online search engines are used by billions of users every day. They provide the basic infrastructure for many other industries and are therefore of great economic, political, and social importance. Over the past few years, an intense policy debate has formed around the question: do some search engines produce better results because their algorithms are better, or because they have access to more data from past searches?
In the former case, it may be best to refrain from market interventions so as not to stifle the innovation incentives of successful entrepreneurs (and their potential contestants). In the latter case, mandatory sharing of user-generated data, a policy that is currently under discussion and already contained in the EU’s Digital Markets Act, could spur innovation and would benefit all users of search engines.
Together with Tobias Klein, Madina Kurmangaliyeva, and Patricia Prüfer, and at the request of the German Finance Ministry, I embarked on a journey to produce relevant data and inform the policy-making process. The resulting paper, “How important are user-generated data for search engine quality? Experimental results”, has now been accepted for publication by the Journal of Law & Economics.
In this paper, we report results from a collaboration with a small search engine, Cliqz. Cliqz provided us with non-personalized search results for a random set of queries and conducted an experiment on our behalf, which allows for within-search-engine comparisons. We complemented the Cliqz data with non-personalized search results from Google and Bing for the same queries, in the same period, and in the same country, and asked external assessors to rate the quality of the search results on a 7-point Likert scale (without mentioning the origin of the results). This allows for between-search-engine comparisons.
We find robust evidence that differences in the quality of search results are driven by less popular search terms, for which a search algorithm can rely on less data. These insights are complemented by results from an experiment in which we keep the search engine’s algorithm fixed and vary the amount of data it uses as an input. This provides causal evidence that more user data on rare queries enables search engines to produce results of higher quality. Notably, 74% of the traffic in our data comes from such rare queries. Hence, this is the relevant dimension of competition on which a search engine must perform in order to attract users.
Our results show that the mandatory sharing of user data may be an appropriate remedy in the sense that it would likely allow entrants such as Cliqz to compete successfully with the incumbent (Google) by enabling them to provide high-quality search results also for rare queries. Unlike in other contexts, this remedy does not directly harm the incumbent, as it exploits the non-rivalry of information: the incumbent can still use the same data; only the exclusivity of data access would be reduced. Consequently, users would benefit.