A year ago, a group of TILEC researchers (combining expertise in economics, law, and econometrics, and teaming up with CentERdata’s competence in data science and consumer research) was commissioned by the German Federal Ministry of Finance (BMF) to develop a suitable indicator for identifying and delineating data-driven markets and, based on this, approaches to data governance. In particular, the task was to develop a methodology for measuring the data-driven nature of a market (i.e., a test for data drivenness) and the market dominance of individual providers, to apply this procedure in a selected industry, and to explore a suitable data governance structure and possible regulatory implementation.
With recent and ongoing progress in the legislation and regulation of data-based industries, at both national and EU levels (e.g., the Digital Markets Act and Digital Services Act), this project has become even more topical.
The German daily newspaper Tagesspiegel has already reported on the results. The full research report (in German) is here. The first openly accessible research document is a working paper titled “Governance of Data Sharing: a Law & Economics Proposal” (joint with Inge Graef). There is more to come.
The main results of the BMF-project are as follows:
- The econometric test developed for data-driven markets is built around one basic question: how long does it take a provider who starts without user-generated data on user preferences and characteristics and hypothetically “does everything right” to catch up with the competitor with the largest market share? If the answer is “less than 3-5 years,” a market is not (sufficiently) data driven. If the answer is “longer than 5 years,” then the market is data driven. In the latter case, the feedback loop by which having more access to data leads to higher quality, which necessarily increases the market leader’s market share, is very strong. Without regulatory intervention, there is then no hope of a change in the market structure. This has a negative impact on the incentives for innovation of both potential market entrants and the market leader. Because of the great market power of the dominant provider, it also leaves room for multiple abuses to the detriment of users/consumers.
- The test for data-driven markets consists of two parts: an assessment of the role that different features play in shaping user demand, and an assessment of the quality feedback loop. To illustrate its use in practice, the test for data drivenness was applied to the market for internet search engines. There, a discrete-choice experiment with 821 participants showed that a reduction in the quality of the search results, an increase in the number of ads, and an increase in the degree of personalization of the search engine each have a significantly negative effect on user satisfaction. The negative evaluation of personalization implies that users prefer to have their privacy protected. However, we found that respondents rated quality approximately twice as highly as the other two characteristics, personalization level and advertising (each on a 5-level scale). This shows that search engine quality dominates the other product characteristics in determining user satisfaction (and therefore demand) in this market.
- Furthermore, the results show significant interactions of the degree of personalization with both the type of search query and the degree of transparency. The negative effect of the degree of personalization on user satisfaction was significantly stronger for a health-related search query than for a harmless search query — and significantly stronger if the privacy information was transparent (and not hidden).
- In an experiment with the Munich-based search engine Cliqz, the amount of user-generated data that the search algorithm could access to answer a user’s search query was artificially varied. The experiment showed that giving a small search engine access to more user-generated data would greatly improve its search quality. This is especially true for rare search queries, regardless of the exact measure of search quality. For these rare queries, which make up more than 70% of all search queries, no quality saturation from access to more and more user-generated data could be detected. Human evaluators of the search results qualitatively confirmed the findings obtained with machine-calculated measures of search quality: more user-generated data lead to higher quality for rare search queries.
- In summary, the test for data drivenness shows a clear result: the search engine market is data driven. With significantly less user-generated data than the leading search engine, it is impossible to achieve a market share in this market that comes close to that of the market leader, even in the medium term. Therefore, this market is not competitive.
- With regard to an appropriate governance structure for mandatory data sharing, we found that the existing legal mechanisms for enforcing a data-sharing obligation under EU competition law and for facilitating data portability under the GDPR are not sufficient.
- In any data-governance structure, regulators must perform three essential tasks: investigating potentially data-driven markets (i.e., performing the test for data drivenness); deciding whether a market is data driven and exactly which data must be shared by whom, with whom, and in what way (that is, evaluating the test result); and technically implementing and legally enforcing the data-sharing obligation.
- Due to institutional limitations resulting from the EU Treaties, the design of the data-sharing obligation requires a governance structure that combines elements of an economically efficient centralization with a legally necessary decentralization of data sharing. Our analyses show three feasible governance structures:
  - Relatively centralized: The investigation of a potentially data-driven market and the enforcement of the data-sharing obligation will be centralized in a new European Data Sharing Agency (EDSA), while the joint decision-making power of the national competition authorities will lie with a supervisory body.
  - Decentralized: A Data Sharing Cooperation Network (DSCN) will be established, coordinated by a European Data Sharing Board, which will include the presidents of all 27 national competition authorities. The DSCN decides on the data-driven nature of a market. The national competition authority best placed to investigate a potentially data-driven market acts as the lead national competition authority (so-called Lead NCA), which investigates and enforces the data-sharing obligation throughout the EU.
  - Mixed: The national competition authorities are charged with investigation (Lead NCA) and decision making (DSCN). The centralized EDSA is responsible for the enforcement of the data-sharing obligation.
Existing enforcement approaches in data protection and consumer law have already demonstrated the feasibility of such arrangements. By incorporating data protection and intellectual property considerations into the governance design itself, the governance structures proposed here offer a concrete approach to future data regulation that combines legal and economic insights and can be easily taken up by policy makers.
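The decision rule behind the test for data drivenness, as described in the first result above, can be sketched in a few lines of Python. This is only an illustration and not the econometric procedure from the report: the function name `classify_market`, the input `catch_up_years` (assumed to be an estimate produced by the two-part test), and the treatment of the range between 3 and 5 years as a "borderline" band are assumptions made for this sketch.

```python
def classify_market(catch_up_years: float,
                    lower: float = 3.0,
                    upper: float = 5.0) -> str:
    """Classify a market by the estimated time an entrant without
    user-generated data needs to catch up with the market leader.

    Assumption of this sketch: the "3-5 years" wording is read as a
    band, so values between `lower` and `upper` are reported as
    "borderline" rather than forced into one of the two outcomes.
    """
    if catch_up_years < 0:
        raise ValueError("catch_up_years must be non-negative")
    if catch_up_years > upper:
        return "data-driven"
    if catch_up_years < lower:
        return "not data-driven"
    return "borderline"


# Hypothetical inputs, purely for illustration.
for years in (2.0, 4.0, 8.0):
    print(years, "->", classify_market(years))
```

An estimate of "never catches up" could be passed as `float("inf")`, which the sketch classifies as data-driven.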
The report leads to the following policy implications:
- In data-driven markets, competitors of a dominant firm have no chance of achieving a market share close to that of the market leader in the medium term without political intervention. Therefore, we recommend the creation of new legal tools for regulating data-driven markets. Specifically, we recommend the introduction of mandatory sharing of user-generated data.
- Because the market for search engines is data-driven (see result 5 above), we recommend the introduction of a data sharing obligation for user-generated data in this market.
- Regardless of a specific market, we recommend the following design principles for mandatory data sharing:
  - Only raw data should have to be shared, that is, data the provider can store almost free of charge by automatically logging the interaction between user and provider. The analysis of this data is the responsibility of each recipient. In the search engine market this corresponds to search log data.
  - In a data-driven market, all providers with a market share of at least 30% should be obliged to share their user-generated data. This results in a maximum of three providers per market that have to share data. This number decreases the more the market is monopolized.
  - On the receiving side, any organization that is active in the respective market or that can explain how it would serve the users of this market with the data should be given access to the shared data. This should apply regardless of the organizational form of the receiving party, that is, to for-profit, non-profit, and public organizations alike.
- On the one hand, our analysis of the available mechanisms of competition and data protection law shows that these are not sufficient to avoid monopolistic tendencies in data-driven markets. On the other hand, all three proposed options for data governance (see result 8 above) already take into account the limitations imposed by data protection and intellectual property protection. We therefore recommend implementing one of the three governance options, including newly created institutions and communication channels.
- When trading off the pros and cons of centralized and decentralized governance, we see an advantage in the “mixed” governance structure: the technical infrastructure required to enforce the data-sharing obligation does not need to be duplicated across national competition authorities, because enforcement takes place at EU level within the EDSA. At the same time, there is no need to create new investigative and enforcement powers at EU level, as the national competition authorities select a lead national competition authority that is best placed to take over a particular case. The NCAs thus share the burden of using the resources within the DSCN. Due to this combination of features, we regard the “mixed” governance structure as optimal and recommend this option.
- For reasons of efficiency, data security, and privacy, we recommend that user-generated data not be forwarded to organizations with recipient rights but instead be consolidated and shielded in a data pool operated by the technology department of the Lead NCA/EDSA. Organizations that have a right to access the shared data should be given the opportunity to have their ML algorithms trained in the pool. Only the algorithms of the receiving companies, and no human being, get access to the raw data, and they cannot take it out of the data pool. Instead, they can only transfer the findings from their analyses to the outside world, where a multitude of providers can now compete with each other in a meaningful way.
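The cap of three obliged providers in the design principles above follows from simple arithmetic: market shares sum to at most 100%, so at most three firms can each hold 30% or more. A minimal Python sketch makes this explicit. The function names, provider names, and market shares below are hypothetical and chosen only for illustration.

```python
def max_obliged(threshold_percent: int = 30) -> int:
    """Upper bound on the number of providers that can each hold at
    least `threshold_percent` of a market whose shares sum to 100%."""
    return 100 // threshold_percent


def obliged_providers(shares_percent: dict[str, float],
                      threshold_percent: float = 30.0) -> list[str]:
    """Return the providers at or above the threshold, largest first."""
    if sum(shares_percent.values()) > 100.0 + 1e-9:
        raise ValueError("market shares must not sum to more than 100%")
    obliged = [name for name, share in shares_percent.items()
               if share >= threshold_percent]
    return sorted(obliged, key=lambda name: -shares_percent[name])


# Hypothetical markets: the more monopolized, the fewer obliged providers.
oligopoly = {"A": 35.0, "B": 33.0, "C": 32.0}
monopolized = {"A": 90.0, "B": 6.0, "C": 4.0}

print(max_obliged())                   # 3
print(obliged_providers(oligopoly))    # ['A', 'B', 'C']
print(obliged_providers(monopolized))  # ['A']
```

In a fragmented market where no provider reaches 30%, the list is empty and no provider would be obliged to share under this rule.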
Note: “Does everything right” means that the provider makes the most user-friendly decision on all product features that influence user satisfaction (even if this costs the provider revenue in the short term).