Synthetic data as an alternative to real-world data in research, development and innovation

We recently published an introductory blog post about the possibilities and current challenges of using synthetic data (the original post is in Finnish). Here, we summarize the main points in English:

  • Modern applications of personalized health need information on individuals (microdata), but the secondary use of health data is strictly regulated.
  • Synthetic datasets are created using mathematical models that replicate the main statistical properties of the real-world datasets. Adding some randomness to the process helps to protect privacy.
  • The flexibility and cost-effectiveness of synthetic datasets especially support small and medium-sized enterprises (SMEs) in building their own data-driven solutions, allowing them to enter the same markets as the data giants.
  • For synthetic datasets to become mainstream practice, a number of regulatory issues call for clarification. For example, a risk-based approach would be more useful than trying to define ‘anonymous’.
  • The use of synthetic datasets appears to be a promising way to alleviate some of the current challenges in accessing, using and sharing health data. However, multidisciplinary research is still needed to explore fundamental questions such as privacy-utility trade-offs.
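
To make the second point above more concrete, here is a minimal sketch of the general idea: fit a simple statistical model to a dataset, then draw new random records from that model. Everything in it is an assumption made for illustration. The model is a plain multivariate Gaussian, the "real" data is a simulated placeholder with two made-up numeric columns, and the function names are hypothetical. It is not the method used in any particular project, and random sampling of this kind does not by itself guarantee privacy.

```python
import numpy as np


def fit_gaussian_model(real):
    """Estimate the column means and the covariance matrix of the data."""
    mean = real.mean(axis=0)
    cov = np.cov(real, rowvar=False)  # rows are individuals, columns are variables
    return mean, cov


def sample_synthetic(mean, cov, n_rows, rng):
    """Draw new random records from the fitted model.

    The records are sampled, not copied, so no synthetic row is
    a real individual's row; this is where the randomness enters.
    """
    return rng.multivariate_normal(mean, cov, size=n_rows)


rng = np.random.default_rng(seed=0)

# Placeholder standing in for real microdata (simulated, NOT real health data):
# two correlated numeric variables for 2000 hypothetical individuals.
real = rng.multivariate_normal(
    mean=[50.0, 120.0],
    cov=[[100.0, 60.0], [60.0, 225.0]],
    size=2000,
)

mean, cov = fit_gaussian_model(real)
synthetic = sample_synthetic(mean, cov, n_rows=2000, rng=rng)

# Compare the main statistical properties of the two datasets.
print("real means:     ", real.mean(axis=0))
print("synthetic means:", synthetic.mean(axis=0))
print("real corr:      ", np.corrcoef(real, rowvar=False)[0, 1])
print("synthetic corr: ", np.corrcoef(synthetic, rowvar=False)[0, 1])
```

The synthetic means and correlation should land close to those of the input data, while the individual rows differ. How close the statistics need to be, and how much randomness is enough to protect individuals, is exactly the privacy-utility trade-off mentioned above.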

Read the full text on the Health Campus Turku blog (published 26.10.2021, in Finnish).