Our research topics
- Synthetic data generation
- Differential privacy
- Legal and ethical perspectives on anonymity and secondary use of health data
PRIVASA’s research teams and industry partners are exploring the potential and limitations of using synthetic data in health care. Synthetic data could support research and innovation activities when real data is not available due to legal, ethical or practical constraints. Synthetic data could also be used alongside real data to speed up the early phases of development.
For most applications in health care (mainly excluding primary care), it is critical that the datasets protect individuals’ privacy. As a result, synthetic datasets are often expected to be anonymous. Generating synthetic data involves a well-known trade-off: the synthetic datasets most similar to the real data offer the least privacy protection. Conversely, synthetic datasets with little or no resemblance to the real data offer the strongest privacy protection but little analytical value.
By applying different techniques to generate synthetic datasets, we are mapping the practical implications of this privacy-utility trade-off. We are also empirically testing the usability of synthetic datasets in statistical analyses and comparing the performance to alternative solutions, such as private queries to real data.
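As a rough illustration of what a private query to real data can look like, the sketch below answers a mean query with the Laplace mechanism from differential privacy. It is a minimal example under stated assumptions, not the method used in the project: the function names, the age bounds, the epsilon values and the randomly generated "ages" are all hypothetical, the number of records is assumed to be public, and neighbouring datasets are assumed to differ by replacing one record.

```python
import random
import statistics


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(values, lower, upper, epsilon, rng):
    """Return an epsilon-differentially-private estimate of a bounded mean.

    Assumption: the record count n is public and neighbouring datasets differ
    by replacing one record, so the clipped mean has sensitivity
    (upper - lower) / n.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    # Clip every value into [lower, upper] so one record has bounded influence.
    clipped = [min(max(v, lower), upper) for v in values]
    n = len(clipped)
    sensitivity = (upper - lower) / n
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon, rng)


def mean_abs_error(values, lower, upper, epsilon, trials=2000, seed=0):
    """Average absolute error of the private mean over repeated queries."""
    rng = random.Random(seed)
    true_mean = statistics.fmean(min(max(v, lower), upper) for v in values)
    errors = [
        abs(private_mean(values, lower, upper, epsilon, rng) - true_mean)
        for _ in range(trials)
    ]
    return statistics.fmean(errors)


if __name__ == "__main__":
    # Hypothetical stand-in for a real dataset: 500 random ages.
    data_rng = random.Random(42)
    ages = [data_rng.randint(18, 90) for _ in range(500)]
    for eps in (0.1, 1.0, 10.0):
        print(f"epsilon={eps}: mean abs error {mean_abs_error(ages, 18, 90, eps):.4f}")
```

A smaller epsilon gives stronger privacy but a noisier answer, which is the same privacy-utility trade-off in the setting of a single query. A production system would also need to track the cumulative privacy budget across queries and use a noise sampler hardened against floating-point attacks, neither of which this sketch attempts.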
In our research, we have focused on processing numeric tabular data and medical images.

Publications
We wish to acknowledge that scientific work exceeds the limits of individual projects.
The publications are listed here based on the co-authorship of one or more PRIVASA researchers (in bold). These represent research activities planned and executed in the project or collaborative research activities within the thematic scope of the project.
Huhtanen J.T., Nyman M., Doncenco D., Hamedian M., Kawalya D., Salminen L., Sequeiros R.B., Koskinen S.K., Pudas T.K., Kajander S., Niemi P., Hirvonen J., Aronen H., Jafaritadi M. (2022). Deep learning accurately classifies elbow joint effusion in adult and pediatric radiographs. Scientific Reports 12, 11803. https://doi.org/10.1038/s41598-022-16154-x
Khan M.I., Jafaritadi M., Alhoniemi E., Kontio E., Khan S.A. (2022). Adaptive Weight Aggregation in Federated Learning for Brain Tumor Segmentation. In: Crimi A., Bakas S. (eds) Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. BrainLes 2021. Lecture Notes in Computer Science, vol 12963. Springer, Cham. https://doi.org/10.1007/978-3-031-09002-8_40
Theses
Differentially private synthetic tabular data generation with a generative adversarial network and privacy amplification by subsampling
Valtteri Nieminen, University of Turku (2022)
https://urn.fi/URN:NBN:fi-fe2022082956643
Exploring Medical Image Data Augmentation and Synthesis using conditional Generative Adversarial Networks
Dorin Doncenco, Turku University of Applied Sciences (2022)
https://urn.fi/URN:NBN:fi:amk-202204074675