Our research topics
- Synthetic data generation
- Differential privacy
- Legal and ethical perspectives on anonymity and secondary use of health data
PRIVASA’s research teams and industry partners are exploring the potential and limitations of using synthetic data in health care. Synthetic data could support research and innovation activities when real data is not available due to legal, ethical or practical constraints. Synthetic data could also be used alongside real data to speed up the early phases of development.
For most applications in health care (mainly excluding primary care), it is critical that datasets protect individuals’ privacy. As a result, synthetic datasets are often expected to be anonymous. Generating synthetic data involves a well-known trade-off: the synthetic datasets most similar to real data offer the least privacy protection. Conversely, synthetic datasets with little or no resemblance to real data offer the strongest privacy protection.
By applying different techniques to generate synthetic datasets, we are mapping the practical implications of this privacy-utility trade-off. We are also empirically testing the usability of synthetic datasets in statistical analyses and comparing the performance to alternative solutions, such as private queries to real data.
In our research, we have focused on processing numeric tabular data and medical images.
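As a rough illustration of the privacy-utility trade-off in the "private queries to real data" setting mentioned above, the sketch below answers a mean query with the textbook Laplace mechanism and measures how the error changes with the privacy parameter epsilon. It is not the project's method or code: the function names, the value bounds (18 to 90) and the made-up data are assumptions chosen for illustration, and the dataset size is treated as public.

```python
import random
import statistics


def private_mean(values, lower, upper, epsilon, rng):
    """Epsilon-DP mean of bounded values via the Laplace mechanism.

    Assumes the number of records n is public and that neighbouring
    datasets differ by replacing one record.
    """
    clipped = [min(max(v, lower), upper) for v in values]
    n = len(clipped)
    # Replacing one record moves the mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / n
    scale = sensitivity / epsilon
    # A Laplace(0, scale) sample is the difference of two independent
    # exponential samples with mean `scale`.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return statistics.fmean(clipped) + noise


def mean_abs_error(values, lower, upper, epsilon, trials=2000, seed=0):
    """Average absolute error of the private mean over repeated queries."""
    rng = random.Random(seed)
    clipped = [min(max(v, lower), upper) for v in values]
    true_mean = statistics.fmean(clipped)
    errors = [
        abs(private_mean(values, lower, upper, epsilon, rng) - true_mean)
        for _ in range(trials)
    ]
    return statistics.fmean(errors)


if __name__ == "__main__":
    data_rng = random.Random(42)
    # Hypothetical numeric column, e.g. ages; not real patient data.
    ages = [data_rng.uniform(18, 90) for _ in range(200)]
    for eps in (0.1, 1.0, 10.0):
        print(f"epsilon={eps}: mean abs error={mean_abs_error(ages, 18, 90, eps):.4f}")
```

Smaller epsilon values give stronger privacy guarantees but noisier answers, while larger values give more accurate answers with weaker guarantees. Synthetic data generators with differential privacy face the same kind of tension, although the noise enters during model training rather than at query time.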
Publications
We wish to acknowledge that scientific work exceeds the limits of individual projects.
The publications are listed here based on the co-authorship of one or more PRIVASA researchers (in bold). These represent research activities planned and executed in the project or collaborative research activities within the thematic scope of the project.
Eisenmann M., Reinke A., Weru V., Tizabi M. D., Isensee F., Adler, T. J., … Jafaritadi M., Kontio E., Khan M., … & Finzel R. (2022). Biomedical image analysis competitions: The state of current participation practice. arXiv preprint
Huhtanen J.T., Nyman M., Doncenco D., Hamedian M., Kawalya D., Salminen L., Sequeiros R.B., Koskinen S.K., Pudas T.K., Kajander S., Niemi P., Hirvonen J., Aronen H., Jafaritadi M. (2022). Deep learning accurately classifies elbow joint effusion in adult and pediatric radiographs. Scientific Reports 12, 11803. https://doi.org/10.1038/s41598-022-16154-x
Kaisti M., Laitala J., Wong D., Airola A. (2023). Domain randomization using synthetic electrocardiograms for training neural networks. Artificial Intelligence in Medicine. https://doi.org/10.1016/j.artmed.2023.102583
Khan M.I., Azeem M. A., Alhoniemi E., Kontio E., Khan S.A., Jafaritadi M. (2023). Regularized weight aggregation in networked federated learning for glioblastoma segmentation. arXiv preprint
Khan M. I., Alhoniemi E., Kontio E., Khan S. A., Jafaritadi M. (2023). Differential Privacy for Adaptive Weight Aggregation in Federated Tumor Segmentation. arXiv preprint
Khan M.I., Alhoniemi E., Kontio E., Khan S.A., Jafaritadi M. (2023). RegAgg: A Scalable Approach for Efficient Weight Aggregation in Federated Lesion Segmentation of Brain MRIs. 2023 Eighth International Conference on Fog and Mobile Edge Computing (FMEC). https://doi.org/10.1109/FMEC59375.2023.10306171
Khan M.I., Jafaritadi M., Alhoniemi E., Kontio E., Khan S.A. (2022). Adaptive Weight Aggregation in Federated Learning for Brain Tumor Segmentation. In: Crimi A., Bakas S. (eds) Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. BrainLes 2021. Lecture Notes in Computer Science, vol 12963. Springer, Cham. https://doi.org/10.1007/978-3-031-09002-8_40
Marelli L., Stevens M., Sharon T., Van Hoyweghen I., Boeckhout M., Colussi I., Degelsegger-Márquez A., El-Sayed S., Hoeyer K., van Kessel R., Krekora Zając D., Matei M., Roda S., Prainsack B., Schlünder I., Shabani M., Southerington T. (2023). The European Health Data Space: Too Big To Succeed?. Health Policy, 104861. https://doi.org/10.1016/j.healthpol.2023.104861
Montoya Perez I., Movahedi P., Nieminen V., Airola A., Pahikkala T. (2024) Does Differentially Private Synthetic Data Lead to Synthetic Discoveries? Methods of Information in Medicine. https://doi.org/10.1055/a-2385-1355
Movahedi P., Nieminen V., Perez I. M., Pahikkala T. and Airola A. (2023) Evaluating Classifiers Trained on Differentially Private Synthetic Health Data. IEEE 36th International Symposium on Computer-Based Medical Systems (CBMS). https://doi.org/10.1109/CBMS58004.2023.00313
Movahedi P., Nieminen V., Montoya Perez I., Daafane H., Sukhwal D., Pahikkala T., Airola A. (2024) Benchmarking Evaluation Protocols for Classifiers Trained on Differentially Private Synthetic Data. IEEE Access 12: 118637-118648. https://doi.org/10.1109/ACCESS.2024.3446913
Nieminen V., Pahikkala T., and Airola A. (2023) Empirical evaluation of amplifying privacy by subsampling for GANs to create differentially private synthetic tabular data. CEUR Workshop Proceedings. Link to publication.
Salmi J., Hermansson L.-L. (2022) Centralized or de-centralized data and algorithms in the Finnish health care infrastructure. 14th International Conference on eHealth.
Vaiste J. (2023) Ethical implications of AI-generated synthetic health data. HAL preprint ⟨hal-04216538⟩
Final report
The project’s final report has been published as an open access article in AIMS Applied Computing and Intelligence.
Policy brief
PRIVASA Politiikkasuositukset [in Finnish]
PRIVASA Policy brief [in English – TBA]
Code
A differentially private GAN implementation to create synthetic tabular data: https://github.com/vajnie/privasa_dp_tabular_gan
Theses
Differentially private synthetic tabular data generation with a generative adversarial network and privacy amplification by subsampling
Valtteri Nieminen, University of Turku (2022)
https://urn.fi/URN:NBN:fi-fe2022082956643
Exploring Medical Image Data Augmentation and Synthesis using conditional Generative Adversarial Networks
Dorin Doncenco, Turku University of Applied Sciences (2022)
https://urn.fi/URN:NBN:fi:amk-202204074675
Automatic classification of cardiomegaly using deep convolutional neural network
Maral Hamedian, Turku University of Applied Sciences (2022)
https://urn.fi/URN:NBN:fi:amk-2022112524037
Predicting the condition of age-related macular degeneration patients with long short-term memory [in Finnish]
Kaspar Kaasikoja, Turku University of Applied Sciences (2022)
https://urn.fi/URN:NBN:fi:amk-2022052712520