Our research topics

PRIVASA’s research teams and industry partners are exploring the potential and limitations of using synthetic data in health care. Synthetic data could support research and innovation activities when real data is not available due to legal, ethical or practical constraints. Synthetic data could also be used alongside real data to speed up the early phases of development.

For most applications in health care (mainly excluding primary care), it is critical that the datasets protect individuals’ privacy. As a result, synthetic datasets are often expected to be anonymous. Generating synthetic data involves a well-known trade-off: the synthetic datasets most similar to real data offer the least privacy protection. Conversely, synthetic datasets with little or no resemblance to real data offer the strongest privacy protection.

By applying different techniques to generate synthetic datasets, we are mapping the practical implications of this privacy-utility trade-off. We are also empirically testing the usability of synthetic datasets in statistical analyses and comparing the performance to alternative solutions, such as private queries to real data.
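As a rough illustration of this trade-off, the sketch below answers a mean query with the Laplace mechanism, one common way to build a differentially private query. It is not the method used in the project; the function names, the value bounds, the toy data and the epsilon values are all assumptions made for the example, and the number of records is treated as public. A smaller epsilon gives stronger privacy but a noisier, less useful answer.

```python
import random


def private_mean(values, lower, upper, epsilon, rng):
    """Return an epsilon-differentially private mean of bounded values.

    Hypothetical helper: uses the Laplace mechanism and assumes the
    number of records is public.
    """
    # Clip each value so that a single record has bounded influence.
    clipped = [min(max(v, lower), upper) for v in values]
    true_mean = sum(clipped) / len(clipped)
    # Replacing one record changes the mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / len(clipped)
    scale = sensitivity / epsilon
    # The difference of two exponential draws with mean `scale`
    # follows a Laplace(0, scale) distribution.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_mean + noise


def mean_abs_error(values, lower, upper, epsilon, trials=2000, seed=0):
    """Average absolute error of the private mean over repeated queries."""
    rng = random.Random(seed)
    clipped = [min(max(v, lower), upper) for v in values]
    true_mean = sum(clipped) / len(clipped)
    errors = [
        abs(private_mean(values, lower, upper, epsilon, rng) - true_mean)
        for _ in range(trials)
    ]
    return sum(errors) / trials


def make_toy_data(n=200, seed=1):
    """Made-up measurements for illustration only, not real patient data."""
    rng = random.Random(seed)
    return [rng.gauss(125, 15) for _ in range(n)]


if __name__ == "__main__":
    data = make_toy_data()
    for eps in (0.1, 1.0, 10.0):
        err = mean_abs_error(data, lower=80, upper=200, epsilon=eps)
        print(f"epsilon={eps}: mean absolute error {err:.3f}")
```

Running the script prints the average error for each epsilon, which makes the cost of stronger privacy guarantees visible for a single query.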

In our research, we have focused on processing numeric tabular data and medical images.

Publications

We wish to acknowledge that scientific work exceeds the limits of individual projects.

The publications are listed here based on the co-authorship of one or more PRIVASA researchers (in bold). These represent research activities planned and executed in the project or collaborative research activities within the thematic scope of the project.

Eisenmann M., Reinke A., Weru V., Tizabi M. D., Isensee F., Adler T. J., … Jafaritadi M., Kontio E., Khan M., … & Finzel R. (2022). Biomedical image analysis competitions: The state of current participation practice. arXiv preprint

Huhtanen J.T., Nyman M., Doncenco D., Hamedian M., Kawalya D., Salminen L., Sequeiros R.B., Koskinen S.K., Pudas T.K., Kajander S., Niemi P., Hirvonen J., Aronen H., Jafaritadi M. (2022). Deep learning accurately classifies elbow joint effusion in adult and pediatric radiographs. Scientific Reports 12, 11803. https://doi.org/10.1038/s41598-022-16154-x

Kaisti M., Laitala J., Wong D., Airola A. (2023). Domain randomization using synthetic electrocardiograms for training neural networks. Artificial Intelligence in Medicine. https://doi.org/10.1016/j.artmed.2023.102583

Khan M.I., Azeem M. A., Alhoniemi E., Kontio E., Khan S.A., Jafaritadi M. (2023). Regularized weight aggregation in networked federated learning for glioblastoma segmentation. arXiv preprint

Khan M. I., Alhoniemi E., Kontio E., Khan S. A., Jafaritadi M. (2023). Differential Privacy for Adaptive Weight Aggregation in Federated Tumor Segmentation. arXiv preprint

Khan M.I., Alhoniemi E., Kontio E., Khan S.A., Jafaritadi M. (2023). RegAgg: A Scalable Approach for Efficient Weight Aggregation in Federated Lesion Segmentation of Brain MRIs. 2023 Eighth International Conference on Fog and Mobile Edge Computing (FMEC). https://doi.org/10.1109/FMEC59375.2023.10306171

Khan M.I., Jafaritadi M., Alhoniemi E., Kontio E., Khan S.A. (2022). Adaptive Weight Aggregation in Federated Learning for Brain Tumor Segmentation. In: Crimi A., Bakas S. (eds) Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. BrainLes 2021. Lecture Notes in Computer Science, vol 12963. Springer, Cham. https://doi.org/10.1007/978-3-031-09002-8_40

Marelli L., Stevens M., Sharon T., Van Hoyweghen I., Boeckhout M., Colussi I., Degelsegger-Márquez A., El-Sayed S., Hoeyer K., van Kessel R., Krekora Zając D., Matei M., Roda S., Prainsack B., Schlünder I., Shabani M., Southerington T. (2023). The European Health Data Space: Too Big To Succeed? Health Policy, 104861. https://doi.org/10.1016/j.healthpol.2023.104861

Montoya Perez I., Movahedi P., Nieminen V., Airola A., Pahikkala T. (2024) Does Differentially Private Synthetic Data Lead to Synthetic Discoveries? Methods of Information in Medicine. https://doi.org/10.1055/a-2385-1355

Movahedi P., Nieminen V., Perez I. M., Pahikkala T. and Airola A. (2023) Evaluating Classifiers Trained on Differentially Private Synthetic Health Data. IEEE 36th International Symposium on Computer-Based Medical Systems (CBMS). https://doi.org/10.1109/CBMS58004.2023.00313

Movahedi P., Nieminen V., Montoya Perez I., Daafane H., Sukhwal D., Pahikkala T., Airola A. (2024) Benchmarking Evaluation Protocols for Classifiers Trained on Differentially Private Synthetic Data. IEEE Access 12: 118637-118648. https://doi.org/10.1109/ACCESS.2024.3446913

Nieminen V., Pahikkala T., and Airola A. (2023) Empirical evaluation of amplifying privacy by subsampling for GANs to create differentially private synthetic tabular data. CEUR Workshop Proceedings.

Salmi J., Hermansson L.-L. (2022) Centralized or de-centralized data and algorithms in the Finnish health care infrastructure. 14th International Conference on eHealth.

Vaiste J. (2023) Ethical implications of AI-generated synthetic health data. HAL preprint ⟨hal-04216538⟩

Code

A differentially private GAN implementation to create synthetic tabular data: https://github.com/vajnie/privasa_dp_tabular_gan

Theses

Differentially private synthetic tabular data generation with a generative adversarial network and privacy amplification by subsampling
Valtteri Nieminen, University of Turku (2022)
https://urn.fi/URN:NBN:fi-fe2022082956643

Exploring Medical Image Data Augmentation and Synthesis using conditional Generative Adversarial Networks
Dorin Doncenco, Turku University of Applied Sciences (2022)
https://urn.fi/URN:NBN:fi:amk-202204074675

Automatic classification of cardiomegaly using deep convolutional neural network
Maral Hamedian, Turku University of Applied Sciences (2022)
https://urn.fi/URN:NBN:fi:amk-2022112524037

Predicting the condition of age-related macular degeneration patients with long short-term memory [in Finnish]
Kaspar Kaasikoja, Turku University of Applied Sciences (2022)
https://urn.fi/URN:NBN:fi:amk-2022052712520