The sensitivity of health data is a significant obstacle to major healthcare and medical breakthroughs. Data cannot be shared because of regulations such as the GDPR. The existing solutions to this problem are various data anonymization techniques. However, these methods lose more information than necessary, do not really support use cases where data comes from many sources, and are poorly suited to real-time data. We have developed and validated a neural network-based model that is massively faster than other techniques, supports various data types, and better preserves data quality. We collaborate with leading pharma and diagnostics companies as well as providers of information systems (EHRs and data lake solutions) to hospitals.
The core team of VEIL.AI has managed data for the biggest Finnish and European medical research and innovation projects. A good example is a recent EU-funded project in which researchers and companies collaborate to find new diagnostic and therapeutic solutions for a challenging rare disease. The study budget is about EUR 15 million, and more than ten countries and patient cohorts are involved. The study needs to collect new primary data and link it to existing registry data. Of the study's five-year duration, lawyers spent more than two years discussing data sharing and anonymization principles. Close to half of the study time! And what is even more frustrating: even after the agreement, the existing methods lose much more information than necessary. Furthermore, the study needs to include complex data types: imaging data, genome data, and some NLP data. However, these cannot be properly anonymized with the existing techniques.
The technical team set out to solve these problems, and they did. They developed a neural network-based model that makes instant data sharing possible (i.e. data is anonymized within the firewalls of each organization) and is well suited to complex multimodal data (including imaging, genome, and NLP data). Moreover, it solves the well-known wicked problem of how to anonymize real-time data. Finally, a fortunate by-product of the chosen approach is that it can be used not only to anonymize data but also to produce synthetic data.