To effectively train radiologists and the artificial intelligence algorithms that assist them in detecting and diagnosing specific pathologies, a large data set is needed. Typically, training sets of data must be obtained from several sources, usually various hospitals. But how do radiological companies and their affiliates protect patient privacy and confidentiality?
Hospitals and medical providers are gatekeepers
The volume of healthcare data is growing exponentially, even as people become more wary about protecting their personal information. Keeping patient medical data confidential is critically important to maintaining the public’s trust in the healthcare establishment as a whole. Therefore, hospitals and health care providers are held to exacting standards by federal regulatory agencies. In the U.S., HIPAA in particular makes it difficult for providers to share patient data. The EU Data Protection Directive states that personal information can only be “collected for specified, explicit and legitimate purposes and not further processed in a way incompatible with those purposes.” Data breaches are costly, so hospitals have a compelling motivation to protect their databases.
Anonymization is multi-faceted
When hospitals do agree to share data, they must anonymize it. Sensitive personal data, such as name, address, social security number, and other direct identifiers, are masked. Direct identifiers may be removed entirely from the dataset or replaced with placeholder values, such as hash marks or X’s; in certain cases, they may be encrypted so that a password or key is required to recover an individual’s identity. After the data is masked, secondary healthcare organizations can use the radiological images for training and research purposes without compromising individual patient privacy.
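The masking options described above can be illustrated with a minimal Python sketch. The field names and the mask_record helper are hypothetical, chosen only for illustration. Note also that the keyed option here uses an HMAC digest as a stand-in: it needs a secret key to reproduce, but unlike true encryption it cannot be reversed to recover the original value.

```python
import hashlib
import hmac

# Hypothetical field names; real records differ by institution.
DIRECT_IDENTIFIERS = {"name", "address", "ssn"}

def mask_record(record, mode="redact", key=b""):
    """Return a copy of `record` with direct identifiers masked.

    mode="drop":         remove the field entirely
    mode="redact":       replace the value with X's of the same length
    mode="pseudonymize": replace the value with a keyed HMAC-SHA256 digest
    """
    if mode not in ("drop", "redact", "pseudonymize"):
        raise ValueError(f"unknown mode: {mode}")
    masked = {}
    for field, value in record.items():
        if field not in DIRECT_IDENTIFIERS:
            masked[field] = value  # not a direct identifier, keep as-is
        elif mode == "drop":
            continue
        elif mode == "redact":
            masked[field] = "X" * len(str(value))
        else:
            digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
            masked[field] = digest.hexdigest()[:16]
    return masked

record = {"name": "Jane Doe", "ssn": "123-45-6789", "age": 54, "finding": "nodule"}
print(mask_record(record, mode="redact"))
```

Dropping a field is the safest choice when no one downstream needs it, while a keyed digest keeps records for the same person linkable without exposing the identifier itself.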
Certain data may also be de-identified. De-identification protects fields that delineate an individual’s demographic and socioeconomic information; these indirect identifiers can include age, race, income, etc. Depending on the purpose of the secondary research, some of this information may be retained. For example, in order to approve a new medical device, the FDA requires that it be tested on a variety of ages, sexes, and races. Therefore, retaining some data fields may be necessary for particular kinds of medical research and for building training data sets.
In some cases, data may be “obfuscated,” a technique in which potentially identifying data is masked, but the relationships between relevant components (e.g., patient zip code) are preserved by translating the data into some other metric. For example, if researchers wanted to track the correlation of a particular condition with a certain region or group of zip codes, random “noise” could be inserted into the data to protect individual privacy while still maintaining statistical integrity. This technique, known as “data perturbation,” is regarded as a relatively easy and effective method of protecting sensitive electronic data.
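As a rough illustration of data perturbation, the sketch below adds zero-mean Gaussian noise to a numeric field. The perturb function, the synthetic ages, and the noise scale are all assumptions made for the example. Simple noise like this does not by itself provide a formal privacy guarantee.

```python
import random
import statistics

def perturb(values, scale, seed=None):
    """Add independent zero-mean Gaussian noise to each value.

    Individual values change, but because the noise averages out to
    roughly zero, aggregate statistics such as the mean stay close.
    """
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, scale) for v in values]

# Synthetic stand-in data: 1000 made-up patient ages.
data_rng = random.Random(0)
ages = [data_rng.randint(20, 90) for _ in range(1000)]

noisy = perturb(ages, scale=5.0, seed=1)

print("original mean: ", round(statistics.mean(ages), 2))
print("perturbed mean:", round(statistics.mean(noisy), 2))
```

A larger scale hides individual values better but makes the aggregate statistics noisier, so the scale has to be chosen with the intended analysis in mind.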
Advantages and challenges in export of anonymized medical data
After the data is anonymized, it is referred to as secondary data. This secondary data can then be exported to outside entities for research and training purposes. Even when anonymized, data must still be diligently guarded, so cybersecurity measures are required to protect data stored in the cloud.
The protocol for MRI data export has long been standardized among vendors. Exported MRI data is stored in a Picture Archiving and Communication System (PACS) using the Digital Imaging and Communications in Medicine (DICOM) standard for formatting and transmitting images.
While medical imaging data storage infrastructure has become quite streamlined, a major challenge with anonymized data export is that different hospitals and providers have different systems for configuring and storing all the data related to a patient. Different facilities use different Radiology Information Systems (RIS) for keeping track of patient data, and some use a Clinical Information System (CIS), which may tie together several software solutions, including the RIS. These software solutions are highly advantageous for coordinating and communicating patient data within an organization. However, data anonymization can get tricky when radiological images are stored in a separate system from the accompanying radiological reports, because once the data is anonymized, the two items can no longer be matched.
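One way this matching problem might be mitigated, offered here as an assumption rather than a description of any particular hospital's practice, is to replace the patient identifier in both systems with the same keyed pseudonym before export. The sketch below uses hypothetical record layouts, file names, and a hypothetical secret key that would stay with the data owner.

```python
import hashlib
import hmac

def pseudonym(patient_id, secret_key):
    """Derive a stable pseudonym from a patient ID with HMAC-SHA256."""
    mac = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()[:12]

def anonymize(rows, secret_key):
    """Replace 'patient_id' in each row with a keyed pseudonym ('pid')."""
    out = []
    for row in rows:
        clean = {k: v for k, v in row.items() if k != "patient_id"}
        clean["pid"] = pseudonym(row["patient_id"], secret_key)
        out.append(clean)
    return out

key = b"held-by-the-hospital-only"  # hypothetical secret, never exported

# Hypothetical exports from two separate systems.
images = anonymize([{"patient_id": "MRN001", "image": "scan_a.dcm"}], key)
reports = anonymize([{"patient_id": "MRN001", "report": "No acute findings."}], key)

# The recipient can pair images with reports using only the pseudonym.
reports_by_pid = {r["pid"]: r for r in reports}
matched = [
    (img["image"], reports_by_pid[img["pid"]]["report"])
    for img in images
    if img["pid"] in reports_by_pid
]
print(matched)
```

Because both exports use the same key, the pseudonyms line up, while anyone without the key cannot recompute them from a known patient ID.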
In short, careful tracking and protection of sensitive data is and will continue to be a vital component of high-quality healthcare and medical research and development.