When humans talk or write, they are less precise than you might think. Usually this is not an issue, because the meaning is implied by the context, and if it is not, the other participants will ask until they understand. Problem solved, everyone is happy, the conversation may continue. Now imagine you are reading a medical report and come across a vague formulation. What can you do? You can ask a doctor for clarification or, if you are a radiologist yourself, interpret the images. Now imagine you want to interpret these reports automatically. You develop an artificial intelligence (AI) to extract the pathologies from the reports. It works just fine until it encounters a case where the formulation is unclear. The AI cannot ask the radiologist, and it cannot check the MRI for “additional context”, because the extracted labels are exactly what other AIs use to learn to identify pathologies in the MRI images. This is a bit of a chicken-and-egg problem.
Shall I give you an example?
What is the difference between
“non-significant disk herniation” and “no significant disk herniation”?
Same difference, right? But wait, it is not. Not if you think a little bit about it. A non-significant disk herniation could be understood as a small disk herniation that we may ignore. But how shall we interpret “no significant”? It is not clear whether there actually is a non-significant (e.g. small) disk herniation or none at all. We only know that there is no big (significant) disk herniation. In addition, there is no definition of “significant.” It can mean “there is no disk herniation I can see, but perhaps there is a tiny one”, but it could also mean “I can see a disk herniation, but I think it will not cause any issues for the patient.” Just by adding or removing the character “n” (and the hyphen), the whole meaning of the sentence can change! To make matters worse, “significant” can be replaced by similar words like “relevant”, and we are back to the creative ways of writing reports.
Besides the issue of “no significant pathologies”, we need to determine the sentiment of each pathology occurrence. Take an occurrence like “L1/2: drastic disk herniation” in a report: the sentiment of the pathology “disk herniation” is positive (not for the patient, of course). For disk herniation in particular, radiologists often write “L1/2: no disk herniation”, which is positive for the patient but implies a negative sentiment for the pathology “disk herniation”: there is no evidence in the MRI that a disk herniation is present between vertebrae L1 and L2.
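To make the two examples concrete, here is a minimal Python sketch of this positive/negative labeling. It is only an illustration, not our production extraction: the `FINDING` pattern, the `label_finding` function and the assumption that each finding sits on its own line in the form “level: text” are all hypothetical.

```python
import re

# Hypothetical line format: a spinal level such as "L1/2" or "L5/S1",
# a colon, then the free-text finding (optionally ending with a period).
FINDING = re.compile(r"^(?P<level>[A-Z]\d+/[A-Z]?\d+):\s*(?P<text>.+?)\.?$")


def label_finding(line, pathology="disk herniation"):
    """Return (level, pathology, sentiment), or None if the pathology is not mentioned."""
    match = FINDING.match(line.strip())
    if match is None:
        return None
    text = match["text"].lower()
    if pathology not in text:
        return None
    # Naive negation check: "no" directly in front of the pathology name.
    negated = re.search(rf"\bno\s+{re.escape(pathology)}", text) is not None
    return match["level"], pathology, "negative" if negated else "positive"


print(label_finding("L1/2: drastic disk herniation"))
print(label_finding("L1/2: no disk herniation."))
```

Note that this naive check only recognizes a negation when “no” stands directly before the pathology, so a phrase like “no significant disk herniation” is not handled properly at all.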
It still sounds as if sentiment analysis for pathologies is doable: find the pathology, check for negations and exclude “no significant” because it is too vague. We data scientists briefly thought the same. Then we found out about “adjectives” (you know, the ones that make a house nice and a baby cute). In radiology reports, adjectives (which appear entirely too often!) can change everything and further obfuscate interpretation. For instance:
- No drastic <pathology>
We were already confused because of “no significant”, but “no drastic” is even worse: how is drastic defined and what is a non-drastic issue?
- No small <pathology>
No small disk herniation? That is clear enough, but what about a big one?
- No real/actual <pathology>
Can someone please explain what an unreal pathology is?
- No new <pathology>
Great, but what about the old one? Is it still there or not?
And this list goes on and on and on… Based on the reports alone, we data scientists cannot resolve those ambiguous formulations automatically. In our specific domain of analyzing MRIs, and with our goal of detecting pathologies automatically, it is important that when we label an MRI with a specific pathology (e.g. disk herniation), we are as certain as possible that the sentiment (positive or negative) we attach is correct. The algorithms that identify pathologies in new MRIs can only be as good as the training data they are given: if the labels extracted from the reports are of poor quality, the identification will never be good. Therefore, we had to introduce the sentiment “unclear” for all the cases where we are not sure whether the pathology is really present in the MRI or not. This leads to a huge list of “unclear phrases,” which we must then evaluate manually to decide whether a general rule can be applied. For example, “no further disk herniation” is a common phrase at the end of a report. It implies that there is a disk herniation somewhere, which was mentioned earlier, but that all the other disks are herniation-free.
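The three-valued labeling described above could be sketched roughly as follows. This is a simplified illustration under our own assumptions, not the actual pipeline: the functions `classify` and `collect_unclear` are hypothetical names, and the rule “any word between ‘no’ and the pathology makes the phrase unclear” is a deliberately cautious simplification.

```python
import re
from collections import Counter


def classify(phrase, pathology="disk herniation"):
    """Label a phrase 'positive', 'negative' or 'unclear'; None if the pathology is absent."""
    text = phrase.lower()
    if pathology not in text:
        return None
    # Look for "no", optionally followed by modifier words, then the pathology.
    match = re.search(rf"\bno\s+((?:[\w/-]+\s+)*?){re.escape(pathology)}", text)
    if match is None:
        return "positive"
    # Assumption: any modifier between "no" and the pathology ("significant",
    # "drastic", "new", ...) makes the statement too ambiguous to label.
    return "unclear" if match.group(1).strip() else "negative"


def collect_unclear(phrases, pathology="disk herniation"):
    """Count unclear phrases so that the most frequent ones can be reviewed manually first."""
    return Counter(p.lower() for p in phrases if classify(p, pathology) == "unclear")


phrases = [
    "no disk herniation",
    "non-significant disk herniation",
    "no significant disk herniation",
    "No significant disk herniation",
    "no further disk herniation",
]
for phrase, count in collect_unclear(phrases).most_common():
    print(count, phrase)
```

Sorting the unclear phrases by frequency is one way to prioritize the manual review: a rule for a common phrase such as “no further disk herniation” resolves many reports at once.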
Natural language provides a lot of flexibility and variety, and with them a lot of vague formulations. Many are cleared up with a little bit of context, or the small distinctions are simply not that important. But for us data scientists, this is not “same difference” but a huge issue.