In my previous blogs, I talked about the difficulty provided by natural language (e.g. ambiguity and variety) and the medical domain (e.g. medical words unknown in common tools). Today, I want to show you another aspect: How big the impact of a small “error” in the model can be.

Let me start by telling you about my latest challenge at work.

I was introducing a new sentiment (“minor”), which should be applied to mentions like “small disk herniation.” There are different use cases when such a finer sentiment granularity can be useful. For example, there are different levels of stenosis of the spinal canal – if we nonphysicians look at the MRIs, the scale is from “Ok, I see that this is probably not optimal” to “My back hurts from looking at it.” Detecting these differences is important for accuracy in future work.

To add a new sentiment, I might give the following instruction to the pipeline: “If this word occurs, the sentiment is minor.” Then, I add a few keywords like “small” and “minimal.” and it works perfectly fine in the test cases. However, you may notice that I wrote “test cases,” because when I ran it over all the actual reports, I noticed that the pipeline was not able to identify as many “minor” pathologies as I would have expected. That raised some red flags and led me to conduct a closer inspection of the sentiment analysis to find where the bug was hidden.

To help understand the issue, you need to keep the following in mind:

  • The minor sentiment is based on adjectives, which are written in lowercase in German.
    (Only nouns are written in uppercase.)
  • In medical reports, they often write in nominal sentences. For example, “Small disk herniation” is one such nominal sentence which occurs quite often. These sentences often seem more like items on a list rather than sentences and are difficult to handle with common NLP tools, which expect perfectly written sentences.
  • If a word begins a sentence, it is always written in uppercase (e.g. “Small” instead of “small”).

The sentiment detection based on keywords looks at the “dictionary form” (lemma) of words. For example, a noun is always lemmatized to the singular form (e.g. the lemma of “disks” is “disk”). After a painstaking search for the cause of the bug, I found out that the lemmatization for German is only based on the word as it is – information about part of speech tag (e.g. it is an adjective) or sentence start is not part of the lemmatizer. Now remember our nominal sentence example and that adjectives are written in lowercase in German – except that at the sentence start everything is uppercase. If the lemmatizer doesn’t know about sentence start, or the part-of-speech tag of the word, the occurrence “Small” will be viewed as a noun. Funny enough, in German, the adjective “small” is “klein”, and with declination for disk herniation it is “kleine”. At a sentence start, it will be written as “Kleine,” which can also be a noun in German meaning “little girl.”

Long explanation short, the most often used keyword for minor sentiment was in most cases incorrectly lemmatized, and therefore the sentiment detection failed. This is an “error” for our pipeline, but in general, with the way the lemmatizer is programmed to function, it is not a bug at all but perfectly correct.

After finding the cause of the bug, it was easy to fix by adding simple rules, such as: “If an adjective is uppercase, re-run the lemmatizer with the lowercase version of the adjective.” Because we deal with medical terminology, we also added rules for unknown adjectives to remove suffixes of declination and comparison.

Speaking of medical terminology, let me finish this post by sharing with you my favorite “unknown word” issue I saw during my search: the German word “Wirbelkörper,” which is a compound word composed of “vertebra” and “body.” This word is unknown to the general model we are using, including during part-of-speech tagging. Because of that and/or other unknown reasons, “Wirbelkörper” has a high probability to be tagged as an adjective – but other medical terms are usually labeled as “proper noun” or “unknown.” Whatever the reason is for this tagging, now it is lemmatized to “Wirbelkörp”... which reminds me to have a look into the part-of-speech tagger too.