Fun with Language
Working with natural language can be challenging and frustrating, but it can be a lot of fun too. Let me share my newest “fun with language” moment:
Research is often done in English, and there are many great tools (e.g. NLTK and Stanford NLP Group) which you can use when working on English NLP projects. In addition, English is easier compared to German for NLP – if you want an example, have a look at my earlier post “Can you come up with another one?” about compound words.
If I could somehow take advantage of the English tools and apply them to improve my NLP pipeline to process radiology reports, that would be amazing! One possibility would be to use English medical ontologies and link them with our German medical ontology. That could be done by looking up the names of the pathologies and body parts (e.g. disk herniation) in a dictionary and trying to match them between English and German ontology. Sounds reasonable? Well, we asked Google to translate “disk herniation” to German and received “Festplattenvorfall.” For all German speaking readers: you may snigger quite a bit – we did too. Why are all German speakers sniggering? “Festplattenvorfall” means an “incident of the hard drive.”
Let me try to come up with an explanation of what happened: The English word “disk” (or “disc”) is ambiguous. It refers to a thin, flat, circular object and usually either means “the hard drive component of a computer” or “an intervertebral disk,” depending on the context. When translating, the first version “hard drive” probably more commonly used (in the “whole world” compared to “medical world”) and—according to Google Translate—has therefore a higher probability of being the intended usage.
The second word “herniation” was rightfully translated to “Vorfall” – but only in the context of the intervertebral disk. It is probable that each word was translated for itself – without the context of the other word – and at the end “put together” to build a German composita. Automatic translation is a nice and often helpful tool, but treat with caution – especially while translating in a specific domain like medical terms.