You are generally right in your analysis but completely wrong about ML/NLP. You are committing what I have called, in another paper, the "fallacy fallacy." In effect, your notion of background knowledge is routinely incorporated into many NLP systems; your proof is relevant only to the fictional straw man you have created. For as long as I have been doing this (at least 40 years), we have understood your information-theoretic argument.
I suggest you read this to see the "fallacy fallacy" you have fallen into:
https://medium.com/liecatcher/off-on-mark-solms-3d87462ab327
The term "compression" is misleading and overly simplistic, since it is not physically possible to express the entire message being communicated:
Nobody, including you, knows the true information content of any communicated sentence. Your proof always holds, but your specific numbers are completely wrong.
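To make the point concrete: the "information content" you measure depends entirely on the background knowledge you assume the receiver already has. A minimal sketch (my illustration, not anything from your proof) uses zlib's preset-dictionary feature as a stand-in for shared background knowledge; the hypothetical sentence and dictionary here are made up for the example:

```python
import zlib

sentence = b"The cat sat on the mat."

# Compressed size assuming no shared background knowledge.
plain = len(zlib.compress(sentence))

# Compressed size when sender and receiver share prior context.
# zlib's preset dictionary stands in for "background knowledge":
# the sentence can be encoded as a back-reference into it.
background = b"The cat sat on the mat. The dog sat on the log."
c = zlib.compressobj(zdict=background)
with_context = len(c.compress(sentence) + c.flush())

# Same sentence, two different "information content" numbers,
# depending on the model assumed. Neither is the true content.
print(plain, with_context)
```

The same sentence yields a much smaller number once background knowledge is assumed, which is exactly why any specific bit-count attached to a communicated sentence is model-relative rather than a fact about the sentence.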
Furthermore, your account of the evolution of language is not correct. Animal communication by signs has always used compression. The evolution is about specific content, which, by the way, amounts in the aggregate to less compression.
https://medium.com/liecatcher/the-natural-evolution-of-human-lies-655e983ee6c6