This could be made much easier,
What could be?

Everything you write after this is essentially what I proposed: using the speech dictionary with regular expression matching to very strictly limit what is captured and substituted.

One of the beauties of regular expressions is how they can be crafted to catch only what you want.
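
To illustrate the kind of strictness I mean, here is a rough sketch in Python. It is not how any particular screen reader stores or applies its speech dictionary; the entries, the `ENTRIES` list and the `apply_entries` function are all made up for the example. The point is only that word boundaries and lookaheads let a pattern match in one context and stay silent in another.

```python
import re

# Hypothetical dictionary entries: (compiled pattern, replacement).
# These are illustrative only, not taken from any real speech dictionary.
ENTRIES = [
    # "Dr." only when it is followed by whitespace and a capitalized word,
    # so a title such as "Dr. Smith" matches but "Elm Dr. near" does not.
    (re.compile(r"\bDr\.(?=\s+[A-Z])"), "Doctor"),
    # "km" only when it directly follows a number and ends at a word boundary,
    # so "5km" matches but "5kmart" does not.
    (re.compile(r"\b(\d+)\s?km\b"), r"\1 kilometers"),
]


def apply_entries(text):
    """Apply each substitution in order and return the rewritten text."""
    for pattern, replacement in ENTRIES:
        text = pattern.sub(replacement, text)
    return text


print(apply_entries("Dr. Smith lives on Elm Dr. near the 5km mark"))
# prints: Doctor Smith lives on Elm Dr. near the 5 kilometers mark
```

Note that the second "Dr." is left alone because the lookahead requires a capital letter after it. A plain text replacement of "Dr." would have changed both.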

