I don't think you could even do this in any meaningful way if you're talking about NVDA in continuous speech mode reading sentences.  If a list were being read, where there was sufficient time to present and remove each word in a caption, it could work.  It's not going to work at any speech rate that approximates natural speech.  You couldn't even consciously register individual words that flash by that briefly.  It would be like the experiments decades ago on subliminal suggestion.

