On Mon, 29 Mar 2021, Brian Vogel wrote:
----- Hence the reason I object to the dictionary processing not dropping out upon first match. That result is insane, and it's something that no rational
While I agree that the result is not intuitive, I think it is necessary.
The only alternative I can see to this "match then continue with further matches" method is to "match then start over and look for further matches".
That has infinite-loop potential I don't want to consider.
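To make the loop worry concrete, here is a minimal Python sketch of a "match then start over" model. It is not the real dictionary code, which I have not read; the function name, the `max_passes` safety cap, and the cat/dog entries are all invented for illustration.

```python
def restart_after_match(text, entries, max_passes=10):
    """Hypothetical 'match, then start over' model with a safety cap.

    Returns (text, settled); settled is False if the cap was hit.
    """
    for _ in range(max_passes):
        for pattern, replacement in entries:
            if pattern in text:
                text = text.replace(pattern, replacement)
                break  # a match was applied: start over from the first entry
        else:
            return text, True  # a full pass matched nothing: we are done
    return text, False  # the entries keep re-triggering each other


# Two made-up entries that undo each other.
cyclic_entries = [("cat", "dog"), ("dog", "cat")]
result, settled = restart_after_match("my cat", cyclic_entries)
print(settled)  # False
```

Without the cap, those two entries would ping-pong forever. A single pass through the entries, in order, cannot do that.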
This is supposition on my part, but I believe the reason it continues searching the dictionary is that there is literally no other way it can be done.
It doesn't process by word, but by speech item. And that's a good thing.
Consider the following.
Dictionary entry 1:
- in: house
- replacement: horse
Dictionary entry 2:
- in: brown
- replacement: red
Input text: I observe that the house is brown
In your preferred model, I think, it would find house, convert it to horse, and that would be the end of it--stop processing.
Output text: I observe that the horse is brown
The difficulty comes when you consider that there may be further processing that needs to be done to that speech item (let's think of it as a line).
You still want "brown" to become "red", but if we stopped processing, that could never happen.
So, the dictionary's patterns that remain are applied to the speech item (line) progressively, and we finally end up with:
Output text: I observe that the horse is red
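The two models can be compared side by side in a short Python sketch. Again, this is an illustration under my own assumptions, not the actual dictionary code: entries are modelled as plain (pattern, replacement) pairs, and both function names are made up.

```python
# Entries 1 and 2 from above, in dictionary order.
ENTRIES = [("house", "horse"), ("brown", "red")]


def stop_on_first_match(text, entries):
    """Apply only the first entry that matches, then stop processing."""
    for pattern, replacement in entries:
        if pattern in text:
            return text.replace(pattern, replacement)
    return text


def fall_through(text, entries):
    """Apply every entry in order, each to the output of the previous one."""
    for pattern, replacement in entries:
        text = text.replace(pattern, replacement)
    return text


line = "I observe that the house is brown"
print(stop_on_first_match(line, ENTRIES))  # I observe that the horse is brown
print(fall_through(line, ENTRIES))         # I observe that the horse is red
```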
But wait, you might say, the house/horse and brown/red are different word matches; why can't it just match by word instead of by the whole speech item?
To further demonstrate the reason why it has to be this way, consider dictionary entry 3.
Dictionary entry 3:
- in (regex): the ([a-z]+) is ([a-z]+)
- replacement (regex): the portion of the \1 that I can see is \2
- comment: The Fair Witness rule
Okay, I haven't tested that regex (or any of these), but the idea is that at some point, the line, whether handled by a prior entry or not, will need to pass through that regex rule (rule 3).
If processing stopped on first match, the "Fair Witness rule" would never be encountered.
It's because the dictionary entries are each matched, individually, against the ENTIRE input text element (line).
If it stops for one, it has to stop for the entire line. (N.B. not always an actual line, might be a fragment or dialog title or whatever.)
In order for regular expressions, or even arbitrary non-regex character matches, to be possible, a text chunk needs to be tested against ALL the entries, not just the one that matches first.
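As a rough check of entries 1 through 3, here is a Python sketch using the standard `re` module. The three-field entry layout and the `apply_entries` name are my own assumptions, and Python's regex flavour stands in for whatever the dictionary really uses.

```python
import re

# (pattern, replacement, is_regex), in dictionary order.
ENTRIES = [
    ("house", "horse", False),
    ("brown", "red", False),
    (r"the ([a-z]+) is ([a-z]+)",
     r"the portion of the \1 that I can see is \2", True),
]


def apply_entries(text, entries):
    """Test the whole text chunk against every entry, in order."""
    for pattern, replacement, is_regex in entries:
        if is_regex:
            text = re.sub(pattern, replacement, text)
        else:
            text = text.replace(pattern, replacement)
    return text


print(apply_entries("I observe that the house is brown", ENTRIES))
# I observe that the portion of the horse that I can see is red
```

The regex entry only ever sees "the horse is red" because the two earlier entries did not end processing for the line.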
If I might beat a dead house:
Dictionary entry 4:
- in: observe
- replacement: witness
If the dictionary stopped processing on any of the earlier rules, the word "observe" would never be matched, even though it wasn't referenced in any of the earlier patterns.
That, I suggest, would very much be contrary to user expectations.
If you need another example, not using regular expressions, of why it must be like this, I give you:
Dictionary entry 5:
- in: testing,
- replacement: breaking,
Dictionary entry 6:
- in: , emergency
- replacement: , fake
In order for the input text:
I am testing, emergency broadcasting
to become:
I am breaking, fake broadcasting
both entries need to see the whole line. If either one of them got only the word "testing" or "emergency", it would fail, because each pattern includes punctuation or spacing that lies outside the word. That's what I mean by arbitrary text strings.
The entire chunk needs to be presented to each entry.
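A small Python sketch shows the difference between handing each entry the whole chunk and handing it one word at a time. The tokeniser here (runs of word characters, and runs of everything else) is purely my assumption about what "by word" might mean, and the function names are invented.

```python
import re

# Entries 5 and 6 from above.
ENTRIES = [("testing,", "breaking,"), (", emergency", ", fake")]


def apply_to_chunk(text, entries):
    """Present the whole chunk to every entry, in order."""
    for pattern, replacement in entries:
        text = text.replace(pattern, replacement)
    return text


def apply_per_word(text, entries):
    """Hypothetical per-word model: each token is processed in isolation."""
    tokens = re.findall(r"\w+|\W+", text)  # words and the gaps between them
    return "".join(apply_to_chunk(token, entries) for token in tokens)


line = "I am testing, emergency broadcasting"
print(apply_to_chunk(line, ENTRIES))  # I am breaking, fake broadcasting
print(apply_per_word(line, ENTRIES))  # I am testing, emergency broadcasting
```

In the per-word version neither pattern ever fits inside a single token, so nothing is replaced.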
I'm sure the intent is not specifically so you can convert a word, then convert the result, and so on, and so on, until you've created a Rube Goldbergian nightmare of a dictionary. But in order for the things we love about the dictionaries to work as they do, falling through to later rules has to be the way it works.
Now, I await someone who has actually read the dictionary code to come along and tell me why I am completely wrong; but until they do, I will assume the above to be accurate.