In the example below, we create an expression to match all HTML span elements. The greedy version produces a very different – and unexpected – result.
What we want, and what we expect, is for the call to Regex.Matches to return two matches, one for each span in the input string.
If we use the standard syntax for matching zero or more characters, .* (a dot followed by an asterisk, or Kleene star), we get a single match that runs from the first opening span tag through the last closing span tag. Clearly, this isn't what we want.
To get the correct (and expected) result, we must use non-greedy matching, by appending a question mark to the wildcard subexpression (.*? instead of .*).
The table below illustrates the difference:
| | Greedy quantifier (`*`) | Non-greedy quantifier (`*?`) |
|---|---|---|
| Input | `one <span>two</span> three <span>four</span> five` | `one <span>two</span> three <span>four</span> five` |
| Pattern | `"<span>.*</span>"` | `"<span>.*?</span>"` |
| Number of matches | 1 | 2 |
| Match 1 | `<span>two</span> three <span>four</span>` | `<span>two</span>` |
| Match 2 | (none) | `<span>four</span>` |
The greedy quantifier returns the longest match possible. The non-greedy quantifier returns the shortest match.
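
The comparison can be reproduced with a short C# program. This is a minimal sketch rather than the original author's code: it assumes a .NET 5 or later console project (so that top-level statements are available), and the variable names `input`, `greedy` and `lazy` are chosen here purely for illustration. The input string and both patterns are the ones discussed above.

```csharp
using System;
using System.Text.RegularExpressions;

// Input string containing two span elements (taken from the example above).
string input = "one <span>two</span> three <span>four</span> five";

// Greedy: .* consumes as much as it can, so the match ends at the LAST </span>.
MatchCollection greedy = Regex.Matches(input, "<span>.*</span>");

// Non-greedy: .*? consumes as little as it can, so each match ends at the
// FIRST </span> that follows its opening tag.
MatchCollection lazy = Regex.Matches(input, "<span>.*?</span>");

Console.WriteLine($"Greedy: {greedy.Count} match(es)");
foreach (Match m in greedy)
{
    Console.WriteLine($"  {m.Value}");
}

Console.WriteLine($"Non-greedy: {lazy.Count} match(es)");
foreach (Match m in lazy)
{
    Console.WriteLine($"  {m.Value}");
}
```

Run as written, the greedy pattern should report one match and the non-greedy pattern two, one per span element.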