C# - Regex - Greedy vs. Non-greedy. An example.

In the example below, we create an expression to match all HTML span elements. The greedy version produces a very different – and unexpected – result.

What we want, and what we expect, is for the call to Regex.Matches to return two matches, one for each span in the input string.

If we use the standard syntax for matching zero or more characters, .* (a dot followed by an asterisk, or Kleene star), we get a single match that runs from the first opening span tag to the last closing span tag. Clearly, this isn't what we want.

To get the correct (and expected) result, we must use non-greedy matching, which we request by appending a question mark to the quantifier, so that .* becomes .*?

The table below illustrates the difference:

             | Greedy quantifier (*)                             | Non-greedy quantifier (*?)
-------------+---------------------------------------------------+---------------------------------------------------
Input:       | one <span>two</span> three <span>four</span> five | one <span>two</span> three <span>four</span> five
Pattern:     | "<span>.*</span>"                                 | "<span>.*?</span>"
Match Count: | 1                                                 | 2
Matches:     | 1. <span>two</span> three <span>four</span>       | 1. <span>two</span>
             |                                                   | 2. <span>four</span>

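The greedy quantifier matches as many characters as it can while still allowing the rest of the pattern to match, so it returns the longest match possible. The non-greedy quantifier matches as few characters as it can, so it returns the shortest match from each starting position.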
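
The comparison above can be reproduced with a short console program. This is a minimal sketch rather than the original listing: the class name GreedyVsNonGreedy and the variable names are assumptions chosen for illustration, while the input string and both patterns are taken from the table.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical class name, used only for this illustration.
class GreedyVsNonGreedy
{
    static void Main()
    {
        string input = "one <span>two</span> three <span>four</span> five";

        // Greedy: .* takes as many characters as it can, so a single match
        // runs from the first <span> to the last </span>.
        MatchCollection greedy = Regex.Matches(input, "<span>.*</span>");

        // Non-greedy: .*? takes as few characters as it can, so each match
        // ends at the nearest </span>.
        MatchCollection nonGreedy = Regex.Matches(input, "<span>.*?</span>");

        Console.WriteLine("Greedy match count: " + greedy.Count);
        foreach (Match m in greedy)
        {
            Console.WriteLine("  " + m.Value);
        }

        Console.WriteLine("Non-greedy match count: " + nonGreedy.Count);
        foreach (Match m in nonGreedy)
        {
            Console.WriteLine("  " + m.Value);
        }
    }
}
```

Run as a console application, this should report one greedy match and two non-greedy matches, in line with the table.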



© Richard McGrath