Search

Complex regex evaluation hangs by kzu

Closed
as By Design Help for as By Design

15
Sign in to vote
0
Sign in to vote
Sign in
to vote
Type: Bug
ID: 128105
Opened: 6/9/2006 8:18:35 AM
Access Restriction: Public
0
Workaround(s)
6
User(s) can reproduce this bug
Certain complex regular expressions, like the ones typically used to perform mini-DSLs parsing, hangs completely when evaluating against relatively small strings.
Details (expand)
Product Language
English
Version
.NET Framework 2.0
Operating System
Windows XP Professional
Operating System Language
English
Category
\SDK
Subcategory 1
\.NET Framework SDK tools
Subcategory 2
 
Subcategory 3
 
Steps to Reproduce
Run this console app:

class Program
{
static readonly Regex ReferenceExpression = new Regex(@"
# Matches invalid empty brackets #
(?<empty>\$\(\))|
# Matches a valid argument reference with potencial method calls and indexer accesses #
(?<reference>\$\(([^\(]+([\(\[][^\)\]]*[\)\]])?)+\))|
# Matches opened brackes that are not properly closed #
(?<opened>\$\([^\)\$\(]*(?!\)))",
RegexOptions.Compiled | RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);

static void Main(string[] args)
{
string hangString = @"DisconnectedAgents\$(CurrentItem.Name)\$(ProxyType.Name)AgentCallback.cs\$(ProxyType.Name)AgentCallbackBase.cs";

Console.WriteLine(ReferenceExpression.IsMatch(hangString));
}
}
Actual Results
Hangs and never returns the boolean from IsMatch to the console output.
Expected Results
Should return "True" in this case, but should definitely not hang.
File Attachments
0 attachments
Sign in to post a comment.
Posted by Microsoft on 6/28/2006 at 12:31 PM
Hello kzu,

This is a known performance issue with System.Text.RegularExpressions when there are nested quantifiers (+ and *) in the expression. The nested quantifiers cause nondeterministic finite automaton (NFA) engines to have exponential running time.

* Please refer to the BCL blog entry for more details - "Regex hangs with my expression [David Gutierrez]": http://blogs.msdn.com/bclteam/archive/2005/02/10/370690.aspx

WORKAROUND
--------------------------
The workaround is to re-write the expression to no longer have nested quantifiers, reduce pattern ambiguities, and to limit backtracking where possible.

EXTRA INFO
---------------------
* Please refer to the MSDN article for more details on NFA versus DFA engines - "Matching Behavior": http://msdn2.microsoft.com/en-us/library/0yzc2yb0.aspx

Thanks,
Josh Free
Base Class Libraries Development
Posted by kzu on 6/28/2006 at 2:38 PM
Ok, I see. It's unfortunate, but I guess there's no way around it, other than refactoring the expression.
Thanks for the explanation, and feel free to close the bug as By Design.