System.Uri incorrectly strips trailing dots by Eamon Nerbonne


Status: Resolved as Fixed


Type: Bug
ID: 386695
Opened: 12/5/2008 4:38:13 AM
Access Restriction: Public
Workarounds: 1
Users who can reproduce this bug: 17

Description

System.Uri strips trailing '.' characters from path segments even though the RFC specifies otherwise. For example, the URL http://microsoft.com/dir/test.../file.txt is incorrectly canonicalized by System.Uri as http://microsoft.com/dir/test/file.txt

As a code snippet, the following should return false, but instead returns true:
new Uri("http://microsoft.com/dir/test.../file.txt") == new Uri("http://microsoft.com/dir/test/file.txt")

In other words, any System.Uri based class including WebRequest and WebClient will accept a valid URL such as "http://microsoft.com/dir/test.../file.txt" but retrieve "http://microsoft.com/dir/test/file.txt" without any warning.
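To see what a given runtime actually does with the URL above, a minimal console sketch along these lines may help. The class and method names (TrailingDotCheck, Describe) are made up for illustration; only System.Uri members are used. No expected output is listed, because the result depends on the framework version in use.

```csharp
using System;

static class TrailingDotCheck
{
    // Hypothetical helper: reports whether System.Uri changed the input
    // while canonicalizing it.
    public static string Describe(string original)
    {
        Uri parsed = new Uri(original);
        bool altered = !string.Equals(parsed.AbsoluteUri, original, StringComparison.Ordinal);
        return string.Format("input:   {0}\nparsed:  {1}\naltered: {2}",
                             original, parsed.AbsoluteUri, altered);
    }

    static void Main()
    {
        Console.WriteLine(Describe("http://microsoft.com/dir/test.../file.txt"));

        // The comparison from the description. Per the report, a parser that
        // follows the RFC would print False here.
        Console.WriteLine(new Uri("http://microsoft.com/dir/test.../file.txt")
                          == new Uri("http://microsoft.com/dir/test/file.txt"));
    }
}
```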

In practice, real web services such as the Amazon services and last.fm, among others, make meaningful distinctions between URLs that contain trailing dots in path segments. Using System.Uri anywhere when querying these web services will introduce subtle data-corruption bugs. Amazon's own sample S3 library for C# triggers this bug and will request the wrong object when given a key with trailing dots.

The System.Uri implementation strips trailing dots from path segments, whereas the RFC specifies that absolute paths may contain dots anywhere in the path, and relative paths may contain trailing dots, though path segments of relative URIs consisting solely of ".." or "." are treated specially. Reference: http://www.ietf.org/rfc/rfc2396.txt

This behavior does not appear to follow the RFC. Of note are RFC section 5.2 and appendices A and C.

Appendix A contains the grammar for URIs, the relevant subset being:

     URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]
     absoluteURI   = scheme ":" ( hier_part | opaque_part )
     hier_part     = ( net_path | abs_path ) [ "?" query ]
     net_path      = "//" authority [ abs_path ]
     abs_path      = "/" path_segments
     path_segments = segment *( "/" segment )
     segment       = *pchar *( ";" param )
     pchar         = unreserved | escaped | ":" | "@" | "&" | "=" | "+" | "$" | ","
     unreserved    = alphanum | mark
     mark          = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
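
The grammar subset above can be turned into a small check that a segment such as "test..." is legal. This is only a sketch: the class name Rfc2396Segment and its members are invented for illustration, and it assumes the RFC 2396 definitions that are not quoted above (param = *pchar, escaped = "%" hex hex, alphanum = ASCII letters and digits).

```csharp
using System;
using System.Text.RegularExpressions;

static class Rfc2396Segment
{
    // pchar = unreserved | escaped | ":" | "@" | "&" | "=" | "+" | "$" | ","
    // unreserved = alphanum | mark, escaped = "%" hex hex
    const string PChar = @"(?:[A-Za-z0-9\-_.!~*'():@&=+$,]|%[0-9A-Fa-f]{2})";

    // segment = *pchar *( ";" param ), with param = *pchar
    static readonly Regex Segment =
        new Regex("^" + PChar + "*(?:;" + PChar + @"*)*\z");

    public static bool IsValid(string segment)
    {
        return Segment.IsMatch(segment);
    }

    static void Main()
    {
        Console.WriteLine(IsValid("test..."));  // True: "." is a mark, hence unreserved
        Console.WriteLine(IsValid("te st"));    // False: a raw space is not a pchar
    }
}
```

Nothing in the segment rule distinguishes a dot at the end of a segment from a dot anywhere else, which is the point the description makes.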

Section 5.2 is titled "Resolving Relative References to Absolute Form". It provides a step by step resolution procedure. Step 6 performs the actual resolution of ".." and "." items. Of note are steps 4 and 5, however, which explain that if a Uri either contains an authority component (microsoft.com in the example http://microsoft.com/dir/test.../file.txt) or has a path starting with a '/' then one bypasses step 6 entirely. Thus, in absolute URLs, both with and without a domain name, it is an error to resolve relative references at all.

Appendix C gives examples of the relative resolution process including examples which contain leading and trailing dots in relative paths. In particular, even in a relative path both leading and trailing dots are permitted; the example base URL of http://a/b/c/d;p?q and the relative path of g.. are combined to http://a/b/c/g.. in the RFC.
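
The Appendix C example can be compared against System.Uri's own relative resolution with a sketch like the following. The class name AppendixCCheck and its members are hypothetical; the three string values come from the RFC example quoted above. The output is left unstated because it depends on the framework version.

```csharp
using System;

static class AppendixCCheck
{
    // Values from the RFC 2396 Appendix C example quoted above.
    public const string BaseUri = "http://a/b/c/d;p?q";
    public const string Relative = "g..";
    public const string Expected = "http://a/b/c/g..";

    // Resolves a relative reference using the Uri(Uri, string) constructor.
    public static string Resolve(string baseUri, string relative)
    {
        return new Uri(new Uri(baseUri), relative).AbsoluteUri;
    }

    static void Main()
    {
        string actual = Resolve(BaseUri, Relative);
        Console.WriteLine("expected:    " + Expected);
        Console.WriteLine("actual:      " + actual);
        Console.WriteLine("matches RFC: " + (actual == Expected));
    }
}
```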

Is this behavior intentional? The deviation from the spec is not documented. This behavior makes it impossible to use System.Uri to query some web services that include trailing dots in path segments. Finally, is there a workaround to achieve compliant behavior? The code I'm writing runs in full trust, so if necessary reflection could be used to modify System.Uri internals, but perhaps an easier solution is possible (and I don't know what the internals should be set to, either). I'd certainly prefer a hack to falling back on plain TCP/IP or proxying.

Details
Posted by Keith Beckman on 8/16/2013 at 8:40 PM
Reopened, as they're obviously not following comments on "Resolved Fixed" issues, here:

https://connect.microsoft.com/VisualStudio/feedback/details/797649/system-uri-incorrectly-canonicalizes-all-path-components-as-windows-file-path-components
Posted by Keith Fletcher on 7/7/2013 at 5:43 PM
Any update on this fix?
Posted by gblmarquez on 4/28/2012 at 6:45 PM
+1 programmer suffering from this issue
Posted by Keith Beckman on 1/3/2012 at 11:56 AM
This is not a minor issue. A deviation from the spec of this magnitude shouldn't be kept on board "just in case" someone was depending on "foo./bar" automatically folding to "foo/bar".

My company works with a large number of client URLs, none of which we have the slightest control over. As many of these URLs are derived from personal names, trailing periods in path components are not at all unusual. Bringing this basic functionality into compliance with the specs needs to be a priority.

As it is, the only possible workaround for this requires full trust -- and I need not explain what a bad idea running public-facing web sites in full trust can be. Unfortunately, until we can port our code to an RFC-compliant URI parsing library, we're forced to do exactly that.

For users of WebRequest and WebClient (among others), even this would not be an option, as overriding these classes' internal use of native URI parsing isn't possible.

In the case of environments in which full trust is not an option, there is no workaround.

When compatibility with code depending on unquestionably buggy behaviour is weighed against the concern for interoperability with the greater internet, interoperability must win out. A flag property such as .TreatPathsAsWindowsFilePaths could be implemented to reinstate the previous behaviour for backward compatibility, but defaulting to noncompliant, buggy operation should not be an option.
Posted by Dan Fitch on 10/26/2011 at 2:19 PM
Pinging this again; as Eamon Nerbonne points out, the status is "fixed" despite no released fix.

Let us know when the fix is planned, please!
Posted by Eamon Nerbonne on 6/12/2011 at 12:18 AM
The status has been changed to fixed - does this mean that in a future .NET version Uri will no longer strip trailing dots? Do you have an idea when the fix will become available (i.e. in .NET 5 or in a patch to 4)?
Posted by Microsoft on 3/26/2010 at 12:38 PM
Thanks for the feedback. Yes, this would include web.config as well.

We maintain a high compatibility bar between releases, and a change of this nature would not allow us to preserve that. We also have a number of related changes we would like to introduce at the same time, and so versioning the parser currently looks like the cleanest approach. As you can imagine, that has a number of implications which we're still analyzing.
Posted by Eamon Nerbonne on 3/26/2010 at 12:21 AM
An app.config option would be fine (am I correct that that implies a web.config option for ASP.NET?). Obviously, it's not ideal; those kinds of settings aren't particularly discoverable and they clutter up the config file. Also, making it an option means most people would still be using the old buggy parsing; that's unfortunate for two reasons:

(1) They probably won't notice, leading to subtle errors that go undetected or are hard to fix because it's not the most likely place to look for an error.

(2) The optional code will be less well tested.

So, for me it's fine, but it does make things confusing. Documenting the option on the MSDN Uri page and/or adding the option to the default project templates in VS.NET would help - I'm guessing you're leery of making the fix because of unintended consequences?
Posted by Microsoft on 3/25/2010 at 5:45 PM
Unfortunately, circumstances prevented us from correcting this in .NET 4. We understand this is an important issue and are currently evaluating our options to resolve this.

It would be helpful to understand whether requiring an explicit opt-in to the new parsing behavior through app.config would be an acceptable fix, or whether that would be challenging to adopt in your scenarios.
Posted by Jon Davis on 2/17/2010 at 3:35 PM
Confirmed on my end as well; this is a serious bug in System.Uri that still exists in .NET Framework 4.0 RC.
Posted by Eamon Nerbonne on 2/16/2010 at 8:17 AM
I still see this behaviour in .NET 4.0 RC also.
Posted by Bert Huijben-TCG on 12/22/2009 at 3:55 AM
I can still reproduce this issue in Visual Studio 2010 Beta 2. What is the status?
Posted by Microsoft on 6/4/2009 at 11:17 AM
We will be addressing this issue in the next release of the Framework (.NET 4.0).

Thank you,

Network Class Library Team
Posted by Microsoft on 4/2/2009 at 10:00 AM
We are evaluating the timeframe for fixing this issue.
Posted by Eamon Nerbonne on 3/6/2009 at 8:18 AM
Is there any update on this issue - any place I can tell whether and when this will be fixed? It would be helpful if the actual parsing behavior were documented on MSDN, regardless of the status of any real fix.
Posted by Microsoft on 12/8/2008 at 12:07 AM
Thanks for your feedback. We are escalating this bug to the product unit who works on that specific feature area. The team will review this issue and make a decision on whether they will fix it or not for the next release.

Thank you,
Visual Studio Product Team
Workarounds
Posted by Filip Navara1 on 10/27/2009 at 8:34 AM
// Requires "using System;" and "using System.Reflection;". Needs full trust, because it reads and writes non-public members of UriParser.
MethodInfo getSyntax = typeof(UriParser).GetMethod("GetSyntax", BindingFlags.Static | BindingFlags.NonPublic);
FieldInfo flagsField = typeof(UriParser).GetField("m_Flags", BindingFlags.Instance | BindingFlags.NonPublic);
if (getSyntax != null && flagsField != null)
{
    foreach (string scheme in new[] { "http", "https" })
    {
        UriParser parser = (UriParser)getSyntax.Invoke(null, new object[] { scheme });
        if (parser != null)
        {
            int flagsValue = (int)flagsField.GetValue(parser);
            // Clear the CanonicalizeAsFilePath attribute
            if ((flagsValue & 0x1000000) != 0)
                flagsField.SetValue(parser, flagsValue & ~0x1000000);
        }
    }
}