System.Uri strips trailing '.' characters from path segments even though the RFC specifies otherwise. For example, the URL http://microsoft.com/dir/test.../file.txt is incorrectly canonicalized by System.Uri as http://microsoft.com/dir/test/file.txt.
As a code snippet, the following should return false, but instead returns true:
new Uri("http://microsoft.com/dir/test.../file.txt") == new Uri("http://microsoft.com/dir/test/file.txt")
In other words, any System.Uri-based class, including WebRequest and WebClient, will accept a valid URL such as "http://microsoft.com/dir/test.../file.txt" but retrieve "http://microsoft.com/dir/test/file.txt" without any warning.
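A slightly fuller reproduction, as a sketch. The class and helper names (TrailingDotRepro, RoundTrips) are made up for illustration, and what gets printed for the dotted URL depends on the framework version in use, so no output is claimed here.

```csharp
using System;

static class TrailingDotRepro
{
    // Hypothetical helper: true when System.Uri hands the URL back unchanged.
    public static bool RoundTrips(string url)
    {
        return new Uri(url).AbsoluteUri == url;
    }

    static void Main()
    {
        string dotted = "http://microsoft.com/dir/test.../file.txt";
        string plain = "http://microsoft.com/dir/test/file.txt";

        // What System.Uri actually stores for the dotted URL.
        Console.WriteLine(new Uri(dotted).AbsoluteUri);
        // An implementation that keeps the dots would report True here.
        Console.WriteLine(RoundTrips(dotted));
        // The two URLs name different resources, so this should be False.
        Console.WriteLine(new Uri(dotted) == new Uri(plain));
    }
}
```

If the second line prints False, any request built from that Uri goes to the dot-stripped path.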
In practice, real web services such as the Amazon services and last.fm, amongst others, make meaningful distinctions between URLs that contain trailing dots in path segments. Using System.Uri anywhere when querying these web services will introduce subtle data corruption bugs. Amazon's own sample S3 library for C# triggers this bug and will request the wrong object when given a key with trailing dots.
The System.Uri implementation strips trailing dots from path segments, whereas the RFC specifies that absolute paths may contain dots anywhere in the path, and relative paths may contain trailing dots; only path segments of relative URIs consisting solely of ".." or "." are treated specially. Reference: http://www.ietf.org/rfc/rfc2396.txt
This behavior does not appear to follow the RFC. Of note are section 5.2 and appendices A and C.
Appendix A contains the grammar for URIs, the relevant subset being:
URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]
absoluteURI = scheme ":" ( hier_part | opaque_part )
hier_part = ( net_path | abs_path ) [ "?" query ]
net_path = "//" authority [ abs_path ]
abs_path = "/" path_segments
path_segments = segment *( "/" segment )
segment = *pchar *( ";" param )
pchar = unreserved | escaped | ":" | "@" | "&" | "=" | "+" | "$" | ","
unreserved = alphanum | mark
mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
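To make the grammar concrete, here is a minimal sketch of a validator for the segment rule. The class name Rfc2396Segment is hypothetical. Two productions it relies on are not part of the subset quoted above and are taken from RFC 2396 itself: escaped is "%" followed by two hex digits, and param is *pchar.

```csharp
using System;
using System.Text.RegularExpressions;

static class Rfc2396Segment
{
    // pchar = unreserved | escaped | ":" | "@" | "&" | "=" | "+" | "$" | ","
    // unreserved = alphanum | mark, and mark includes "." with no positional limit.
    private const string PChar = @"(?:[A-Za-z0-9\-_.!~*'():@&=+$,]|%[0-9A-Fa-f]{2})";

    // segment = *pchar *( ";" param ), with param = *pchar
    private static readonly Regex Segment =
        new Regex("^" + PChar + "*(?:;" + PChar + "*)*" + @"\z");

    public static bool IsValid(string segment)
    {
        return Segment.IsMatch(segment);
    }

    static void Main()
    {
        Console.WriteLine(IsValid("test..."));  // True: dots are ordinary pchars
        Console.WriteLine(IsValid("a b"));      // False: space is not a pchar
    }
}
```

Nothing in the rule distinguishes a trailing "." from any other unreserved character, which is the point of quoting the grammar.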
Section 5.2 is titled "Resolving Relative References to Absolute Form". It provides a step-by-step resolution procedure, in which step 6 performs the actual resolution of ".." and "." segments. Steps 4 and 5 are the important ones here: they state that if a URI reference either contains an authority component (microsoft.com in the example http://microsoft.com/dir/test.../file.txt) or has a path starting with '/', then step 6 is skipped entirely. Thus, in absolute URLs, both with and without a domain name, it is an error to resolve relative references at all.
Appendix C gives examples of the relative resolution process including examples which contain leading and trailing dots in relative paths. In particular, even in a relative path both leading and trailing dots are permitted; the example base URL of http://a/b/c/d;p?q and the relative path of g.. are combined to http://a/b/c/g.. in the RFC.
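For reference, step 6 of section 5.2 can be sketched roughly as follows. This is a simplified, segment-based reading of sub-steps 6a to 6f, not the RFC's literal string-buffer procedure; the names RelativePathSketch and MergePath are hypothetical, and edge cases such as empty segments ("//") are not handled carefully. It only applies to references that reach step 6, that is, those with no authority and no leading '/'.

```csharp
using System;
using System.Collections.Generic;

static class RelativePathSketch
{
    public static string MergePath(string basePath, string relativePath)
    {
        // 6a/6b: keep the base path up to its last '/', then append the reference.
        int lastSlash = basePath.LastIndexOf('/');
        string buffer = basePath.Substring(0, lastSlash + 1) + relativePath;

        string[] segments = buffer.Split('/');
        List<string> output = new List<string>();
        for (int i = 0; i < segments.Length; i++)
        {
            string segment = segments[i];
            bool isLast = (i == segments.Length - 1);
            if (segment == ".")
            {
                // 6c/6d: drop "." only when it is a complete segment.
                if (isLast) output.Add("");
            }
            else if (segment == ".." && output.Count > 1
                     && output[output.Count - 1] != "..")
            {
                // 6e/6f: a complete ".." removes the preceding segment.
                output.RemoveAt(output.Count - 1);
                if (isLast) output.Add("");
            }
            else
            {
                // Everything else, including "g.." and "..g", is kept verbatim.
                output.Add(segment);
            }
        }
        return string.Join("/", output.ToArray());
    }

    static void Main()
    {
        Console.WriteLine(MergePath("/b/c/d;p", "g.."));   // /b/c/g..
        Console.WriteLine(MergePath("/b/c/d;p", "../g"));  // /b/g
    }
}
```

Only segments that are exactly "." or ".." are touched; a segment that merely ends in dots passes through unchanged, matching the Appendix C example.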
Is this behavior intentional? The deviation from the spec is not documented, and it makes it impossible to use System.Uri to query web services whose URLs include trailing dots in path segments. Finally, is there a workaround to achieve compliant behavior? The code I'm writing runs in full trust, so if necessary reflection could be used to modify System.Uri internals, but perhaps an easier solution is possible (and I don't know what the internals should be set to, either). I'd certainly prefer a hack to falling back on plain TCP/IP or proxying.