Heard about the firefoxurl vulnerability?
It turns out that you can exploit Firefox by having Internet Explorer visit a link to a URL that starts with "firefoxurl:", followed by some carefully crafted text. (This assumes you have Firefox installed on your computer alongside Internet Explorer.)
This is because Internet Explorer blindly accepts the entire contents of the URL and passes them to the handler for the firefoxurl URL type - that handler, as the URL scheme name implies, is Firefox. It's also because three things are true of Firefox: it can be exploited through command-line parameters, its protocol handler works by interpreting a command line, and it interprets the command line provided to it as if it were always well-formed.
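To make that chain of causes concrete, here is a minimal sketch of how a quote character in the URL can turn into extra command-line arguments. Everything in it is an assumption for illustration: the handler template, the option name "-injected-option", and the use of Python's shlex module as a stand-in for the handler's command-line parser (real Windows parsing rules differ in detail).

```python
import shlex

# Hypothetical handler registration: the caller substitutes the URL for %1.
# This template is an illustration, not the actual Firefox registration.
HANDLER_TEMPLATE = 'firefox.exe -url "%1"'


def build_command_line(url: str) -> str:
    """Substitute the URL into the template verbatim, with no escaping."""
    return HANDLER_TEMPLATE.replace("%1", url)


def parse_arguments(command_line: str) -> list[str]:
    """Split a command line into arguments.

    shlex uses POSIX-style quoting rules; it is only a simplified model
    of how a handler might parse the command line it is given.
    """
    return shlex.split(command_line)


benign = "firefoxurl:http://example.com/"
# The embedded double quote closes the quoted argument early, so the text
# after it is parsed as a separate option rather than as part of the URL.
hostile = 'firefoxurl:x" -injected-option "y'

print(parse_arguments(build_command_line(benign)))
print(parse_arguments(build_command_line(hostile)))
```

In this model the benign link yields a single URL argument, while the hostile link yields an additional "-injected-option" argument that the handler would treat as if the user had typed it.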
There's been a lot of discussion about whose problem this is, and where it needs fixing. Jesper's a friend of mine, and I'm a fan of his, so I'd like to point to his posts on the discussion so far, here and here.
A number of people have made references to RFC 1738, and its description of which characters must, and may, be encoded in a URL. That's all very interesting, if you're engaged in academic discussion of how to create a URL, as the originator, or how to process it, as a consumer.
In this case, the discussion as to whether IE has a flaw should be centered on how much work an intermediary party should do when given something that is alleged to be a URL, before it hands it off to a third party for actual handling.
This makes this intermediary (Internet Explorer in the original exploit, but ... are there others?) behave like a proxy for such protocol handlers, rather than a consumer or provider of the URL as a whole.
I'm sure we'd have heard a different tale if Microsoft's Internet Explorer team had chosen to limit the set of characters that can be passed through to an underlying handler; instead we'd hear "why does my protocol handler have to interpret encoded character sequences? They weren't encoded in the link, and there's no reason for IE to encode them!"
As Markellos Diorinos, IE Product Manager, points out in the IEBlog, it's not just the presence of uncomfortable quote characters that the protocol handler will have to cope with; it's also buffer overflows, invalid representations, and out-of-spec protocol portions of varying kinds. IE can't possibly know all the things that your application might find uncomfortable, versus all the things that your protocol may need, so it doesn't try to guess, or limit the possible behaviours of the protocol handler.
In short, IE does what any interface between transport layers does - it strips off the header ("firefoxurl:"), and passes the rest uninterpreted to the next layer. It is IE's job, in this case, only to identify (from the scheme specifier) which protocol handler to fire up, and to pass its parameters to it.
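As a rough sketch of that layering, the dispatcher below looks only at the scheme, picks a handler, and hands over the remainder without inspecting or altering it. The registry, the function names, and the sample handler are all hypothetical; this models the behaviour described above, not IE's actual code.

```python
from typing import Callable

# Hypothetical registry mapping scheme names to protocol handlers.
HANDLERS: dict[str, Callable[[str], str]] = {}


def register(scheme: str, handler: Callable[[str], str]) -> None:
    """Register a handler for a scheme (scheme names are case-insensitive)."""
    HANDLERS[scheme.lower()] = handler


def dispatch(uri: str) -> str:
    """Choose a handler from the scheme and pass the rest through untouched."""
    scheme, delimiter, rest = uri.partition(":")
    if not delimiter:
        raise ValueError("no scheme delimiter found")
    handler = HANDLERS.get(scheme.lower())
    if handler is None:
        raise LookupError(f"no handler registered for scheme {scheme!r}")
    # The scheme-specific part is treated as opaque: no decoding, no
    # encoding, no validation. Interpreting it is the handler's job.
    return handler(rest)


# A sample handler that simply reports what it was given.
register("firefoxurl", lambda data: f"handler received: {data}")

print(dispatch('firefoxurl:anything "at all" %20 goes'))
```

The design choice on display is that the dispatcher needs no knowledge of any handler's syntax, which is also why it cannot protect a handler from input that the handler itself misparses.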
Perhaps you think that's not defence in depth - but then, defence in depth is not about enforcing the same defence at several layers, it's about using knowledge specific to each layer to protect against attacks within each layer. Sometimes those protections are redundant, but unless there is different knowledge in that redundancy allowing the layers to do different defence work, there is little value to redundancy for redundancy's sake.
Yes, the IE team could have decided that they'd enforce URL standards that were not being followed by the upstream provider (in this case, the creator of the link), and enforce them on the portion passed to the downstream, but such approaches tend to limit the flexibility of the protocol.
IE's responsibility is to ensure that any URL that comes to it does not trigger a vulnerability in IE, that any URL that comes from it conforms to RFCs, and that any information that is supposed to pass unmolested through it actually passes unmolested.
It's just a matter of some amusement that when Mozilla's Window Snyder, Chief Security Something-or-other, called out this lack of extra preprocessing as a specific vulnerability in Internet Explorer, she did not think to confirm first that Firefox itself did not contain the same behaviour. I will be interested to see how they address this - whether they will 'fix' the behaviour, and if they do, what the resulting impact will be on compatibility with existing protocol handlers. The programmers of those handlers assumed that their data would arrive unmolested, as documented, and have already taken appropriate security measures to cope with this (such as not parsing anything past the beginning of the user data as if it were anything other than untrustworthy user data).
Finally, as a nod to my own past as a nit-picker of RFCs, here's what RFC 3986, which obsoletes the generic URL specification portions of RFC 1738, has to say about intermediaries in the URI handling stream:
The URI syntax is organized hierarchically, with components listed in order of decreasing significance from left to right. For some URI schemes, the visible hierarchy is limited to the scheme itself: everything after the scheme component delimiter (":") is considered opaque to URI processing. Other URI schemes make the hierarchy explicit and visible to generic parsing algorithms.
That suggests that a generic URI processor (such as a forwarding proxy) should see the URI after the scheme component as "opaque to URI processing" - in other words, that the processor should assume it can understand nothing about, and therefore should not inspect, the part after the colon.
Further down in the document:
Under normal circumstances, the only time when octets within a URI are percent-encoded is during the process of producing the URI from its component parts. This is when an implementation determines which of the reserved characters are to be used as subcomponent delimiters and which can be safely used as data. Once produced, a URI is always in its percent-encoded form.
When a URI is dereferenced, the components and subcomponents significant to the scheme-specific dereferencing process (if any) must be parsed and separated before the percent-encoded octets within those components can be safely decoded, as otherwise the data may be mistaken for component delimiters.
...Implementations must not percent-encode or decode the same string more than once, as decoding an already decoded string might lead to misinterpreting a percent data octet as the beginning of a percent-encoding, or vice versa in the case of percent-encoding an already percent-encoded string.
Clearly, if Internet Explorer (or any other web browser that supports this kind of protocol pass-through technique) were to encode characters that are not supposed to be in a URL, it would fall afoul of this definition in the usual case, by encoding "the same string more than once", once at preparation by a conformant URI provider, and once again as it passed through IE.
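The double-encoding problem is easy to demonstrate. The sketch below uses Python's urllib.parse as a stand-in for both the link's author and the intermediary; the sample string is an arbitrary assumption.

```python
from urllib.parse import quote, unquote

# The data the link's author wants the handler to receive.
raw = 'a"b'

# A conformant URI producer percent-encodes once.
encoded_once = quote(raw, safe="")

# An intermediary that "helpfully" encodes again also encodes the "%" itself.
encoded_twice = quote(encoded_once, safe="")

print(encoded_once)    # a%22b
print(encoded_twice)   # a%2522b

# A handler that decodes once, as the RFC expects, recovers the original
# only if nobody re-encoded the string on the way through.
print(unquote(encoded_once) == raw)    # True
print(unquote(encoded_twice) == raw)   # False
```

After the second encoding, a single decode gives the handler the text a%22b rather than the three characters the author intended, which is the misinterpretation the RFC warns about.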
IE's best bet for compatibility and future extensibility (as well as compliance with current RFCs) is to not inspect or modify the scheme-specific component of any URI unless it is handling that URI itself.