[MERGE] Fix ability to use IIS as a dumb HTTP server. (fixes #247585)

Adrian Wilkins adrian.wilkins at gmail.com
Mon Jul 14 12:48:32 BST 2008

Vincent Ladeuil wrote:
> Thanks a lot for working on this, but see some comments below.
>>>>>> "Adrian" == Adrian Wilkins <adrian.wilkins at gmail.com> writes:
> <snip/>
>     Adrian> +            
>     Adrian> +        # parameters in the header all get run through rfc822.unquote
>     Adrian> +        # so therefore our boundary strings should too
> Absolutely not.
> The boundary *definition* appears in headers so it MUST be
> unquoted.
> The boundary is then *used* and in these places it MUST not be
> unquoted.

The problem in the Python lib is that the rfc822.unquote() function
considers <this> to be a quoted string, when the standard doesn't.

There is no harm in running the parsed boundaries through the unquote
function, because boundaries from conformant implementations cannot
contain quote characters (see RFC2046 : 5.1.1).

> My intuition is that IIS is not buggy but that you are tricked by
> a proxy there.

This was tested straight from the IIS 7.0 install on my local machine,
with no proxy, and also from our internal IIS 6.0 install. I may have
imagined it, but I think running this through certain proxies actually
fixes the error because the proxy rewrites the headers and boundaries to
be conformant. More below.

> The bug, as you pinpointed it, is that:
> - the boundary is not quoted in the headers as it should be,
> - it is unquoted as per RFC822 because it looks like a quoted
>   string (http://rfc.net/rfc822.html#s3.3. mentions that "<" and
>   ">": "Must be in quoted-string, to use within a word."). So I
>   don't think the python module is buggy here.

RFC822 defines quoted-string as

quoted-string = <"> *(qtext/quoted-pair) <">; Regular qtext or
                                                 ;   quoted chars.

which says that it is a string enclosed in double quotes. It does not
mention angle brackets as valid quote characters.

rfc822.py:unquote() DOES treat angle brackets as quote characters, and
carries a comment admitting that the implementation is not really
conformant. I think the standard library is clearly wrong here, and
knows it.

{{{ python25\lib\rfc822.py:646
# XXX Should fix unquote() and quote() to be really conformant.
# XXX The inverses of the parse functions may also be useful.

def unquote(s):
    """Remove quotes from a string."""
    if len(s) > 1:
        if s.startswith('"') and s.endswith('"'):
            return s[1:-1].replace('\\\\', '\\').replace('\\"', '"')
        if s.startswith('<') and s.endswith('>'):
            return s[1:-1]
    return s
}}}


This doesn't excuse IIS for also being wrong on this point, but if you
comment out the second branch of this function (the angle-bracket
case), bringing it closer to the standard, the bug goes away.
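To make the difference concrete, here is a minimal sketch rather than
the real library code: lenient_unquote() copies the logic quoted above,
and strict_unquote() is a hypothetical variant with the angle-bracket
branch removed. Both names are mine, chosen for illustration.

```python
def lenient_unquote(s):
    """Copy of the unquote() logic quoted above from rfc822.py."""
    if len(s) > 1:
        if s.startswith('"') and s.endswith('"'):
            return s[1:-1].replace('\\\\', '\\').replace('\\"', '"')
        if s.startswith('<') and s.endswith('>'):
            return s[1:-1]
    return s


def strict_unquote(s):
    """Hypothetical variant: only double quotes delimit a quoted-string."""
    if len(s) > 1 and s.startswith('"') and s.endswith('"'):
        return s[1:-1].replace('\\\\', '\\').replace('\\"', '"')
    return s


# The boundary value IIS 6/7 puts, unquoted, in the Content-Type header.
IIS_BOUNDARY = '<q1w2e3r4t5y6u7i8o9p0zaxscdvfbgnhmjklkl>'

# The lenient version strips the angle brackets, so the parsed value no
# longer matches the "--<...>" lines in the body; the strict one keeps them.
print(lenient_unquote(IIS_BOUNDARY))
print(strict_unquote(IIS_BOUNDARY))
```

Both variants behave identically on a properly double-quoted value, so
only the angle-bracket case is affected.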

Alas, tinkering with parts of the standard library is rather more risky
than dealing with it further downstream, and rather less practical in
terms of rapid resolution.

> - being unquoted when it shouldn't have been, it doesn't match
>   with the boundary lines.
> Unquoting the boundary lines is the wrong place to fix the bug,
> it will make bzr fail to match when the boundary definition has
> been correctly quoted in the headers.

RFC2046 is clear that angle-brackets and quotes are not in the set of
acceptable boundary characters, so it should be impossible for the
function to change the boundary string of a conforming implementation,
because they don't contain quotes or angles.
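As a rough check of that claim, the boundary grammar from RFC2046 :
5.1.1 can be written as a regular expression. This is only a sketch;
is_conformant_boundary() is a name I made up for illustration and is
not part of bzr or the standard library.

```python
import re

# Grammar from RFC 2046, section 5.1.1:
#   boundary      := 0*69<bchars> bcharsnospace
#   bchars        := bcharsnospace / " "
#   bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" / "+" / "_" / ","
#                    / "-" / "." / "/" / ":" / "=" / "?"
_BCHARS_NOSPACE = r"0-9A-Za-z'()+_,\-./:=?"
_BOUNDARY_RE = re.compile(
    r"[%s ]{0,69}[%s]" % (_BCHARS_NOSPACE, _BCHARS_NOSPACE))


def is_conformant_boundary(boundary):
    """Return True if the string fits the RFC 2046 boundary grammar."""
    return _BOUNDARY_RE.fullmatch(boundary) is not None


# The boundary from the download.microsoft.com response fits the grammar;
# the IIS 6/7 magic string does not, because of the angle brackets.
print(is_conformant_boundary('D99BE0CCF2B'))
print(is_conformant_boundary('<q1w2e3r4t5y6u7i8o9p0zaxscdvfbgnhmjklkl>'))
```

Neither the double quote nor the angle brackets are in the permitted
set, so unquote() has nothing to strip from a boundary that passes this
check.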

The RFC2046 multipart examples show that a quoted boundary parameter
value in the header is NOT quoted when used in boundaries. This patch
does not contradict that - the value parsed from the header will match
the boundary value because when the value is correctly quoted (with
quotes), those quotes will not be present in the boundary line.

The case I believe you are referring to is that where an HTTP server
chooses to both implement RFC822 correctly and quote values containing
angle-brackets, and simultaneously ignore RFC2046 and use angle brackets
in boundary strings, in the specific form where the boundary string is
enclosed in angles as if they were quotes.

Content-Type: multipart/byteranges; boundary="<badboundary>"

--<badboundary>
Content-Type: application/octet-stream
Content-Range: bytes 0-41/7825065

I concede that this would pass the current code but fail after my patch,
but I think it unlikely that an HTTP server would take such care over
822 and subsequently ignore RFC2046 ... the authors would deserve to be
showered with the bug reports caused by that.
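The three cases can be compared side by side. This is a sketch under
assumptions, not the bzr code: header_boundary(), matches_current() and
matches_patched() are hypothetical helpers, the IIS header line is
reconstructed from the description in this thread, and I am assuming the
patch amounts to running each boundary line through the same unquote()
quoted above from rfc822.py.

```python
def unquote(s):
    """Same logic as the unquote() quoted above from rfc822.py."""
    if len(s) > 1:
        if s.startswith('"') and s.endswith('"'):
            return s[1:-1].replace('\\\\', '\\').replace('\\"', '"')
        if s.startswith('<') and s.endswith('>'):
            return s[1:-1]
    return s


def header_boundary(content_type):
    """Hypothetical helper: extract the boundary parameter and unquote it,
    the way the header parsing in the standard library does."""
    raw = content_type.split('boundary=', 1)[1]
    return unquote(raw)


def matches_current(line, boundary):
    """Compare the raw boundary line with the parsed header value."""
    return line == '--' + boundary


def matches_patched(line, boundary):
    """Assumed patch behaviour: unquote the boundary line before comparing."""
    return line.startswith('--') and unquote(line[2:]) == boundary


IIS = '<q1w2e3r4t5y6u7i8o9p0zaxscdvfbgnhmjklkl>'
CASES = [
    ('conformant', 'multipart/byteranges; boundary="simple boundary"',
     '--simple boundary'),
    ('IIS 6/7', 'multipart/byteranges; boundary=' + IIS, '--' + IIS),
    ('pathological', 'multipart/byteranges; boundary="<badboundary>"',
     '--<badboundary>'),
]

RESULTS = {}
for name, content_type, line in CASES:
    boundary = header_boundary(content_type)
    RESULTS[name] = (matches_current(line, boundary),
                     matches_patched(line, boundary))
    print(name, RESULTS[name])
```

Each result pair is (current code, patched code). Only the
pathological server loses out under the patch, while IIS 6/7 goes from
failing to working.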

IIS, alas, is a law unto itself, and I rate the chances of MS patching
their bad implementation to be somewhat lower than the chances of Bazaar
compensating for an acknowledged mis-implementation of RFC822 by the
Python team.

> I'd like to better diagnose the problem before accepting a fix
> for it.
> Since it will be difficult do reverse the unquoting after the
> fact, the fix *may* be to first try the raw boundary line, and if
> it doesn't match, but only then, try with an unquoted version as
> a *workaround*.
> Can you provide some ethereal/wireshark traces and some .bzr.log
> of a command run with -Dhttp ?

There's an HTTP session trace attached to the bug in an archive.


It's just a zip archive, ignore the "saz" extension.

I'll unpatch my local install and put up a .bzr.log as well (I saw
nothing remarkable in it though).

> I just made a quick experiment with the following script:


> Host: download.microsoft.com
> Server: Microsoft-IIS/6.0
> X-Powered-By: ASP.NET
> Content-Type: multipart/byteranges; boundary=D99BE0CCF2B
> Connection: close


> The magic '<q1w2e3r4t5y6u7i8o9p0zaxscdvfbgnhmjklkl>' string you
> mentioned is not there, which makes me suspect some proxy
> tricking you there.

I suspect the trick is that MS don't use IIS to serve heavy content ;
they are known to outsource these things to Akamai cache servers. I
believe what's going on here is that the server has been configured to
pretend that it's IIS, to save Microsoft face. As stated above, these
results were replicated on two instances of IIS 6.0 and 7.0, with no
intervening proxy.

The tests I did reveal that IIS uses this string regardless of the file
being served. This is another place where IIS does not implement the RFC
correctly - RFC2046 : 5.1 states that the boundary string MUST NOT
appear in the body. Even if you pack a file with instances of the magic
string, IIS still uses it for boundaries.
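A sketch of what honouring that requirement would involve, under my own
assumptions: boundary_in_body() and pick_safe_boundary() are
hypothetical names, and a plain substring test stands in for whatever a
real server would do.

```python
# The fixed boundary string IIS 6/7 uses for every response.
IIS_BOUNDARY = b'<q1w2e3r4t5y6u7i8o9p0zaxscdvfbgnhmjklkl>'


def boundary_in_body(body, boundary):
    """Return True if the boundary string occurs anywhere in the body.

    The whole body has to be available before this can be answered,
    which means before the Content-Type header can be sent.
    """
    return boundary in body


def pick_safe_boundary(body, candidates):
    """Return the first candidate that does not occur in body, else None."""
    for candidate in candidates:
        if not boundary_in_body(body, candidate):
            return candidate
    return None


# A file deliberately packed with the magic string.
body = b'some data\r\n' + IIS_BOUNDARY + b'\r\nmore data\r\n'

print(boundary_in_body(body, IIS_BOUNDARY))
print(pick_safe_boundary(body, [IIS_BOUNDARY, b'D99BE0CCF2B']))
```

A server with a hard-coded boundary skips the scan entirely, which is
why a file containing the magic string makes no difference to what IIS
sends.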

Incidentally, IIS 5.1 does NOT cause this error. Its hard-coded magic
boundary string is

.. which is also non-compliant with RFC2046. If you google this value,
you can find confirmation in a paper which discusses remote
identification of webservers.

It's not surprising that IIS 6 and 7 also do this ; in fact, while the
standard is clear, it's obviously impractical on the modern web, because
being compliant would require you to pre-parse every stream you served
before you serve the headers. I'd be surprised to find any real-world
implementation to be compliant. I'd even go so far as to say that it's a
fundamentally broken piece of the standard because it precludes serving
content both promptly and correctly at the same time ; but that's a
digression best left for IETF meetings.

So ; in short, the patch cannot break anything that isn't already
broken, and compensates for a combination of problems in both the
standard Python library and the second-most-popular webserver in the world.

The most correct outcome would be that the Python and IIS teams fix
their stuff. But you and I know that the likelihood of both happening
any time soon is low.

The inability to work from IIS as a server can only hurt adoption of
Bazaar, as one of the first tests in any MS-wedded corporate environment
is going to be to try it out as a dumb server ; a low-risk stratagem.