[MERGE] Fix ability to use IIS as a dumb HTTP server by unquoting the boundary

Tue Jul 15 08:09:43 BST 2008

>>>>> "Adrian" == Adrian Wilkins <adrian.wilkins at gmail.com> writes:

    Adrian> Revised version of previous patch that only resorts to unquoting
    Adrian> boundaries if they fail to verify in the first place.

Great ! Thanks a lot.

Two minor tweaks though :) But only one really relevant to your
submission.

    Adrian> # Bazaar merge directive format 2 (Bazaar 0.90)
    Adrian> # revision_id: adwi2 at 014661-xp-20080714210916-7vrggx854g9t8chb
    Adrian> # target_branch: file:///D:/Documents%20and%20Settings/adwi2/bzr/

You may want to specify http://bazaar-vcs.org/bzr/bzr.dev as your
target branch when you use 'bzr send', it will allow using the
merge directive more cleanly.


<snip/>
    Adrian> -        self._file.set_boundary(boundary)
    Adrian> +        #
    Adrian> +        # Note that all parameters in real HTTPReponse instances are 
    Adrian> +        # passed through rfc822.unquote, so we should too
    Adrian> +        self._file.set_boundary(rfc822.unquote(boundary))

In the same spirit, you shouldn't modify the behavior here, but
instead emulate the bug in the daughter class only.

In the context of your submission, it doesn't make such a
difference, but someone reading this code in one year, unaware of
the bugs involved will wonder why we "should" unquote here,
especially when the boundary contains no quotes (right or buggy
ones).

The following do that:

 === modified file 'bzrlib/tests/test_http_response.py'

--- bzrlib/tests/test_http_response.py	2008-07-11 14:29:46 +0000
+++ bzrlib/tests/test_http_response.py	2008-07-15 06:58:16 +0000
@@ -266,13 +266,15 @@
     The two last ranges are contiguous. This only rarely occurs (should not in
     fact) in real uses but may lead to hard to track bugs.
     """
-    def _boundary(self):
-        return "separation"
+
+    # The following is used to both represent the boundary defined in the
+    # headers and build the boundary lines.
+    boundary = 'separation'
 
     def setUp(self):
         super(TestRangeFileMultipleRanges, self).setUp()
 
-        boundary = self._boundary()
+        boundary = self.boundary
 
         content = ''
         self.first_range_start = 25
@@ -284,22 +286,22 @@
             content += self._multipart_byterange(part, start, boundary,
                                                  file_size)
         # Final boundary
-        content += self._boundary_line(boundary)
+        content += self._boundary_line()
 
         self._file = response.RangeFile('Multiple_ranges_file',
                                         StringIO(content))
+        self.set_file_boundary()
+
+    def _boundary_line(self):
+        """Helper to build the formatted boundary line."""
+        return '--' + self.boundary + '\r\n'
+
+    def set_file_boundary(self):
         # Ranges are set by decoding the range headers, the RangeFile user is
         # supposed to call the following before using seek or read since it
         # requires knowing the *response* headers (in that case the boundary
         # which is part of the Content-Type header).
-        #
-        # Note that all parameters in real HTTPReponse instances are 
-        # passed through rfc822.unquote, so we should too
-        self._file.set_boundary(rfc822.unquote(boundary))
-
-    def _boundary_line(self, boundary):
-        """Helper to build the formatted boundary line."""
-        return '--' + boundary + '\r\n'
+        self._file.set_boundary(self.boundary)
 
     def _multipart_byterange(self, data, offset, boundary, file_size='*'):
         """Encode a part of a file as a multipart/byterange MIME type.
@@ -317,7 +319,7 @@
         :return: a string containing the data encoded as it will appear in the
             HTTP response body.
         """
-        bline = self._boundary_line(boundary)
+        bline = self._boundary_line()
         # Each range begins with a boundary line
         range = bline
         # A range is described by a set of headers, but only 'Content-Range' is
@@ -406,14 +408,22 @@
     
     This reveals a bug caused by 
     
-    - The bad implementation of RFC 822 unquoting in Python (angles are not quotes),
-    coupled with 
-    - The bad implementation of RFC 2046 in IIS (angles are not permitted chars in boundary lines).
-    
-    
+    - The bad implementation of RFC 822 unquoting in Python (angles are not
+      quotes), coupled with
+
+    - The bad implementation of RFC 2046 in IIS (angles are not permitted chars
+      in boundary lines).
     """
-    def _boundary(self):
-        return "<q1w2e3r4t5y6u7i8o9p0zaxscdvfbgnhmjklkl>" # IIS 6 and 7 use this value
+    # The boundary as it appears in the boundary lines (IIS 6 and 7 use this
+    # value)
+    boundary = '<q1w2e3r4t5y6u7i8o9p0zaxscdvfbgnhmjklkl>'
+
+    def set_file_boundary(self):
+        # Now, emulate rfc822.py buggy behaviour by removing the angles as if
+        # they were quotes. We don't use rfc822.unquote here in case it gets
+        # fixed in the future.
+        self._file.set_boundary('q1w2e3r4t5y6u7i8o9p0zaxscdvfbgnhmjklkl')
+
 
 class TestRangeFileVarious(tests.TestCase):
     """Tests RangeFile aspects not covered elsewhere."""

=== modified file 'bzrlib/transport/http/response.py'
--- bzrlib/transport/http/response.py	2008-07-14 21:09:16 +0000
+++ bzrlib/transport/http/response.py	2008-07-15 06:59:10 +0000
@@ -108,18 +108,19 @@
             # string entity.
             # To be on the safe side we allow it before any boundary line
             boundary_line = self._file.readline()
-            
+
         if boundary_line != '--' + self._boundary + '\r\n':
             # rfc822.unquote() incorrectly unquotes strings enclosed in <>
             # IIS 6 and 7 incorrectly wrap boundary strings in <>
             # together they make a beautiful bug, which we will be gracious
             # about here
-            if self._unquote_boundary(boundary_line) != '--' + self._boundary + '\r\n':
+            if (self._unquote_boundary_line(boundary_line)
+                != '--' + self._boundary + '\r\n'):
                 raise errors.InvalidHttpResponse(
                     self._path,
                     "Expected a boundary (%s) line, got '%s'" % (self._boundary,
                                                                  boundary_line))
-    def _unquote_boundary(self, b):
+    def _unquote_boundary_line(self, b):
         return b[:2] + rfc822.unquote(b[2:-2]) + b[-2:]
 
     def read_range_definition(self):


It also fixes a couple of PEP8 violations (spaces on empty lines
and lines too long).

Finally, you did mention your fix in NEWS in your previous
submission, it's not there anymore.

This fix, as you mentioned, is important for windows users, so
please get back that NEWS and get credit for it !

BB:tweak

        Vincent