[MERGE] [Bug 215426] Re: MemoryError - recv for HTTP throughProxy

Sun Apr 27 09:09:20 BST 2008

>>>>> "Eric" == Eric Holmberg <eholmberg at arrow.com> writes:

Thanks for you contribution, a couple of remarks below.

    Eric> Okay, I've done the requested changes and added a unit test (sorry, had
    Eric> to use a thread to handle the server socket).  Everything tests out
    Eric> great on Windows (haven't tested it on Linux yet) and I've included
    Eric> tests (which are commented out)

Don't do that. Tests are here to be used ;)

    Eric> that reproduce the problem on my system.

See below, it's very good to write tests that reproduce bugs,
it's very bad to comment them out !

    >> 
    >> 
    >> > +        """Reads size bytes from the _file while limiting the read
    >> > +        size to 100KB.
    >> > +        """
    >> 
    >> ^^^ Shouldn't the meaning of -1 be documented?  It's a rather 
    >> unfortunate, but since it's used this way in the caller, I 
    >> can tolerate it.

That's inherited from python socket.py itself.

    >> 
    >> > +        buffer = StringIO()
    >> > +        
    >> > +        if size >= 0:        
    >> > +            # limited read - read specified size in chunks
    >> > +            while size > 0:
    >> > +                # limit size of reads as work-around for Windows XP
    >> > +                # MemoryError problems in recv.
    >> > +                # See bug: 
    >> http://bugs.launchpad.net/bzr/+bug/215426 for
    >> > +                # details
    >> > +                num_bytes_to_read = 
    >> > + min(size,self.__read_limited_buf_size)
    >> 
    >> ^^^ items in lists should always have a space after the comma.

You didn't fix that, you also have other PEP8 non compliances
like spurious white spaces at end of lines and on lines by
itself.

'bzr cdiff --check-style' will show you that (cdiff is part of
bzrtools).

    Eric> # Bazaar merge directive format 2 (Bazaar 0.90)
    Eric> # revision_id: eholmberg at arrow.com-20080425231950-vfjj8zkqi0n3363h
    Eric> # target_branch: file:///C:/temp/bzr-patch/try2/bzr.dev/
    Eric> # testament_sha1: d5741b35519116dcac1bbaca5042bf4873c8c4a1
    Eric> # timestamp: 2008-04-25 17:22:35 -0600
    Eric> # base_revision_id: pqm at pqm.ubuntu.com-20080422120059-sony5sthnlewabge
    Eric> # 
    Eric> # Begin patch
    Eric> === modified file 'bzrlib/osutils.py'
    Eric> --- bzrlib/osutils.py	2008-04-07 07:34:32 +0000
    Eric> +++ bzrlib/osutils.py	2008-04-25 23:19:50 +0000
    Eric> @@ -530,15 +530,14 @@
    Eric>      return False

    Eric> -def pumpfile(fromfile, tofile):
    Eric> -    """Copy contents of one file to another.
    Eric> +def pumpfile(fromfile, tofile, buffsize=32768):
    Eric> +    """Copy contents of one file to another using the specified buffer size.

Good catch.

    Eric> === modified file 'bzrlib/tests/test_http_response.py'
    Eric> --- bzrlib/tests/test_http_response.py	2008-01-02 14:13:55 +0000
    Eric> +++ bzrlib/tests/test_http_response.py	2008-04-25 23:19:50 +0000
    Eric> @@ -39,6 +39,8 @@

    Eric>  from cStringIO import StringIO
    Eric>  import httplib
    Eric> +import socket
    Eric> +import threading

    Eric>  from bzrlib import (
    Eric>      errors,
    Eric> @@ -753,3 +755,124 @@
    Eric>          out.read()  # Read the whole range
    Eric>          # Fail to find the boundary line
    Eric>          self.assertRaises(errors.InvalidHttpResponse, out.seek, 1, 1)
    Eric> +
    Eric> +
    Eric> +
    Eric> +class TestRangeFileSizeLargeDataBlock(tests.TestCase):
    Eric> +    """Test a RangeFile usage where the file is a TCP/IP socket using the
    Eric> +    local TCP/IP stack.
    Eric> +    
    Eric> +    This verifies that MemoryError's are not thrown for 1x and 3x the 
    Eric> +    chunk size configured in the TestRange class.  Note that test cases
    Eric> +    have also been included that reproduce the MemoryError bug 215426
    Eric> +    seen on Windows machines.    
    Eric> +    """

That's good, tests reproducing bugs are meant to be written
before fixing the code to demonstrate it, then you fix the bug
and the tests pass. There is no point in commenting them out.

    Eric> +
    Eric> +    # Port used to connect the client and server
    Eric> +    _port = 5678
    Eric> +    

Not good, you can't just choose an arbitrary port or some day, on
some configuration, the port will be used and the test will
fail. But you don't need to set up a server for theses tests
anyway (see below).

Plus,  your tests are actually failing with:

error: (98, 'Address already in use')

which demonstrate my point.

    Eric> +    # Maximum send-block size for the server
    Eric> +    _server_MTU = 1400
    Eric> +
    Eric> +    def setUp(self):
    Eric> +        super(tests.TestCase, self).setUp()
    Eric> +        
    Eric> +    def _read_limited(self, chunk_size, read_to_eof):
    Eric> +        """Test function that creates a test server to stream data to an
    Eric> +        instance of TestRange.  Data is streamed from the server in
    Eric> +        chunk_size bytes (default represents Ethernet).  If read_to_eof is
    Eric> +        True, then the size is treated as unknown and TestRange will
    Eric> +        read to end-of-file.
    Eric> +        """

Test code should be trivial because we don't want to debug
it. Here you put far too many things in it including setup and
tear down for which we use dedicated functions.

<snip/>

    Eric> +        # create a test datablock that its 10x the chunk size.
    Eric> +        test_pattern = '0123456789ABCDEF'
    Eric> +        test_data =  test_pattern * (10 * chunk_size / len(test_pattern))
    Eric> +        test_data_len = len(test_data)

You can use test.http_response.ReadSocket here and write the test
without involving a server. Something along the lines (not
tested):

s = ReadSocket(test_pattern)
data = s.read(-1)
self.assertEquals(test_pattern, data)

and since this is simple enough, you don't need to factorize it
but keep test code easy to write/read/debug.

<snip/>

    Eric> +#    def test_read_limited_100x(self):
    Eric> +#        """Uncomment this test to verify that that RangeFile FAILS for 
    Eric> +#            100x chunk sizes under Windows"""
    Eric> +#        self._read_limited(response.RangeFile._read_limited_buf_size*100,False)

If your test fails, you didn't fix the bug. If you want to verify
some failure mode look at self.assertRaises for example. The
point of test automation is to avoid manual interventions.

    Eric> +
    Eric> +    def test_read_limited_eof_1x(self):
    Eric> +        """Verifies that RangeFile works for 1x chunk sizes"""
    Eric> +        self._read_limited(response.RangeFile._read_limited_buf_size,True)
    Eric> +
    Eric> +    def test_read_limited_eof_3x(self):
    Eric> +        """Verifies that RangeFile works for 3x chunk sizes"""
    Eric> +        self._read_limited(response.RangeFile._read_limited_buf_size*3,True)
    Eric> +
    Eric> +#    def test_read_limited_eof_100x(self):
    Eric> +#        """Uncomment this test to verify that that RangeFile FAILS for 
    Eric> +#            100x chunk sizes under Windows"""
    Eric> +#        self._read_limited(response.RangeFile._read_limited_buf_size*100,True)

    Eric> === modified file 'bzrlib/transport/http/response.py'
    Eric> --- bzrlib/transport/http/response.py	2008-01-03 16:26:32 +0000
    Eric> +++ bzrlib/transport/http/response.py	2008-04-25 23:19:50 +0000
    Eric> @@ -23,11 +23,13 @@

    Eric>  import httplib
    Eric> +from StringIO import StringIO

We prefer to use:
from cStringIO import StringIO

    Eric>  from bzrlib import (
    Eric>      errors,
    Eric>      trace,
    Eric>      )
    Eric> +from bzrlib.osutils import pumpfile

We are going away from directly importing symbols, please use:

from bzrlib import (
    errors,
    osutils,
    trace,
    )

Respecting alphabetical order, and then:

osutils.pumpfile when you need it.

    Eric>  # A RangeFile expects the following grammar (simplified to outline the
    Eric> @@ -60,6 +62,10 @@
    Eric>      # instead. The underlying file is either a socket or a StringIO, so reading
    Eric>      # 8k chunks should be fine.
    Eric>      _discarded_buf_size = 8192
    Eric> +    
    Eric> +    # in _read_limited(), the size of read requests is limited to avoid 
    Eric> +    # MemoryError issues under Windows 
    Eric> +    _read_limited_buf_size = 1024 * 512

    Eric>      def __init__(self, path, infile):
    Eric>          """Constructor.
    Eric> @@ -175,6 +181,28 @@
    Eric>          self.read_boundary()
    Eric>          self.read_range_definition()

    Eric> +    def _read_limited(self, size):
    Eric> +        """Reads size bytes from the _file while limiting the read
    Eric> +        size to _read_limited_buf_size bytes.  Set size=-1 to read to the
    Eric> +        end-of-file (EOF).
    Eric> +        """
    Eric> +        buffer = StringIO()
    Eric> +        if size >= 0:        
    Eric> +            # limited read - read specified size in chunks
    Eric> +            while size > 0:
    Eric> +                # limit size of reads as work-around for Windows XP
    Eric> +                # MemoryError problems in recv.
    Eric> +                # See bug: http://bugs.launchpad.net/bzr/+bug/215426 for
    Eric> +                # details

What about:

# Limit size of reads to work around some OSes limitations
# (windows as one, see bug #215426).

It's true that we encounter such problems only on windows so far,
but that may change in the future. The intent of your patch is to
bufferize huge reads, we use it to cope with windows limitations,
but it's a nice addition in itself and should be documented as
such.

    Eric> +                num_bytes_to_read = min(size, self._read_limited_buf_size)
    Eric> +                buffer.write(self._file.read(num_bytes_to_read))
    Eric> +                size -= num_bytes_to_read
    Eric> +        else:
    Eric> +            # read to EOF in chunks
    Eric> +            pumpfile(self._file,buffer,self._read_limited_buf_size)
    Eric> +
    Eric> +        return buffer.getvalue()
    Eric> +
    Eric>      def read(self, size=-1):
    Eric>          """Read size bytes from the current position in the file.

    Eric> @@ -206,12 +234,13 @@
    Eric>              limited = self._start + self._size - self._pos
    Eric>              if size >= 0:
    Eric>                  limited = min(limited, size)
    Eric> -            data = self._file.read(limited)
    Eric> +            data = self._read_limited(limited)
    Eric>          else:
    Eric>              # Size of file unknown, the user may have specified a size or not,
    Eric>              # we delegate that to the filesocket object (-1 means read until
    Eric>              # EOF)
    Eric> -            data = self._file.read(size)
    Eric> +            data = self._read_limited(size)

The comment does not apply well anymore, this could cause
confusion in the future for people unaware of the history (what
filesocket object is this comment referring to now ?).

As you see, Aaron and the reviewer of the code you're patching (I
think it was Andrew) were unhappy with the '-1' usage (inherited
from socket.py). IMHO, others will also find it hard to
understand. This comment was added on reviewer request already,
let's try to fix it right so that further reviews don't have to
mention it again.

BB: resubmit

    Vincent