[MERGE] [Bug 215426] Re: MemoryError - recv for HTTP throughProxy

Mon Apr 28 19:50:56 BST 2008

Hi Vincent,

Thanks for the response.  Response are inline with snippets of the
original email.

>     >> > +        buffer = StringIO()
>     >> > +        
>     >> > +        if size >= 0:        
>     >> > +            # limited read - read specified size in chunks
>     >> > +            while size > 0:
>     >> > +                # limit size of reads as 
> work-around for Windows XP
>     >> > +                # MemoryError problems in recv.
>     >> > +                # See bug: 
>     >> http://bugs.launchpad.net/bzr/+bug/215426 for
>     >> > +                # details
>     >> > +                num_bytes_to_read = 
>     >> > + min(size,self.__read_limited_buf_size)
>     >> 
>     >> ^^^ items in lists should always have a space after the comma.
> 
> You didn't fix that, you also have other PEP8 non compliances 
> like spurious white spaces at end of lines and on lines by itself.
> 
> 'bzr cdiff --check-style' will show you that (cdiff is part 
> of bzrtools).

I added the space in the v3 patch, so it should have been in there.

I ran cdiff and it does mention that I have spurious white spaces at the
end of lines.  I removed the ones matching the re expression '^\s+\n$',
but cdiff still mentions some.  I wonder if it has to do with the \r\n
line endings in the HTTP responses in test_http_response.py.

> That's good, tests reproducing bugs are meant to be written 
> before fixing the code to demonstrate it, then you fix the 
> bug and the tests pass. There is no point in commenting them out.

The commented-out tests will fail on machines where this bug is an
issue, but will pass for machines where it is not and issue.  So really,
running these is just for informational purposes.  This seems to lean
towards just removing them, since they won't do anything to assist
automated testing.

> 
>     Eric> +
>     Eric> +    # Port used to connect the client and server
>     Eric> +    _port = 5678
>     Eric> +    
> 
> Not good, you can't just choose an arbitrary port or some 
> day, on some configuration, the port will be used and the 
> test will fail. But you don't need to set up a server for 
> theses tests anyway (see below).

More comments below on the test -- but if we leave the TCP/IP code in
the system, is there a better way to find an available port -- I think
Idle actually tries random ports.

>     Eric> +    # Maximum send-block size for the server
>     Eric> +    _server_MTU = 1400
>     Eric> +
>     Eric> +    def setUp(self):
>     Eric> +        super(tests.TestCase, self).setUp()
>     Eric> +        
>     Eric> +    def _read_limited(self, chunk_size, read_to_eof):
>     Eric> +        """Test function that creates a test 
> server to stream data to an
>     Eric> +        instance of TestRange.  Data is streamed 
> from the server in
>     Eric> +        chunk_size bytes (default represents 
> Ethernet).  If read_to_eof is
>     Eric> +        True, then the size is treated as unknown 
> and TestRange will
>     Eric> +        read to end-of-file.
>     Eric> +        """
> 
> Test code should be trivial because we don't want to debug 
> it. Here you put far too many things in it including setup 
> and tear down for which we use dedicated functions.

I'll remove the setUp method (tearDown isn't declared) -- it's cruft
from a copy-and-paste of another test case in the file.

> 
> <snip/>
> 
>     Eric> +        # create a test datablock that its 10x the 
> chunk size.
>     Eric> +        test_pattern = '0123456789ABCDEF'
>     Eric> +        test_data =  test_pattern * (10 * 
> chunk_size / len(test_pattern))
>     Eric> +        test_data_len = len(test_data)
> 
> You can use test.http_response.ReadSocket here and write the 
> test without involving a server. Something along the lines (not
> tested):
> 
> s = ReadSocket(test_pattern)
> data = s.read(-1)
> self.assertEquals(test_pattern, data)
> 
> and since this is simple enough, you don't need to factorize 
> it but keep test code easy to write/read/debug.

I'm not familiar with test.http_response.  Could you point me to some
information on it?

Does test.http_response.ReadSocket actually send data over the real
TCP/IP stack?  This test was designed to do that to verify that the
dreaded MemoryError exception doesn't occur with 1x and 3x the block
size.

Without running the data through the TCP/IP stack, I don't see why I'm
bothering to write these tests since the pre-existing tests already
provide code-coverage of the code that I have added.

> We prefer to use:
> from cStringIO import StringIO

Changed.

> 
>     Eric>  from bzrlib import (
>     Eric>      errors,
>     Eric>      trace,
>     Eric>      )
>     Eric> +from bzrlib.osutils import pumpfile
> 
> We are going away from directly importing symbols, please use:
> 
> from bzrlib import (
>     errors,
>     osutils,
>     trace,
>     )
> 
> Respecting alphabetical order, and then:
> 
> osutils.pumpfile when you need it.

Changed.

>     Eric>  # A RangeFile expects the following grammar 
> (simplified to outline the
>     Eric> @@ -60,6 +62,10 @@
>     Eric>      # instead. The underlying file is either a 
> socket or a StringIO, so reading
>     Eric>      # 8k chunks should be fine.
>     Eric>      _discarded_buf_size = 8192
>     Eric> +    
>     Eric> +    # in _read_limited(), the size of read 
> requests is limited to avoid 
>     Eric> +    # MemoryError issues under Windows 
>     Eric> +    _read_limited_buf_size = 1024 * 512
>  
>     Eric>      def __init__(self, path, infile):
>     Eric>          """Constructor.
>     Eric> @@ -175,6 +181,28 @@
>     Eric>          self.read_boundary()
>     Eric>          self.read_range_definition()
>  
>     Eric> +    def _read_limited(self, size):
>     Eric> +        """Reads size bytes from the _file while 
> limiting the read
>     Eric> +        size to _read_limited_buf_size bytes.  Set 
> size=-1 to read to the
>     Eric> +        end-of-file (EOF).
>     Eric> +        """
>     Eric> +        buffer = StringIO()
>     Eric> +        if size >= 0:        
>     Eric> +            # limited read - read specified size in chunks
>     Eric> +            while size > 0:
>     Eric> +                # limit size of reads as 
> work-around for Windows XP
>     Eric> +                # MemoryError problems in recv.
>     Eric> +                # See bug: 
> http://bugs.launchpad.net/bzr/+bug/215426 for
>     Eric> +                # details
> 
> What about:
> 
> # Limit size of reads to work around some OSes limitations # 
> (windows as one, see bug #215426).
> 
> It's true that we encounter such problems only on windows so 
> far, but that may change in the future. The intent of your 
> patch is to bufferize huge reads, we use it to cope with 
> windows limitations, but it's a nice addition in itself and 
> should be documented as such.

Good point, changed to:

# Limit size of reads to work around OS limitations
# (Windows as one, see bug #215426) which result in
# a MemoryError exception in socket.recv.

> 
> 
>     Eric> +                num_bytes_to_read = min(size, 
> self._read_limited_buf_size)
>     Eric> +                
> buffer.write(self._file.read(num_bytes_to_read))
>     Eric> +                size -= num_bytes_to_read
>     Eric> +        else:
>     Eric> +            # read to EOF in chunks
>     Eric> +            
> pumpfile(self._file,buffer,self._read_limited_buf_size)
>     Eric> +
>     Eric> +        return buffer.getvalue()
>     Eric> +
>     Eric>      def read(self, size=-1):
>     Eric>          """Read size bytes from the current 
> position in the file.
>  
>     Eric> @@ -206,12 +234,13 @@
>     Eric>              limited = self._start + self._size - self._pos
>     Eric>              if size >= 0:
>     Eric>                  limited = min(limited, size)
>     Eric> -            data = self._file.read(limited)
>     Eric> +            data = self._read_limited(limited)
>     Eric>          else:
>     Eric>              # Size of file unknown, the user may 
> have specified a size or not,
>     Eric>              # we delegate that to the filesocket 
> object (-1 means read until
>     Eric>              # EOF)
>     Eric> -            data = self._file.read(size)
>     Eric> +            data = self._read_limited(size)
> 
> The comment does not apply well anymore, this could cause 
> confusion in the future for people unaware of the history 
> (what filesocket object is this comment referring to now ?).
> 
> As you see, Aaron and the reviewer of the code you're 
> patching (I think it was Andrew) were unhappy with the '-1' 
> usage (inherited from socket.py). IMHO, others will also find 
> it hard to understand. This comment was added on reviewer 
> request already, let's try to fix it right so that further 
> reviews don't have to mention it again.

I really don't understand what you and Aaron want.  All I did was add a
small patch to the code and now it seems like you guys want me to fix
all sorts of things in the surrounding code.  Here are the snippets of
code.  Make the changes and I'll put them into the code for you.

def read(self, size=-1):
    """Read size bytes from the current position in the file.  Use
size=-1
    to read to EOF.

    Reading across ranges is not supported. We rely on the underlying
http
    client to clean the socket if we leave bytes unread. This may occur
for
    the final boundary line of a multipart response or for any range
    request not entirely consumed by the client (due to offset
coalescing)
    """

if self._size > 0:
    # Don't read past the range definition
    limited = self._start + self._size - self._pos
    if size >= 0:
        limited = min(limited, size)
    data = self._read_limited(limited)
else:
    # Size of file unknown, the user may have specified a size or not,
    # we delegate that to the filesocket object (-1 means read until
    # EOF)
    data = self._read_limited(size)

-Eric