[MERGE] [Bug 215426] Re: MemoryError - recv for HTTP throughProxy

Tue Apr 29 18:52:07 BST 2008

>>>>> "Eric" == Eric Holmberg <eholmberg at arrow.com> writes:

    Eric> Hi Vincent,
    Eric> Thanks for the response.  Response are inline with snippets of the
    Eric> original email.

    >> >> > +        buffer = StringIO()
    >> >> > +        
    >> >> > +        if size >= 0:        
    >> >> > +            # limited read - read specified size in chunks
    >> >> > +            while size > 0:
    >> >> > +                # limit size of reads as 
    >> work-around for Windows XP
    >> >> > +                # MemoryError problems in recv.
    >> >> > +                # See bug: 
    >> >> http://bugs.launchpad.net/bzr/+bug/215426 for
    >> >> > +                # details
    >> >> > +                num_bytes_to_read = 
    >> >> > + min(size,self.__read_limited_buf_size)
    >> >> 
    >> >> ^^^ items in lists should always have a space after the comma.
    >> 
    >> You didn't fix that, you also have other PEP8 non compliances 
    >> like spurious white spaces at end of lines and on lines by itself.
    >> 
    >> 'bzr cdiff --check-style' will show you that (cdiff is part 
    >> of bzrtools).

    Eric> I added the space in the v3 patch, so it should have been in there.

    Eric> I ran cdiff and it does mention that I have spurious
    Eric> white spaces at the end of lines.  I removed the ones
    Eric> matching the re expression '^\s+\n$', but cdiff still
    Eric> mentions some.

Try '\s+\n$' next time, you also had some trailing spaces on some
non-empty lines.

    Eric> I wonder if it has to do with the \r\n line endings in
    Eric> the HTTP responses in test_http_response.py.

No.

    >> That's good, tests reproducing bugs are meant to be written 
    >> before fixing the code to demonstrate it, then you fix the 
    >> bug and the tests pass. There is no point in commenting them out.

    Eric> The commented-out tests will fail on machines where
    Eric> this bug is an issue, but will pass for machines where
    Eric> it is not and issue.  So really, running these is just
    Eric> for informational purposes.  This seems to lean towards
    Eric> just removing them, since they won't do anything to
    Eric> assist automated testing.

Indeed, if we can't reliably reproduce the problem, we can't test
it automatically.

That's an additional reason to not involve the TCP stack at all
in the tests, but instead, test that your modification
have the intended behavior.

An approach will be to define a test socket (based on ReadSocket
for example) that will simulate the bug (i.e. raise MemoryError)
and use it against your code.

    >> 
    Eric> +
    Eric> +    # Port used to connect the client and server
    Eric> +    _port = 5678
    Eric> +    
    >> 
    >> Not good, you can't just choose an arbitrary port or some 
    >> day, on some configuration, the port will be used and the 
    >> test will fail. But you don't need to set up a server for 
    >> theses tests anyway (see below).

    Eric> More comments below on the test -- but if we leave the
    Eric> TCP/IP code in the system, is there a better way to
    Eric> find an available port -- I think Idle actually tries
    Eric> random ports.

As Andrew mentioned we usually listen on port 0, see test_http
for example.

    Eric> I'll remove the setUp method (tearDown isn't declared)
    Eric> -- it's cruft from a copy-and-paste of another test
    Eric> case in the file.

I was suggesting the *opposite* simplify the test method by
putting server setup into the setUp method and server shutdown
to the tearDown method.

Otherwise if something goes wrong in your test, the server will
not be shutdown and the test will leak sockets.

The tearDown method is called whether or not the test succeeds,
that's one point, the other is to make the test method easier to
read.

<snip/>

    >> You can use test.http_response.ReadSocket here and write the 
    >> test without involving a server. Something along the lines (not
    >> tested):
    >> 
    >> s = ReadSocket(test_pattern)
    >> data = s.read(-1)
    >> self.assertEquals(test_pattern, data)
    >> 
    >> and since this is simple enough, you don't need to factorize 
    >> it but keep test code easy to write/read/debug.

    Eric> I'm not familiar with test.http_response.  Could you point me to some
    Eric> information on it?

Roughly, test_http_response is for testing the http response
parsing mostly handled by the
bzrlib.transport.http.response.RangeFile object. All the tests
exercise the object without involving a real http server by using
a ReadSocket which offers the needed API making it look like a
socket.

Tests involving an http_server are in test_http.

    Eric> Does test.http_response.ReadSocket actually send data
    Eric> over the real TCP/IP stack? 

No.

    Eric> This test was designed to do that to verify that the
    Eric> dreaded MemoryError exception doesn't occur with 1x and
    Eric> 3x the block size.

    Eric> Without running the data through the TCP/IP stack, I
    Eric> don't see why I'm bothering to write these tests since
    Eric> the pre-existing tests already provide code-coverage of
    Eric> the code that I have added.

No. There is no tests that exercises the bug your patch is
fixing, otherwise there would have been no bug.

So, we want a test that failed without your fix and succeeds with
your fix. 

That's how we demonstrate that the bug is fixed.

<snip/>

    Eric> Changed.

<snip/>

    Eric> Changed.

Thanks.

<snip/>

    Eric> Good point, changed to:

    Eric> # Limit size of reads to work around OS limitations
    Eric> # (Windows as one, see bug #215426) which result in
    Eric> # a MemoryError exception in socket.recv.

Good.

<snip/>

    >> 
    >> The comment does not apply well anymore, this could cause 
    >> confusion in the future for people unaware of the history 
    >> (what filesocket object is this comment referring to now ?).
    >> 
    >> As you see, Aaron and the reviewer of the code you're 
    >> patching (I think it was Andrew) were unhappy with the '-1' 
    >> usage (inherited from socket.py). IMHO, others will also find 
    >> it hard to understand. This comment was added on reviewer 
    >> request already, let's try to fix it right so that further 
    >> reviews don't have to mention it again.

    Eric> I really don't understand what you and Aaron want.  

Aaron feel the '-1' is not intuitive, *I* want to make Aaron
happy and help future readers.

    Eric> All I did was add a small patch to the code and now it
    Eric> seems like you guys want me to fix all sorts of things
    Eric> in the surrounding code.  

:-)

That's not the intent. But, code modifications always go through
review. So at least two people (often three, sometimes more) read
the patches and some surrounding lines.

The review process often provoke additional modifications because
the reviewer realizes that the original code *or* the proposed
code may be enhanced.

In that case the patch author inherits some more work :) But he
is also the one with the fresher knowledge...

But here, I just asked you to modify the comment:

  # Size of file unknown, the user may have specified a size or not,
  # we delegate that to the filesocket object (-1 means read until
  # EOF)
  data = self._file.read(size)

Before your patch, it was clear that the comment was applying to
self._file, which any reader of RangeFile knows is a socket-like
object.

Now with your patch:

  # Size of file unknown, the user may have specified a size or not,
  # we delegate that to the filesocket object (-1 means read until
  # EOF)
  data = self._read_limited(size)

Huh ? We delegate what ? To who ? -1 ? Why ?

So I wanted you to keep the comments coherent by separating:

    # Size of file unknown

Which is related to read() structural flow, from

  # the user may have specified a size or not,
  # we delegate that to the filesocket object (-1 means read until
  # EOF)

which is related to the filesocket object.

But indeed _read_limited mentions that in the doc string using an
usual (for me, but I'm not native)

 'size=-1 to read to the end-of-file (EOF).'

instead of (-1 means read until EOF)

So that should be good.

    Eric> Here are the snippets of code.  Make the changes and
    Eric> I'll put them into the code for you.

Hehe, usually we proceed the other way, since, in the end, the
final reviewer merge your contribution to the trunk :)

Hope this helps,

     Vincent

P.S.: Apologies to Andrew for the misattribution about -1 :-)