bzr log http+urllib does not work, http+pycurl is too slow
John Arbash Meinel
john at arbash-meinel.com
Tue Dec 11 16:33:35 GMT 2007
Alexander Belchenko wrote:
> I have Trac running as separate python application on separate win32
> machine.
> I put a pack-format shared repo in Trac's htdocs directory, so my branches
> are available via the http protocol (I use custom port 8000).
> Well, at least `bzr revno http://host:8000/chrome/site/branches/Logic`
> works as expected. But not the log command.
>
> On my client machine I use a custom bzr.exe 1.0rc3 compiled without
> pycurl, so I encounter this bug.
>
> bzr arguments: [u'--no-plugins', u'log',
> u'http://host:8000/chrome/site/branches/Logic']
> encoding stdout as sys.stdout encoding 'cp866'
> failed to import pycurl: No module named pycurl
> failed to instantiate transport <bzrlib.registry._LazyObjectGetter
> object at 10da5a8, module='bzrlib.transport.http._pycurl'
> attribute='PyCurlTransport'> for
> 'http://host:8000/chrome/site/branches/Logic': DependencyNotPresent()
> http readv of 930af4714be1e33318bb630b8d626540.rix offsets => 1
...
> http readv of 930af4714be1e33318bb630b8d626540.pack offsets => 14
> collapsed 5
> Traceback (most recent call last):
> File "bzrlib\commands.pyc", line 802, in run_bzr_catch_errors
> File "bzrlib\commands.pyc", line 758, in run_bzr
> File "bzrlib\commands.pyc", line 492, in run_argv_aliases
> File "bzrlib\commands.pyc", line 768, in ignore_pipe
> File "bzrlib\builtins.pyc", line 1750, in run
> File "bzrlib\log.pyc", line 189, in show_log
> File "bzrlib\log.pyc", line 303, in _show_log
> File "bzrlib\log.pyc", line 290, in iter_revisions
> File "bzrlib\decorators.pyc", line 127, in read_locked
> File "bzrlib\repository.pyc", line 1003, in get_revisions
> File "bzrlib\decorators.pyc", line 127, in read_locked
> File "bzrlib\repository.pyc", line 1012, in _get_revisions
> File "bzrlib\store\revision\knit.pyc", line 88, in get_revisions
> File "bzrlib\store\revision\knit.pyc", line 104, in _get_serialized_revisions
> File "bzrlib\knit.pyc", line 1062, in get_texts
> File "bzrlib\knit.pyc", line 1068, in get_line_list
> File "bzrlib\knit.pyc", line 1086, in _get_content_maps
> File "bzrlib\knit.pyc", line 1051, in _get_record_map
> File "bzrlib\knit.pyc", line 2426, in read_records_iter
> File "bzrlib\knit.pyc", line 2033, in get_raw_records
> File "bzrlib\pack.pyc", line 253, in iter_records
> File "bzrlib\pack.pyc", line 294, in _read_format
> File "bzrlib\pack.pyc", line 221, in _read_line
> File "bzrlib\pack.pyc", line 185, in readline
> File "bzrlib\pack.pyc", line 172, in _next
> File "bzrlib\transport\http\__init__.pyc", line 236, in _readv
> File "bzrlib\transport\http\__init__.pyc", line 318, in _coalesce_readv
> File "bzrlib\transport\http\__init__.pyc", line 281, in get_and_yield
> File "bzrlib\transport\http\_urllib.pyc", line 135, in _get
> File "bzrlib\transport\http\_urllib.pyc", line 75, in _perform
> File "bzrlib\transport\http\_urllib2_wrappers.pyc", line 170, in cleanup_pipe
> File "bzrlib\transport\http\_urllib2_wrappers.pyc", line 137, in finish
> File "httplib.pyc", line 529, in read
> File "socket.pyc", line 309, in read
> error: (10055, 'No buffer space available')
Before we go too far, can you try this patch:
=== modified file 'bzrlib/trace.py'
--- bzrlib/trace.py	2007-11-10 17:09:40 +0000
+++ bzrlib/trace.py	2007-12-11 16:26:17 +0000
@@ -129,7 +129,7 @@
         out += '\n'
     _trace_file.write(out)
     # TODO: jam 20051227 Consider flushing the trace file to help debugging
-    #_trace_file.flush()
+    _trace_file.flush()
Judging by the error, it seems to be failing once we request the large amount
of data for the actual texts (or inventory texts), rather than when reading
the index files. And from what I've seen, Windows is really bad about not
flushing buffers when the program exits. (I don't think I've ever seen a
partially finished line while running on Linux.)
We explicitly removed the .flush() because it measurably slowed things down for
a simple log file.
Maybe we should have a debug flag for this, "-Dflushlog" or something like that.
Anyway, it would seem like the socket code is unhappy if the requested buffer
is too large.
Line 309 seems to be:

    data = self._sock.recv(recv_size)
And I wonder if it always passes the full 'recv_size' that we requested, which
could be several megabytes, while win32 will only accept 8k read sizes, or
something like that.
If necessary, we could probably monkey-patch the .recv() function so that it
will break a request for >8k into smaller requests. Or we could go up a level
and do so for the .read() function. socket.py does have:
    while True:
        left = size - buf_len
        recv_size = max(self._rbufsize, left)
        data = self._sock.recv(recv_size)
        if not data:
            break
        buffers.append(data)
So maybe it would just become:
    while True:
        left = size - buf_len
        recv_size = max(self._rbufsize, left)
        recv_size = min(recv_size, self._maxbufsize)
        data = self._sock.recv(recv_size)
        if not data:
            break
        buffers.append(data)
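The capped loop above can be exercised against a fake socket that rejects
oversized requests, mimicking the reported error. Everything here is a
sketch under assumptions: `MAX_RECV` is a guessed cap (the actual win32
limit isn't established in this thread), and `FakeSocket`/`read_all` are
illustrative helpers, not bzrlib or stdlib code:

```python
# Sketch: a read loop that never asks the socket for more than MAX_RECV
# bytes at once, tested against a fake socket that fails on large recv()s
# the way the win32 report above does (error 10055).
MAX_RECV = 8 * 1024  # assumed cap; chosen for illustration only

class FakeSocket:
    """Returns data in chunks; raises if asked for too much at once."""
    def __init__(self, data):
        self.data = data

    def recv(self, size):
        if size > MAX_RECV:
            raise OSError(10055, 'No buffer space available')
        chunk, self.data = self.data[:size], self.data[size:]
        return chunk

def read_all(sock, size, rbufsize=8192, maxbufsize=MAX_RECV):
    buffers = []
    buf_len = 0
    while buf_len < size:
        left = size - buf_len
        recv_size = max(rbufsize, left)
        recv_size = min(recv_size, maxbufsize)  # the proposed cap
        data = sock.recv(recv_size)
        if not data:
            break
        buffers.append(data)
        buf_len += len(data)
    return b''.join(buffers)

payload = b'x' * (3 * 1024 * 1024)  # a several-megabyte request
result = read_all(FakeSocket(payload), len(payload))
assert result == payload
```

Without the `min()` cap, the first iteration would request the full 3MB and
the fake socket (like the win32 one) would raise.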
Either way, it seems more like a bug in Python on win32, failing to handle the
case where a large recv() is requested.
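The other option mentioned above, monkey-patching `.recv()` itself, could
look roughly like this. `RECV_CAP` is again an assumed safe size, and
`patch_recv`/`FakeSock` are hypothetical names for illustration:

```python
# Hypothetical monkey-patch: wrap an object's recv() so any request larger
# than a cap is silently reduced; callers that loop (like socket.py's
# read()) will simply take more iterations to get all the data.
RECV_CAP = 8 * 1024  # assumed safe size for win32; not verified here

def patch_recv(sock):
    original_recv = sock.recv
    def capped_recv(size, *flags):
        # Never ask the OS for more than RECV_CAP bytes at once.
        return original_recv(min(size, RECV_CAP), *flags)
    sock.recv = capped_recv
    return sock

class FakeSock:
    """Records the sizes actually passed down to recv()."""
    def __init__(self):
        self.sizes = []

    def recv(self, size):
        self.sizes.append(size)
        return b'x' * size

s = patch_recv(FakeSock())
s.recv(1024 * 1024)            # caller asks for 1 MB...
assert s.sizes == [RECV_CAP]   # ...the underlying recv only sees the cap
```

This keeps the fix local to the transport that opens the socket, rather
than changing socket.py's read() loop for every caller.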
John
=:->