bzr log http+urllib does not work, http+pycurl is too slow

John Arbash Meinel john at arbash-meinel.com
Tue Dec 11 16:44:38 GMT 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


...

> With pycurl installed I have another error:
> 
> bzr arguments: [u'--no-plugins', u'log',
> u'http://host:8000/chrome/site/branches/Logic']
> encoding stdout as sys.stdout encoding 'cp866'
> got pycurl error: 6, Could not resolve host: host (Domain name not
> found), (6, 'Could not resolve host: host (Domain name not found)'),
> url: http://host:8000/chrome/site/branches/Logic/.bzr/branch-format


I assume the machine name is "acad"

...

> resolve host: acad (Domain name not found))
> on http://host:8000/chrome/site/branches/Logic/.bzr/branch-format
> 
> return code 3
> 

This is just a simple DNS lookup issue. I'm not sure which resolver pycurl
uses, but it is evidently different from what the rest of the system uses. Are
you explicitly listing this host in lmhosts or something like that?

I know I've seen DNS issues on Windows before, where you set up a DNS server
and it is used by some apps but not by all. Always very mysterious and
confusing to me.
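
A quick way to see what the system resolver makes of the name, independent of
pycurl, is a few lines of Python. This is only a sketch: resolve_host is a
made-up helper name, and it only reports what socket.getaddrinfo() returns on
that machine, which may or may not match what pycurl does internally.

```python
import socket


def resolve_host(hostname):
    """Return the sorted addresses the system resolver gives for hostname.

    An empty list means the lookup failed.
    """
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the textual address.
    return sorted({info[4][0] for info in infos})
```

Calling resolve_host() with the name from the failing URL and getting an empty
list would suggest the name is only known through some Windows-specific
mechanism rather than through DNS proper.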


> If I use real IP in the URL I finally get log but it's very-very slow
> (comparing to smart server or direct access to branch via windows UNC
> network). But using real IP does not help when I use http+urllib URL.
> 
> Maybe the slowness is caused by the Trac server, I'm not sure. The remote
> machine is old and slow (CPU 750MHz), but fast enough when I use bzr://.
> In absolute numbers:
> 
> bzr revno http://.... -> 2.7 seconds
> bzr revno bzr://....  -> 2.4 seconds
> bzr revno file://...  -> 0.75 seconds
> 
> bzr log http://... -> 187.766 seconds
> bzr log bzr://.... -> 8.2 seconds
> bzr log file://... -> 7.6 seconds

This is a bit surprising, especially since bzr:// is working well. This isn't
an https server, but I wonder if the problem is the connection handshaking.

Actually, you know what... The server is probably refusing the range requests
behind readv(), so we have to fall back to a get() of the whole file, which
means a whole lot of downloading.

One way to diagnose this is to see what your network bandwidth is during this
time. If it is low, then pycurl is being bad about how it is processing the
requests. If it is high, then it is probably your server refusing to give back
partial files.

I know the built-in python HTTP server doesn't support partial files. Do we
know what server Trac uses?
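
Whether a given server honours partial requests can be checked from outside
bzr with a small standalone script. The sketch below is not bzr code; the
function names are invented for illustration. It sends one request with a
Range header and reports whether the reply was a partial response (HTTP 206)
or the whole file (HTTP 200).

```python
import urllib.request


def classify_range_response(status, content_range):
    """Interpret the reply to a request that carried a Range header."""
    if status == 206 and content_range:
        return "partial"      # the server honoured the range
    if status == 200:
        return "whole-file"   # the server ignored the range and sent everything
    return "unknown"


def probe_range_support(url, first=0, last=9):
    """Ask for bytes first..last of url and classify what comes back.

    urlopen() raises urllib.error.HTTPError for 4xx/5xx replies, so a
    server that rejects the request outright shows up as an exception.
    """
    request = urllib.request.Request(
        url, headers={"Range": "bytes=%d-%d" % (first, last)})
    with urllib.request.urlopen(request) as response:
        return classify_range_response(
            response.status, response.headers.get("Content-Range"))
```

Pointing probe_range_support() at one of the .pack URLs from the log would be
the obvious test. Note that it only exercises a single range; a server could
accept that and still refuse a multi-range request.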

Also, when pycurl succeeds, you should look in ~/.bzr.log for the messages
written by this code:

        if self._range_hint == 'multi':
            self._range_hint = 'single'
            mutter('Retry "%s" with single range request' % relpath)
        elif self._range_hint == 'single':
            self._range_hint = None
            mutter('Retry "%s" without ranges' % relpath)
        else:

So if you see "Retry ... with single range request" or "Retry ... without
ranges", then we know that each time we go back for a bit more data, it is
downloading too much. (Especially if you get 'without ranges').
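
Counting those messages by hand gets tedious in a long log, so here is a
small sketch that does it. It assumes the log lines contain the two mutter()
messages exactly as quoted above; count_range_fallbacks is a made-up name.

```python
import re

# Matches the two mutter() messages quoted above.
_RETRY_RE = re.compile(
    r'Retry "(?P<path>.*)" '
    r'(?P<mode>with single range request|without ranges)')


def count_range_fallbacks(lines):
    """Count how often the transport fell back to 'single' or to no ranges."""
    counts = {"single": 0, "none": 0}
    for line in lines:
        match = _RETRY_RE.search(line)
        if match is None:
            continue
        if match.group("mode") == "without ranges":
            counts["none"] += 1
        else:
            counts["single"] += 1
    return counts
```

Any iterable of lines works, so an open file object for ~/.bzr.log can be
passed in directly.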

Packs in general are going to perform very poorly in this situation. They were
designed with the idea of being able to do partial reads, which makes them much
faster when you don't have to parse the complete set of all indexes, but much
slower when requesting bytes 10-20 gives you bytes 0-100000.
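
To put rough numbers on that, here is a simplified model of how many bytes
cross the wire for one readv() under each range hint. It is an assumption, not
a description of the actual transport code: it takes 'multi' to fetch exactly
the requested ranges (assumed not to overlap), 'single' to fetch one range
spanning from the first requested byte to the last, and None to fetch the
whole file. It also ignores HTTP header overhead.

```python
def bytes_transferred(offsets, file_size, range_hint):
    """Estimate payload bytes for one readv() of (start, length) offsets."""
    offsets = list(offsets)
    if not offsets:
        return 0
    if range_hint == "multi":
        # Only the requested ranges are sent.
        return sum(length for _start, length in offsets)
    if range_hint == "single":
        # One range covering everything from the first byte to the last.
        first = min(start for start, _length in offsets)
        last = max(start + length for start, length in offsets)
        return last - first
    if range_hint is None:
        # No ranges at all: the whole file comes back.
        return file_size
    raise ValueError("unknown range hint: %r" % (range_hint,))
```

For two small reads at opposite ends of a large pack, the 'single' and None
cases both end up close to the full file size, which is the situation
described above.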

We might be able to expose this a bit more at the readv layer. Robert did some
work to allow for readv() to return more data than was explicitly requested.
Pack indexes will buffer whatever is returned by readv().

So we could change the HTTP implementation to make sure it returns all the
data it actually had to read, rather than just a slightly larger set of ranges
than requested. The pack indexes would then buffer the extra bytes instead of
asking for them again later.

At the moment, HTTP._readv() doesn't see this, as it is hidden in
Transport.readv() with:

        if adjust_for_latency:
            offsets = self._sort_expand_and_combine(offsets, upper_limit)
        return self._readv(relpath, offsets)

But we could override HTTP.readv() to know when self._range_hint == 'single' or
self._range_hint is None (download the whole file).
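
As a rough illustration of that override, here is a toy transport, not bzrlib
code. ToyHttpTransport and its in-memory file dictionary are invented for the
sketch, and it assumes readv() may yield (offset, data) pairs that cover more
than was asked for. When the hint is 'single' or None it hands back everything
it had to read, so a caller that buffers could reuse it.

```python
class ToyHttpTransport(object):
    """In-memory stand-in for an HTTP transport (hypothetical)."""

    def __init__(self, files, range_hint="multi"):
        self._files = files            # maps relpath -> bytes
        self._range_hint = range_hint  # 'multi', 'single' or None

    def readv(self, relpath, offsets):
        """Yield (offset, data), possibly covering more than requested."""
        data = self._files[relpath]
        offsets = list(offsets)
        if not offsets:
            return
        if self._range_hint is None:
            # The whole file was downloaded anyway, so return all of it.
            yield 0, data
        elif self._range_hint == "single":
            # One spanning range was read; return the full span.
            first = min(start for start, _length in offsets)
            last = max(start + length for start, length in offsets)
            yield first, data[first:last]
        else:
            # Ranges work, so return just what was asked for.
            for start, length in offsets:
                yield start, data[start:start + length]
```

The point of the design is that the caller sees one large chunk instead of
several small ones cut from the same download, so nothing that was already
paid for gets thrown away.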


> 
> 
> Here the relevant part of .bzr.log corresponding to executing bzr log
> http://... command:
> 
> bzr arguments: [u'--no-plugins', u'log',
> u'http+pycurl://X.X.X.X:8000/chrome/site/branches/Logic']
> encoding stdout as sys.stdout encoding 'cp866'
> http readv of 930af4714be1e33318bb630b8d626540.rix  offsets => 1
> collapsed 1
> http readv of 930af4714be1e33318bb630b8d626540.rix  offsets => 1
> collapsed 1
> http readv of 930af4714be1e33318bb630b8d626540.rix  offsets => 1
> collapsed 1
> http readv of 930af4714be1e33318bb630b8d626540.pack  offsets => 10
> collapsed 3
> http readv of 930af4714be1e33318bb630b8d626540.pack  offsets => 14
> collapsed 5
> http readv of 930af4714be1e33318bb630b8d626540.pack  offsets => 20
> collapsed 2
> http readv of 930af4714be1e33318bb630b8d626540.pack  offsets => 29
> collapsed 3
> http readv of 930af4714be1e33318bb630b8d626540.pack  offsets => 43
> collapsed 2
> http readv of 930af4714be1e33318bb630b8d626540.pack  offsets => 64
> collapsed 3
> http readv of 930af4714be1e33318bb630b8d626540.pack  offsets => 95
> collapsed 2
> http readv of 930af4714be1e33318bb630b8d626540.pack  offsets => 142
> collapsed 2
> http readv of 930af4714be1e33318bb630b8d626540.pack  offsets => 201
> collapsed 3
> http readv of 930af4714be1e33318bb630b8d626540.pack  offsets => 201
> collapsed 5
> http readv of 930af4714be1e33318bb630b8d626540.pack  offsets => 134
> collapsed 3
> return code 0
> 
> The size of 930af4714be1e33318bb630b8d626540.rix is 177KB,
> 930af4714be1e33318bb630b8d626540.pack -- 112 MB.
> Shared repository was repacked with `bzr pack` command.
> 
> Do you think I need to run some more tests in my configuration?

Nothing terribly suspicious so far. But it might be good to turn on timestamps
for this log, so we can see which step is being so slow.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHXr52JdeBCYSNAAMRAj3ZAKCD8AJCCZL5GNXNWL80gqg7O4aSvwCfd4P/
OQ8rEiglFemgZTxRn4ZEOok=
=Ozft
-----END PGP SIGNATURE-----


