bzr log http+urllib does not work, http+pycurl is too slow

Alexander Belchenko bialix at ukr.net
Wed Dec 12 14:13:42 GMT 2007


John Arbash Meinel writes:
>> If I use real IP in the URL I finally get log but it's very-very slow
>> (comparing to smart server or direct access to branch via windows UNC
>> network). But using real IP does not help when I use http+urllib URL.
>>
>> May be slowness caused by Trac sever, I'm not sure. Remote machine is
>> old and slow (CPU 750MHz). But fast enough when I use bzr:// protocol.
>> In absolute numbers:
>>
>> bzr revno http://.... -> 2.7 seconds
>> bzr revno bzr://....  -> 2.4 seconds
>> bzr revno file://...  -> 0.75 seconds
>>
>> bzr log http://... -> 187.766 seconds
>> bzr log bzr://.... -> 8.2 seconds
>> bzr log file://... -> 7.6 seconds
> 
> This is a bit surprising. Especially since bzr:// is working well. This isn't
> an https server, but I wonder if the problem is the connection handshaking.
> 
> Actually, you know what.... It is probably refusing the readv() requests. So we
> have to fall back to get() of the whole file. Which means a whole lot of
> downloading.

Yes, that is the case. I ran log again with the -Dhttp flag, and I can see 
that the server responds each time with the full text, not the range bzr 
requested.
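This behaviour is easy to reproduce with nothing but the standard library. 
The sketch below (file name and port choice are made up for illustration) 
starts Python's built-in HTTP server, sends a GET with a Range header, and 
shows that the reply is a plain 200 with the whole file rather than a 206 
partial response — exactly the situation bzr falls into here:

```python
import http.server
import threading
import urllib.request

# Create a small file for the built-in server to serve from the
# current directory (hypothetical name, just for this demo).
with open("testfile.bin", "wb") as f:
    f.write(b"x" * 10000)

# SimpleHTTPRequestHandler ignores Range headers entirely.
server = http.server.ThreadingHTTPServer(
    ("127.0.0.1", 0), http.server.SimpleHTTPRequestHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Ask for only 11 bytes of the 10000-byte file.
req = urllib.request.Request(
    "http://127.0.0.1:%d/testfile.bin" % port,
    headers={"Range": "bytes=10-20"})
resp = urllib.request.urlopen(req)
body = resp.read()
server.shutdown()

# A range-capable server would answer "206 11"; this one answers
# "200 10000" -- the full file, as seen in the -Dhttp trace.
print(resp.status, len(body))
```

So every readv() bzr issues against this server costs a full-file download, which matches the 187-second log time above.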

> 
> One way to diagnose this is to see what your network bandwidth is during this
> time. If it is low, then pycurl is being bad about how it is processing the
> requests. If it is high, then it is probably your server refusing to give back
> partial files.
> 
> I know the built-in python HTTP server doesn't support partial files. Do we
> know what server Trac uses?

The HTTP server from the standard Python library, I believe, because I 
didn't install any additional server libraries or software.

> 
> Also, when pycurl succeeds, you should look in ~/.bzr.log for a line like:
> 
>         if self._range_hint == 'multi':
>             self._range_hint = 'single'
>             mutter('Retry "%s" with single range request' % relpath)
>         elif self._range_hint == 'single':
>             self._range_hint = None
>             mutter('Retry "%s" without ranges' % relpath)
>         else:
> 
> So if you see "Retry ... with single range request" or "Retry without ranges",
> then we know that each time we go back for a bit more data, it is downloading
> too much. (Especially if you get 'without ranges').

I'm providing the full .bzr.log entry, with only the real URL altered. 
There are no 'Retry' messages in it.

> Packs in general are going to perform very poorly in this situation. They were
> designed with the idea of being able to do partial reads. Which makes them much
> faster when you don't have to parse the complete set of all indexes. But much
> slower when requesting the bytes from 10-20 gives you the bytes from 0-100000.

IMO, we should not pack the whole repo into one big pack. It seems more 
reasonable to me to have some min/max size limits: avoid the knit 
situation with zillions of small files, but also avoid my bad situation, 
where the entire 100+ MB pack file is read again and again.
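One way such min/max limits could work is sketched below. This is a 
hypothetical policy, not bzr's actual autopack algorithm: packs below a 
size cap get batched together for rewriting, while any pack already above 
the cap is left untouched, so a 100+ MB pack is never rewritten just to 
absorb a few tiny ones:

```python
# Hypothetical repacking policy (illustration only): combine packs
# smaller than max_bytes into batches, leave big packs alone.
def plan_repack(pack_sizes, max_bytes=20 * 1024 * 1024):
    small = sorted(s for s in pack_sizes if s < max_bytes)
    large = [s for s in pack_sizes if s >= max_bytes]

    batches, batch, batch_size = [], [], 0
    for size in small:
        batch.append(size)
        batch_size += size
        # Close a batch once it would produce a pack near the cap.
        if batch_size >= max_bytes:
            batches.append(batch)
            batch, batch_size = [], 0
    if batch:
        batches.append(batch)

    # 'batches' are rewritten into new packs; 'large' packs stay as-is.
    return batches, large

# Three small packs get merged; the 150 MB pack is never touched.
batches, untouched = plan_repack([100, 2_000_000, 5_000_000, 150_000_000])
```

Under a policy like this, repeated full-file reads would at worst hit a capped-size pack instead of the whole repository.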



