Bazaar, OLPC, and the "no-packs" repository format.
Lukáš Lalinský
lalinsky at gmail.com
Thu Dec 20 15:16:48 GMT 2007
On Thu, 2007-12-20 at 08:57 -0600, John Arbash Meinel wrote:
> > Various logs were my primary tests, since I was playing with this
> > because of the per-file log performance bug I've reported. To get all
> > revisions it had to seek so many times that even though I had a faster
> > index than with knits, the whole operation was slower than with knits.
>
> I'm also wondering if the requests were getting properly generated. Such that
> we should be requesting several revisions at a time, and sending that down to
> the lower layers, which are then allowed to reorder the request in whatever
> fashion is fastest for them.
>
> I would be interested in seeing some of your --lsprof results.
>
> ...
I don't have them anymore, but I can convert some branches and try to
repeat the tests. But the most expensive operations were seeking and
wrapping/unwrapping the result to/from StringIO for the pack container
reader.
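To make the seeking cost and the "several revisions at a time" idea concrete, here is a minimal sketch of how a lower layer could reorder and merge a batch of read requests before touching the file. This is not bzrlib code: coalesce_ranges, read_ranges and the max_gap parameter are hypothetical names used only for illustration, and the file is assumed to be a local, seekable object.

```python
import io


def coalesce_ranges(ranges, max_gap=0):
    """Sort (offset, length) requests and merge ones that touch or nearly touch.

    max_gap is a hypothetical tuning knob: two ranges separated by at most
    this many bytes are fetched with a single read.
    """
    merged = []
    for offset, length in sorted(ranges):
        end = offset + length
        if merged and offset <= merged[-1][1] + max_gap:
            # Extend the previous read instead of issuing another seek.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([offset, end])
    return [(start, end - start) for start, end in merged]


def read_ranges(fileobj, ranges, max_gap=0):
    """Return {(offset, length): bytes} with one seek per merged range."""
    ranges = list(ranges)
    results = {}
    for start, length in coalesce_ranges(ranges, max_gap):
        fileobj.seek(start)
        chunk = fileobj.read(length)
        # Hand each original request its slice of the merged chunk.
        for offset, size in ranges:
            if start <= offset and offset + size <= start + length:
                rel = offset - start
                results[(offset, size)] = chunk[rel:rel + size]
    return results


# Four requests in arbitrary order collapse into two reads.
data = bytes(range(100))
wanted = [(50, 10), (0, 5), (5, 5), (60, 4)]
print(coalesce_ranges(wanted))  # [(0, 10), (50, 14)]
texts = read_ranges(io.BytesIO(data), wanted)
```

The caller still gets every range it asked for, keyed by the original request, so the reordering stays invisible above this layer.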
> I see it by looking at the parse_header_py function:
> def _parse_header_py(hash_to_segment, data, pos, offset, segment_count):
>     for i in xrange(segment_count):
>         key_count, size = struct.unpack('<BI', data[pos:pos+5])
>         pos += 5
>         hashes_size = key_count * 4
>         hashes = struct.unpack('<%di' % key_count, data[pos:pos+hashes_size])
>         pos += hashes_size
>         for key_hash in hashes:
>             hash_to_segment.setdefault(key_hash, []).append(i)
>         yield offset, size, key_count
>         offset += size
>
> So you describe the segments in the header as just the list of hashes present.
> So you have:
>
> HEADER = OVERALL_DESCRIP SEGMENT_DESCRIP*
> OVERALL_DESCRIP = NUM_REF_LISTS KEY_LENGTH KEY_COUNT SEGMENT_COUNT INDEX_SIZE
> SEGMENT_DESCRIP = NUM_KEYS SEGMENT_LENGTH HASH_KEY*
>
> I'm guessing some of your variable names aren't quite right.
>
> Like why is it called "_key_length" but you are passing it to parse_header as
> "offset".
Actually, the Python code is not in sync with the C code, which I used as
the primary version (it was just an experiment to see if I can make
packs faster just by making the indexing layer fast, not serious
code).
> And I think "index_size" just refers to the hash map section.
Right.
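For reference, the SEGMENT_DESCRIP part of the layout quoted above can be sketched as a round trip in current Python. This is only an illustration based on the quoted _parse_header_py, not the C version: pack_segment_descriptions and parse_segment_descriptions are hypothetical names, and offset is assumed to be the file position where the first segment starts. The OVERALL_DESCRIP fields are left out because their widths are not given in this thread.

```python
import struct

# NUM_KEYS (1 unsigned byte) followed by SEGMENT_LENGTH (4-byte unsigned int),
# little-endian without padding, matching the '<BI' format in the quoted code.
SEGMENT_FIXED = struct.Struct('<BI')


def pack_segment_descriptions(segments):
    """Serialize [(segment_length, [key_hash, ...]), ...]; used here for testing."""
    out = []
    for size, hashes in segments:
        out.append(SEGMENT_FIXED.pack(len(hashes), size))
        # Each key hash is a signed 4-byte integer, as in the quoted '<%di'.
        out.append(struct.pack('<%di' % len(hashes), *hashes))
    return b''.join(out)


def parse_segment_descriptions(data, pos, offset, segment_count):
    """Return (hash_to_segment, segments, end_pos).

    segments holds one (file_offset, size, key_count) tuple per segment, and
    hash_to_segment maps a key hash to the segment numbers that may hold it.
    """
    hash_to_segment = {}
    segments = []
    for i in range(segment_count):
        key_count, size = SEGMENT_FIXED.unpack_from(data, pos)
        pos += SEGMENT_FIXED.size
        hashes = struct.unpack_from('<%di' % key_count, data, pos)
        pos += 4 * key_count
        for key_hash in hashes:
            hash_to_segment.setdefault(key_hash, []).append(i)
        segments.append((offset, size, key_count))
        offset += size
    return hash_to_segment, segments, pos


blob = pack_segment_descriptions([(100, [1, -2]), (40, [1])])
hash_to_segment, segments, end = parse_segment_descriptions(blob, 0, 512, 2)
```

A key hash that appears in more than one segment maps to a list of candidates, so a lookup may need to check each listed segment.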
> It also looks like your code only supports local operations. Certainly doing
> "self._transport.get(fname)" isn't a great thing to do on HTTP. Instead you
> should be doing:
>
> bytes = self._transport.readv(fname, [(0, 1024)], adjust_for_latency=True,
>                               upper_limit=XXXX)
Yes, it was optimized only for local operations, for which calling readv
multiple times and opening the file every time was significantly slower.
Lukas