Bazaar, OLPC, and the "no-packs" repository format.

John Arbash Meinel john at arbash-meinel.com
Thu Dec 20 14:57:59 GMT 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> Various logs were my primary tests, since I was playing with this
> because of the per-file log performance bug I've reported. To get all
> revisions it had to seek so many times that, even though the index was
> faster than with knits, the whole operation was slower than with knits.

I'm also wondering whether the requests were being generated properly. We
should be requesting several revisions at a time and passing that whole batch
down to the lower layers, which are then free to reorder the request in
whatever way is fastest for them.
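Here is a minimal sketch of why the batching matters. It assumes a hypothetical in-memory index that maps each key to an (offset, length) pair in the pack file. plan_reads and the index layout are illustrative only and are not bzrlib API. Once the lower layer sees the whole batch, it can sort the ranges by offset and merge neighbours. In the example, three scattered requests become two reads.

```python
def plan_reads(index, keys):
    """Turn a batch of key requests into a short list of (offset, length) reads.

    `index` is a hypothetical dict mapping key -> (offset, length). The caller's
    key order is ignored: ranges are sorted by offset and touching or
    overlapping ranges are merged, which is only possible when the lower layer
    sees the whole batch at once.
    """
    ranges = sorted(index[key] for key in keys)
    merged = []
    for offset, length in ranges:
        if merged and offset <= merged[-1][0] + merged[-1][1]:
            # Touches or overlaps the previous range: extend it instead of seeking.
            prev_offset, prev_length = merged[-1]
            end = max(prev_offset + prev_length, offset + length)
            merged[-1] = (prev_offset, end - prev_offset)
        else:
            merged.append((offset, length))
    return merged


index = {
    b'rev-a': (0, 100),
    b'rev-b': (100, 50),
    b'rev-c': (400, 20),
}
# Requested out of order; fetched one key at a time this would be three seeks.
print(plan_reads(index, [b'rev-c', b'rev-a', b'rev-b']))  # [(0, 150), (400, 20)]
```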

I would be interested in seeing some of your --lsprof results.

...

> 
> Ok, the way you said it sounded like you have a pointer of just the tip
> revision in the global hash map. Instead you have a hash of every key pointing
> to the start of the segment they are in.
> 
> How do you know how long the segments are? Is that stored in the index as well?
> 
> I suppose I should look at your code a bit more.
> 
> John
> =:->
> 

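I see it now, looking at the _parse_header_py function:

import struct

def _parse_header_py(hash_to_segment, data, pos, offset, segment_count):
    # Walk the SEGMENT_DESCRIP records that follow the overall header.
    for i in range(segment_count):
        # Each record starts with a 1-byte key count and a 4-byte segment size.
        key_count, size = struct.unpack('<BI', data[pos:pos+5])
        pos += 5
        # Then one signed 4-byte hash per key in the segment.
        hashes_size = key_count * 4
        hashes = struct.unpack('<%di' % key_count, data[pos:pos+hashes_size])
        pos += hashes_size
        # Map every key hash to the segment(s) that may contain it.
        for key_hash in hashes:
            hash_to_segment.setdefault(key_hash, []).append(i)
        # Segments sit back to back, so each one starts where the last ended.
        yield offset, size, key_count
        offset += size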
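Once the header is parsed, a lookup is cheap. You hash the key, ask hash_to_segment which segments might hold it, and read only those byte ranges. Here is a minimal sketch, assuming the caller already has the integer hash, since the hash function isn't shown here. segment_ranges_for_key is a hypothetical helper, and the segment table in the example is built by hand.

```python
def segment_ranges_for_key(key_hash, hash_to_segment, segments):
    """Return the (offset, size) byte ranges of every segment that may hold a key.

    `segments` is the list of (offset, size, key_count) tuples yielded by
    _parse_header_py, and `hash_to_segment` is the dict it filled in. Different
    keys can share a 32-bit hash, so the caller still has to check the real key
    inside each segment it reads.
    """
    # set() drops repeats: _parse_header_py appends the same segment index twice
    # when one segment holds two keys with the same hash. sorted() keeps file order.
    indices = sorted(set(hash_to_segment.get(key_hash, ())))
    return [segments[i][:2] for i in indices]


# Hand-built example: three segments laid out back to back from offset 64.
segments = [(64, 200, 3), (264, 150, 2), (414, 90, 1)]
hash_to_segment = {
    11: [0],
    22: [0, 2],   # hash collision: this hash appears in two segments
    33: [1],
}
print(segment_ranges_for_key(22, hash_to_segment, segments))  # [(64, 200), (414, 90)]
print(segment_ranges_for_key(99, hash_to_segment, segments))  # []
```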

So each segment is described in the header only by its key count, its length,
and the list of key hashes it contains. The overall layout is:

HEADER = OVERALL_DESCRIP SEGMENT_DESCRIP*
OVERALL_DESCRIP = NUM_REF_LISTS KEY_LENGTH KEY_COUNT SEGMENT_COUNT INDEX_SIZE
SEGMENT_DESCRIP = NUM_KEYS SEGMENT_LENGTH HASH_KEY*
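To check the SEGMENT_DESCRIP grammar against the struct formats in _parse_header_py, here is a hypothetical encoder that packs descriptors in that layout. It comes with a Python 3 copy of the parser that reads them back. OVERALL_DESCRIP is left out because its field widths aren't visible in the snippet. The '<B' count also means a segment can describe at most 255 keys.

```python
import struct


def pack_segment_descriptions(segments):
    """Hypothetical encoder for SEGMENT_DESCRIP*.

    `segments` is a list of (segment_length, [key_hash, ...]) pairs. Each record
    is a 1-byte key count and a 4-byte little-endian length ('<BI'), followed by
    one signed 32-bit hash per key ('<%di'), matching _parse_header_py.
    """
    out = []
    for length, hashes in segments:
        out.append(struct.pack('<BI', len(hashes), length))
        out.append(struct.pack('<%di' % len(hashes), *hashes))
    return b''.join(out)


def parse_segment_descriptions(hash_to_segment, data, pos, offset, segment_count):
    # Same logic as _parse_header_py, using range() so it runs on Python 3.
    for i in range(segment_count):
        key_count, size = struct.unpack('<BI', data[pos:pos + 5])
        pos += 5
        hashes_size = key_count * 4
        hashes = struct.unpack('<%di' % key_count, data[pos:pos + hashes_size])
        pos += hashes_size
        for key_hash in hashes:
            hash_to_segment.setdefault(key_hash, []).append(i)
        yield offset, size, key_count
        offset += size


data = pack_segment_descriptions([(200, [11, 22, -7]), (150, [33])])
hash_to_segment = {}
segments = list(parse_segment_descriptions(hash_to_segment, data, 0, 64, 2))
print(len(data))        # 26, i.e. (5 + 3*4) + (5 + 1*4)
print(segments)         # [(64, 200, 3), (264, 150, 1)]
print(hash_to_segment)  # {11: [0], 22: [0], -7: [0], 33: [1]}
```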

I'm guessing some of your variable names aren't quite right.

For example, why is it called "_key_length" when you are passing it to
parse_header as "offset"?
And I think "index_size" refers only to the hash-map section.

It also looks like your code only supports local operations. Calling
"self._transport.get(fname)" fetches the whole file, which is a poor choice
over HTTP. Instead you should be doing something like:

 bytes = self._transport.readv(fname, [(0, 1024)], adjust_for_latency=True,
                               upper_limit=XXXX)
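The idea behind that readv() call is to fetch a small prefix first and ask for more bytes only if the header turns out to be larger. Here is a minimal sketch of that two-step pattern. It uses a hypothetical read_range(offset, length) callable in place of the transport and assumes a 4-byte little-endian header-length prefix, which is not the real format. It does not model what bzrlib's adjust_for_latency or upper_limit actually do.

```python
import struct


def read_header(read_range, initial=1024):
    """Fetch a length-prefixed header with as few round trips as possible.

    `read_range(offset, length)` is a hypothetical callable returning up to
    `length` bytes starting at `offset` (e.g. one HTTP range request). The
    header is assumed to start with a '<I' byte count of the header body.
    """
    first = read_range(0, initial)
    (body_len,) = struct.unpack('<I', first[:4])
    needed = 4 + body_len
    if len(first) >= needed:
        # Common case: the speculative first read already covered the header.
        return first[4:needed]
    # Header is larger than the guess: fetch only the missing tail.
    rest = read_range(len(first), needed - len(first))
    return (first + rest)[4:needed]


def make_reader(blob, log):
    # In-memory stand-in for a remote file that records every range requested.
    def read_range(offset, length):
        log.append((offset, length))
        return blob[offset:offset + length]
    return read_range


small = struct.pack('<I', 10) + b'h' * 10 + b'segment data...'
log = []
print(read_header(make_reader(small, log)) == b'h' * 10, log)  # True [(0, 1024)]

big = struct.pack('<I', 3000) + b'H' * 3000
log = []
print(read_header(make_reader(big, log)) == b'H' * 3000, log)  # True [(0, 1024), (1024, 1980)]
```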



John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHaoL2JdeBCYSNAAMRAuDCAJ4/5t/z3rqNQqg5Ao3fXU1NxsDNdwCfbFCd
rKwCpVxwBl45XatM8KGiYs4=
=2V4x
-----END PGP SIGNATURE-----
