[MERGE] Repository.get_revision_graph improvements
John Arbash Meinel
john at arbash-meinel.com
Tue Feb 26 22:42:30 GMT 2008
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Robert Collins wrote:
> On Tue, 2008-02-26 at 16:16 -0600, John Arbash Meinel wrote:
> ...
>
> Our current scaling performance is due to python internals not disk
> layout. Thats what I was getting at. Anyhow re disk format see below for
> my current code :).
Well, if you mean having packs and indexes yes, though the specific
contents of the indexes is causing part of the scaling problems.
>
>> By having the hash table entries be fixed size, then you know how much
>> space to give it at the beginning of the file. You could also compress
>> it, and just reserve enough space to hold it uncompressed. You would
>> still have a fixed max size, and just have to read less data when you
>> actually go to read the file.
>
> +class SortedGraphIndexBuilder(_GraphIndexBuilder):
> + """A builder that can build a SortedGraphIndex.
> +
> + The resulting index has the structure:
> + _SIGNATURE OPTIONS KEYHASH COLLISIONLIST NODES NEWLINE
> + _SIGNATURE := 'Bazaar Graph Index 2' NEWLINE
> + OPTIONS := 'node_ref_lists=' DIGITS NEWLINE
> + 'key_elements=' DIGITS NEWLINE
> + 'len=' DIGITS NEWLINE
> + 'hash_bytes=' DIGITS NEWLINE
> + 'offset_bytes=' DIGITS NEWLINE
> + 'node_offset=' DIGITS NEWLINE
> + KEYHASH := An array of offset_byte length integers in network byte
> + order. Each entry in the array is the location of the key
> + that had that hash prefix in the file, and the array is
> + complete (that is, if hash_bytes=1, there are 256 entries
> + in the array). In pathological cases we may need to
> + handle collisions (so that we don't have many gigabytes
> + of index when two entries hash closely: in those cases
> + the location pointed at will be in the collision-handling
> + list. NEWLINE follows the keyhash for debugging ease.
> + COLLISIONLIST := COLLISION * (in lexographic order)
> + COLLISION := KEY NULL REFERENCE NEWLINE
> + NODES := NODE*
> + NODE := KEY NULL ABSENT? NULL REFERENCES NULL VALUE NEWLINE
> + KEY := Not-whitespace-utf8
> + ABSENT := 'a'
> + REFERENCES := REFERENCE_LIST (TAB REFERENCE_LIST){node_ref_lists - 1}
> + REFERENCE_LIST := (REFERENCE (CR REFERENCE)*)?
> + REFERENCE := DIGITS ; digits is the byte offset in the index of the
> + ; referenced key.
> + VALUE := no-newline-no-null-bytes
> + """
> +
> + _SIGNATURE = _SIGNATURE_2
>
>
> You should check the branch out sometime :)
>
> -Rob
>
Sounds like something I wouldn't mind looking at. Though you haven't
given any pointers as to where it exists yet. Just that "I have a branch
that".
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFHxJXWJdeBCYSNAAMRApnPAJ9Kp6XgK4W6ETymIbwFt1hnjwWjmACggDow
EIZ/OPne6Y032nvOz58Fom8=
=zk1q
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list