[MERGE] Repository.get_revision_graph improvements

John Arbash Meinel john at arbash-meinel.com
Tue Feb 26 22:42:30 GMT 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> On Tue, 2008-02-26 at 16:16 -0600, John Arbash Meinel wrote:
> ...
> 
> Our current scaling performance is due to python internals not disk
> layout. Thats what I was getting at. Anyhow re disk format see below for
> my current code :).

Well, if you mean having packs and indexes yes, though the specific
contents of the indexes is causing part of the scaling problems.

> 
>> By having the hash table entries be fixed size, then you know how much
>> space to give it at the beginning of the file. You could also compress
>> it, and just reserve enough space to hold it uncompressed. You would
>> still have a fixed max size, and just have to read less data when you
>> actually go to read the file.
> 
> +class SortedGraphIndexBuilder(_GraphIndexBuilder):
> +    """A builder that can build a SortedGraphIndex.
> +
> +    The resulting index has the structure:
> +    _SIGNATURE OPTIONS KEYHASH COLLISIONLIST NODES NEWLINE
> +    _SIGNATURE     := 'Bazaar Graph Index 2' NEWLINE
> +    OPTIONS        := 'node_ref_lists=' DIGITS NEWLINE
> +                      'key_elements=' DIGITS NEWLINE
> +                      'len=' DIGITS NEWLINE
> +                      'hash_bytes=' DIGITS NEWLINE
> +                      'offset_bytes=' DIGITS NEWLINE
> +                      'node_offset=' DIGITS NEWLINE
> +    KEYHASH        := An array of offset_byte length integers in network byte
> +                      order. Each entry in the array is the location of the key
> +                      that had that hash prefix in the file, and the array is
> +                      complete (that is, if hash_bytes=1, there are 256 entries
> +                      in the array). In pathological cases we may need to
> +                      handle collisions (so that we don't have many gigabytes
> +                      of index when two entries hash closely: in those cases
> +                      the location pointed at will be in the collision-handling
> +                      list. NEWLINE follows the keyhash for debugging ease.
> +    COLLISIONLIST  := COLLISION * (in lexographic order)
> +    COLLISION      := KEY NULL REFERENCE NEWLINE
> +    NODES          := NODE*
> +    NODE           := KEY NULL ABSENT? NULL REFERENCES NULL VALUE NEWLINE
> +    KEY            := Not-whitespace-utf8
> +    ABSENT         := 'a'
> +    REFERENCES     := REFERENCE_LIST (TAB REFERENCE_LIST){node_ref_lists - 1}
> +    REFERENCE_LIST := (REFERENCE (CR REFERENCE)*)?
> +    REFERENCE      := DIGITS  ; digits is the byte offset in the index of the
> +                              ; referenced key.
> +    VALUE          := no-newline-no-null-bytes
> +    """
> +
> +    _SIGNATURE = _SIGNATURE_2
> 
> 
> You should check the branch out sometime :)
> 
> -Rob
> 

Sounds like something I wouldn't mind looking at. Though you haven't
given any pointers as to where it exists yet. Just that "I have a branch
that".

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHxJXWJdeBCYSNAAMRApnPAJ9Kp6XgK4W6ETymIbwFt1hnjwWjmACggDow
EIZ/OPne6Y032nvOz58Fom8=
=zk1q
-----END PGP SIGNATURE-----



More information about the bazaar mailing list