[MERGE] Repository.get_revision_graph improvements

Robert Collins robertc at robertcollins.net
Tue Feb 26 22:34:22 GMT 2008


On Tue, 2008-02-26 at 16:16 -0600, John Arbash Meinel wrote:
...

Our current scaling performance is due to python internals not disk
layout. Thats what I was getting at. Anyhow re disk format see below for
my current code :).

> By having the hash table entries be fixed size, then you know how much
> space to give it at the beginning of the file. You could also compress
> it, and just reserve enough space to hold it uncompressed. You would
> still have a fixed max size, and just have to read less data when you
> actually go to read the file.

+class SortedGraphIndexBuilder(_GraphIndexBuilder):
+    """A builder that can build a SortedGraphIndex.
+
+    The resulting index has the structure:
+    _SIGNATURE OPTIONS KEYHASH COLLISIONLIST NODES NEWLINE
+    _SIGNATURE     := 'Bazaar Graph Index 2' NEWLINE
+    OPTIONS        := 'node_ref_lists=' DIGITS NEWLINE
+                      'key_elements=' DIGITS NEWLINE
+                      'len=' DIGITS NEWLINE
+                      'hash_bytes=' DIGITS NEWLINE
+                      'offset_bytes=' DIGITS NEWLINE
+                      'node_offset=' DIGITS NEWLINE
+    KEYHASH        := An array of offset_byte length integers in network byte
+                      order. Each entry in the array is the location of the key
+                      that had that hash prefix in the file, and the array is
+                      complete (that is, if hash_bytes=1, there are 256 entries
+                      in the array). In pathological cases we may need to
+                      handle collisions (so that we don't have many gigabytes
+                      of index when two entries hash closely: in those cases
+                      the location pointed at will be in the collision-handling
+                      list. NEWLINE follows the keyhash for debugging ease.
+    COLLISIONLIST  := COLLISION * (in lexographic order)
+    COLLISION      := KEY NULL REFERENCE NEWLINE
+    NODES          := NODE*
+    NODE           := KEY NULL ABSENT? NULL REFERENCES NULL VALUE NEWLINE
+    KEY            := Not-whitespace-utf8
+    ABSENT         := 'a'
+    REFERENCES     := REFERENCE_LIST (TAB REFERENCE_LIST){node_ref_lists - 1}
+    REFERENCE_LIST := (REFERENCE (CR REFERENCE)*)?
+    REFERENCE      := DIGITS  ; digits is the byte offset in the index of the
+                              ; referenced key.
+    VALUE          := no-newline-no-null-bytes
+    """
+
+    _SIGNATURE = _SIGNATURE_2


You should check the branch out sometime :)

-Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20080227/eb79d895/attachment-0001.pgp 


More information about the bazaar mailing list