performance analysis of commit

John Arbash Meinel john at arbash-meinel.com
Thu May 10 15:57:49 BST 2007



Robert Collins wrote:

...

> 
>> We may need to run hooks or generate signatures during commit, but
>> they don't seem to have substantial performance consequences.
> 
> Generating a signature requires a manifest of the tree; that's O(tree
> size) always (as opposed to O(size of named paths)).

At least, the way we do signatures now, that is true.

With a recursively-updated structure (where if a file changes there is
an update to its directory), you really only need a signature of the
top-level object.
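
As a rough illustration of what I mean (a minimal sketch only, not how
bzr stores anything today; text_hash, dir_hash and the nested-dict tree
layout are made up for this example), each directory's hash covers the
hashes of its children, so the root hash covers the whole tree:

```python
import hashlib


def text_hash(data: bytes) -> str:
    """Hash of a single file text."""
    return hashlib.sha1(data).hexdigest()


def dir_hash(entries: dict) -> str:
    """Hash of a directory.

    `entries` maps a name to either bytes (a file text) or another dict
    (a subdirectory). This layout is hypothetical, chosen only to keep
    the example short.
    """
    h = hashlib.sha1()
    for name in sorted(entries):
        child = entries[name]
        if isinstance(child, dict):
            kind, digest = "dir", dir_hash(child)
        else:
            kind, digest = "file", text_hash(child)
        # One line per child: kind, name, and the child's own hash.
        h.update(f"{kind} {name} {digest}\n".encode("utf-8"))
    return h.hexdigest()


tree = {
    "README": b"hello\n",
    "src": {"a.py": b"print(1)\n", "b.py": b"print(2)\n"},
}
before = dir_hash(tree)
src_before = dir_hash(tree["src"])

# Changing one file changes its directory's hash and the root hash.
tree["src"]["a.py"] = b"print(42)\n"
after = dir_hash(tree)
src_after = dir_hash(tree["src"])
```

This sketch recomputes every hash on each call. A real implementation
would presumably keep the hashes of unchanged subtrees, so that only
the path from the modified file up to the root is rehashed, and only
the root hash would need to be signed.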

> 
>> If one wanted to optimize solely for the speed of commit I think
>> hash-addressed file-per-text storage like in git (or bzr 0.1) is very
>> good.
> 
> Agreed.

Well, it does have the property that you have to have read the file
before you know its filename, but you could always write it to a
temporary file, and then rename it once you have figured it out.
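
Something along these lines, as a sketch under my own assumptions (the
store_text name, the flat directory layout and the use of SHA-1 are
only for illustration, not an existing bzr API):

```python
import hashlib
import os
import tempfile


def store_text(repo_dir: str, chunks) -> str:
    """Write `chunks` (an iterable of bytes) into `repo_dir`, naming the
    resulting file after the SHA-1 of its content.

    The content is hashed while it is written to a temporary file in
    the same directory, then renamed once the hash is known.
    """
    os.makedirs(repo_dir, exist_ok=True)
    h = hashlib.sha1()
    fd, tmp_path = tempfile.mkstemp(dir=repo_dir, prefix="tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            for chunk in chunks:
                h.update(chunk)
                f.write(chunk)
        final_path = os.path.join(repo_dir, h.hexdigest())
        # Same directory, so this is a rename rather than a copy.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Do not leave a stray temporary file behind on failure.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path
```

Creating the temporary file inside the repository directory keeps the
rename on one filesystem, and the text is only read once.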

> 
>> Remarkably, it does not need to read the inventory for the
>> previous revision. For each versioned file, we just need to get its
>> hash, either by reading the file or validating its stat data.
> 
> Well we do have per-file merge graphs that we maintain. While it's true
> that this does not imply reading the inventories for the previous
> revisions, it does imply some historical-referencing of data.
> 
>> Variations on this are possible.  Rather than writing a single file
>> into the repository for each text, we could fold them into a single
>> collation or pack file.  That would create a smaller number of files
>> in the repository, but looking up a single text would require looking
>> into their indexes rather than just asking the filesystem.
>>
>> Rather than using hashes we can use file-id/rev-id pairs as at
>> present, which has several consequences pro and con.
> 
> Or a combination: index by hash, name by file-id:rev-id. I think we
> should examine this closely in London.
> 
> This analysis looks very nice and extremely useful to have. It would be
> good to drill into the IO and API implications a bit more.
> 
> Rob

Agreed.
John
=:->


