bzr status/commit/revert performance

Joshua Jensen jjensen at workspacewhiz.com
Thu Jan 10 21:03:01 GMT 2008


I have some questions about the behavior and performance of Bazaar 1.0 and 1.1.  I noticed the following while testing performance with a very large .bzr repository (4 gigabytes).

When I run 'bzr add', the dirstate is populated with the new filename information.  Then I run 'bzr commit'.  Minor updates are performed to the dirstate, but that is all.  For the large 4 gigabyte repository, this takes 25 minutes.

Then I run 'bzr status' or, alternatively, 'bzr commit' again.  This takes much longer than expected, so I run the Windows utility Filemon from sysinternals.com.  In Filemon, I can see Bazaar scanning the directory structure and reading each file in the tree in its entirety.  When this is done, the dirstate has had its SHA1 hashes populated (I think).  This scan of every file takes 15 minutes.

If I run 'bzr status' or 'bzr commit' again, Bazaar just checks timestamps and is very fast.  For the repository above, it takes around 8 seconds.
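
The difference between the 15-minute scan and the 8-second check can be illustrated with a minimal Python sketch of a stat-based hash cache.  This is not Bazaar's code: cached_sha1, sha1_of_file and the cache dictionary are hypothetical names, and the assumption is only that a stored size and timestamp are compared before any file is re-read.

```python
import hashlib
import os


def sha1_of_file(path, chunk_size=1 << 20):
    """Read the whole file and return its SHA1 hex digest (the slow path)."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cached_sha1(path, cache):
    """Return (sha1, was_read) for path.

    cache maps path -> (size, mtime_ns, sha1).  It is a hypothetical
    stand-in for whatever the dirstate really stores.
    """
    st = os.stat(path)
    fingerprint = (st.st_size, st.st_mtime_ns)
    entry = cache.get(path)
    if entry is not None and entry[:2] == fingerprint:
        # Fast path: size and timestamp match, so only os.stat() was needed.
        return entry[2], False
    # Slow path: no usable entry, so the file is read in its entirety.
    sha1 = sha1_of_file(path)
    cache[path] = fingerprint + (sha1,)
    return sha1, True
```

With an empty cache every file takes the slow path once; after that, unchanged files cost one stat call each.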

So, question #1: Why does Bazaar not generate the SHA1 hashes as it is committing the data in the first place?  It would speed up the subsequent operations.
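
What question #1 asks for can be sketched as hashing the data in the same pass that stores it.  The following is an illustration under assumptions, not a description of how Bazaar commits: store_and_hash is a hypothetical helper, and src and dst are any binary file-like objects.

```python
import hashlib


def store_and_hash(src, dst, chunk_size=1 << 20):
    """Copy src to dst and return the SHA1 hex digest of the copied bytes.

    Each chunk is read once and used twice: it is fed to the hash and
    written to the destination, so no second scan of the file is needed.
    """
    digest = hashlib.sha1()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
        dst.write(chunk)
    return digest.hexdigest()
```

The returned digest could then be recorded alongside the file's size and timestamp at commit time, so that the next 'bzr status' would not have to read the file again.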

Related to all this is the performance of 'bzr revert'.  When doing a 'bzr add', I realized I screwed it up and added far too many files.  I ran the 'bzr revert' command and waited and waited and waited.  It seems 'bzr revert' reads the contents of every file in the add list before reverting it.  Bear in mind that these files have not been committed yet.

Question #2: Why does 'bzr revert', for files not yet under its control, not just quickly remove the filename from any .bzr internal tracking lists?
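
The behavior question #2 has in mind can be sketched in a few lines of Python.  The tracked dictionary and committed set are hypothetical stand-ins for Bazaar's internal tracking lists, and unadd is a made-up name; the point is only that no file contents need to be read.

```python
def unadd(tracked, committed, paths):
    """Stop tracking paths that were added but never committed.

    tracked:   dict mapping path -> metadata for every tracked file
    committed: set of paths that exist in the last committed revision
    Returns the list of paths that were dropped.  Nothing on disk is
    opened, read or modified.
    """
    removed = []
    for path in paths:
        if path in tracked and path not in committed:
            del tracked[path]
            removed.append(path)
    return removed
```

Paths that are already committed are left alone here, since reverting those does require restoring their contents.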

Thanks for the help!

Josh





