calling for dirstate dogfooders, and change in branch policy now all tests pass

John Arbash Meinel john at arbash-meinel.com
Fri Mar 2 15:16:40 GMT 2007


Robert Collins wrote:
> On Thu, 2007-03-01 at 09:10 -0600, John Arbash Meinel wrote:
> 
> 
>>>  - Figure out the regression on xml based trees (???) (I am guessing its
>>> a hashcache usage issue - should be simple)
>> I've done profiling, and I'm pretty convinced this is not genuine. When
>> I did a 'bzr --lsprof status' I had a difference of about 40ms out of 1s
>> but bzr.dev was actually the slower one.
>>
>> Running 'timeit bzr status' on the launchpad tree I have 1.93s with
>> bzr.dev, and 1.97s with dirstate. 40ms is well within the noise margin.
>> Or at least not worth spending a lot of time yet.  (running it again I
>> got 1.87s for dirstate)
> 
> With mozilla I have these three times:
> 616 seconds: XML, bzr.dirstate 
> 11 seconds: XML, /usr/bin/bzr
> 7 seconds: dirstate, bzr.dirstate
> 
> *something* is wrong.

I figured it out, as well as figuring out why our timings didn't agree.
The difference is in doing "bzr status PATH" versus plain "bzr status".

The first form grabs file ids and then passes that to
iter_entries_by_dir. (It also first uses file_ids_across_trees, which
means it grabs a lot of file ids if you are using the root path).

With my 'safe_file_id' changes, I accidentally converted the set() into
a list(). And obviously doing "if x in foo:" for a 55,000 entry list is
much slower than a 55,000 entry set.

The fix is pretty trivial:
=== modified file 'bzrlib/inventory.py'
--- bzrlib/inventory.py 2007-02-18 00:22:24 +0000
+++ bzrlib/inventory.py 2007-03-02 15:13:24 +0000
@@ -960,8 +960,8 @@
         :return: This yields (path, entry) pairs
         """
         if specific_file_ids:
-            specific_file_ids = [osutils.safe_file_id(fid)
-                                 for fid in specific_file_ids]
+            specific_file_ids = set(osutils.safe_file_id(fid)
+                                 for fid in specific_file_ids)
         # TODO? Perhaps this should return the from_dir so that the root is
         # yielded? or maybe an option?
         if from_dir is None:


I'll post a more complete fix in a little bit. Because we have other
performance issues with all of the revision id and file id conversions
that are going on. I'm going to turn the 'safe_*' functions into
emitting warnings, but assuming that the utf8 forms are correct (right
now each call was decoding and then re-encoding to make sure they were
valid utf8).

And there are a couple other small things I want to do.

John
=:->




More information about the bazaar mailing list