[MERGE] Refactor commit to prepare for population by tree walking

Aaron Bentley aaron.bentley at utoronto.ca
Fri Jul 6 05:17:16 BST 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ian Clatworthy wrote:
> Aaron Bentley wrote:
>>> The
>>> CommitBuilder code needs the SHA value in order to test for some special
>>> cases - 'changed' isn't enough.
>> That doesn't sound right to me.  I can see how it would want the SHA in
>> order to record it, but not in order to test for special cases.  Could
>> you explain further?
> 
> Sure. Here's the snippet of code from modified_file_text() in
> repository.py (line 2026 or so):
> 
>          # special case to avoid diffing on renames or
>          # reparenting
>          if (len(file_parents) == 1
>             and text_sha1 == file_parents.values()[0].text_sha1
>             and text_size == file_parents.values()[0].text_size):

^^^ it seems to me that this condition can be written as:

   if (len(file_parents) == 1 and not changed)

You may potentially miss cases where the kind changes but the sha1 does
not.  I think that's a corner case of a corner case.

>             previous_ie = file_parents.values()[0]
>             versionedfile = self.repository.weave_store.get_weave(file_id,
>                 self.repository.get_transaction())
>             versionedfile.clone_text(self._new_revision_id,
>                 previous_ie.revision, file_parents.keys())
>             return text_sha1, text_size

^^^ Of course, you do need the text_sha1 to return the text_sha1 :-)

> 
> The SHA is currently set earlier inside _read_tree_state() in
> inventory.py and used for _unchanged() inside the InventoryFile class. I
> can refactor the inventory code so that _unchanged() doesn't need to be
> called because iter_changes tells me that. I don't think I can skip the
> logic in modified_file_text() though?

Looks like you can, to me, except for having to return the SHA1.

> Right now, the profiler is telling me that the lookup of SHA and
> executable flag inside InventoryFile._read_tree_state() is a big part of
> performance. I can avoid those lookups if I have iter_commitable passing
> them to me if the working tree (dirstate) knows them.

Well, looking up a known value based on a stat seems like it should be
stupid-fast, especially when you have the WT locked.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGjcJM0F+nu1YWqI0RApviAJ9CQwCTBH40G4nq80JUyS5Id5AIIQCggfD5
l/8Mr12aaJik7lomtVP1Dy8=
=B1Lf
-----END PGP SIGNATURE-----



More information about the bazaar mailing list