[MERGE] pre_commit hook (was: [RFC] pre_commit hook)

Wed Aug 15 15:52:13 BST 2007

On 8/15/07, John Arbash Meinel <john at arbash-meinel.com> wrote:
> John Arbash Meinel has voted approve.
> Status is now: Semi-approved
> Comment:
> To start with, I think this is reasonable as is, I just have a few
> comments to discuss.
>
> +        if change != 'unchanged' and path != '':
> +            self.affected_ids.setdefault(change, []).append(ie.file_id)
>
> ^- I wonder if it makes more sense to fill the dictionary with the
> different types pointing to empty lists at the start. That way we don't
> have to worry about setdefault().

I'm worry that as changes aren't fixed yet, and there is no single
place to find all inventory entry change types, a hard coded list of
change types here is not welcome. When change types are fixed,
initializing the dictionary before using it is definitely the way to
go.

> I'm just thinking that an initial add of 10,000 files is going to have a
> bit of overhead here. Though I suppose it should be profiled before we
> worry about it.
> But since there are only a few possible cases, it seems easier to do it
> that way.
>
> +    def test_pre_commit_ids(self):
> +        self.build_tree(['rootfile', 'dir/', 'dir/subfile'])
> +        tree = self.make_branch_and_tree('.')
> +        tree.lock_write()
> +        tree.set_root_id('root_id')
> +        tree.add('rootfile', 'rootfile_id')
> +        tree.put_file_bytes_non_atomic('rootfile_id', 'abc')
> +        tree.add('dir', 'dir_id')
> +        tree.add('dir/subfile', 'dir_subfile_id')
> +        tree.put_file_bytes_non_atomic('dir_subfile_id', 'def')
> +        Branch.hooks.install_hook("pre_commit",
> self.capture_pre_commit_hook)
> +        rev1 = tree.commit('first revision')
> +        tree.unversion(['dir_id'])
> +        rev2 = tree.commit('second revision')
> +        tree.put_file_bytes_non_atomic('rootfile_id', 'ghi')
> +        rev3 = tree.commit('third revision')
> +        tree.unlock()
> +        tree.lock_write()
> +        tree.rename_one('rootfile', 'renamed')
> +        rev4 = tree.commit('fourth revision')
> +        tree.unlock()
> +        tree.lock_write()
> +        tree.put_file_bytes_non_atomic('rootfile_id', 'jkl')
> +        tree.rename_one('renamed', 'rootfile')
> +        rev5 = tree.commit('fifth revision')
> +        tree.unlock()
> +        self.assertEqual([
> +            ('pre_commit', 0, NULL_REVISION, 1, rev1,
> +             {'added': ['dir_id', 'dir_subfile_id', 'rootfile_id']} ),
> +            ('pre_commit', 1, rev1, 2, rev2,
> +             {'deleted': ['dir_id', 'dir_subfile_id']} ),
> +            ('pre_commit', 2, rev2, 3, rev3,
> +             {'modified': ['rootfile_id']} ),
> +            ('pre_commit', 3, rev3, 4, rev4,
> +             {'renamed': ['rootfile_id']} ),
> +            ('pre_commit', 4, rev4, 5, rev5,
> +             {'modified and renamed': ['rootfile_id']} )
> +            ],
> +            self.hook_calls)
>
> ^- I would have preferred these as separate tests, because as it stands
> the test is a little bit hard to read.

But that would result in more duplicated code. As it is right now, all
change types share a same imaginary working session.

> Also, you really should have try/finally for:
> tree.lock_write():
> try:
>    do stuff
> finally:
>    tree.unlock()
>
> Otherwise, if something fails in the middle, you will get a mixed
> traceback, when it also warns that the tree wasn't unlocked. Oh, and if
> you are going to do them one after the other, you should be able to just
> lock the tree across the whole set. No need to keep unlocking it. (If
> there is, let us know, because that would be a sign of a problem).

In fact, I was surprised to find unlocking is required. If you remove
the unlock/lock lines in the middle, there will be errors regarding
dirty flag or some thing.

>
> +            ('pre_commit', 4, rev4, 5, rev5,
> +             {'modified and renamed': ['rootfile_id']} )
> +            ],
>
> ^- This one specifically makes me wonder if we want to be using the
> simple "change" value. 'modified', 'renamed', 'added', 'deleted' seem
> reasonable, 'modified and renamed' seems a little weird to use as a key.

IIRC, there is a comment in InventoryEntry that says we are going to
have an enumeration of change types. Until then, this does sound
weird.

> Not enough to say we shouldn't do it. But just a little.
>
> Anyway, these are just thoughts, not anything that I would merge
> without. Someone else can decide if they want to be more strict about
> it. Good job overall.

I'm just scratching my itch. Thank you for guiding me along.

Cheers
Nam