[PATCH] Changeset plugin passes all tests

Sun Jul 17 16:44:19 BST 2005

It would be nice, but I think it's impossible.  The Inventory has the
same kind of mean, petty invariants as as POSIX filesystems.  Things
like 'you can't have two files with the same pathname' and 'you can't
stick a file in a parent directory that doesn't exist.'  When two files
swap names, for example, you have to do at least three renames (foo ->
temp, bar -> foo, tmp -> bar).

The merge code handles this by renaming every file (that need renaming
at all) twice: once to a temp name, once to the new name.  But the merge
code also just generates the desired final inventory, and then sticks it
into the tree, instead of going through all that complication a second time.

> To me, it would be nice
> if every call to ChangesetTree.inventory did not have to completely
> regenerate the tree each time.

Agreed.  On the other hand, maybe the inventory isn't generated often
enough to matter.

>>The current inventory attribute is essentially a cache of ChangesetTree
>>data.  However, this cache is not invalidated when the tree-construction
>>methods are called (e.g. ChangesetTree.note_rename()).
> 
> It is *sort-of* a cache. Because it is a property, every time it is
> accessed, the function is run, and it re-creates the inventory. So it
> really is invalidated, as long as you don't copy the inventory, and then
> hang on to it.

Oh.  Err.  I guess I saw what I was expecting to see.  It looked like
you were going for the typical lazy-instantiation idiom:

class Foo:
    def__init__(self):
        self._bar = None
    def _get_bar(self):
        if self._bar is None:
            self._bar = self.generate_bar()
        return self._bar
    bar = property(_get_bar)

>>While it would be slower to use ChangesetTree as its own inventory, I
>>don't know how much slower it would be.  The difference might well be
>>negligible.
> 
> 
> I'm sure it out-performs completely rebuilding the inventory every time
> it is requested. :)

Yeah, probably it does.  On the one hand, it would be a premature
optimization.  On the other, we have all these test cases now, so we'd
know if we did it wrong.

> By the way, I think it would be good for a branch to cache it's root id.
> 
> If you want, we can cache it, and then make sure it is either updated or
> invalidated on any "_write_inventory" access. It doesn't prevent having
> 2 branches open on the same path, and having one update the working
> tree, and the second isn't aware of it.
> I can't see of any way to handle that other than either singletons or
> constantly re-reading all branch attributes

We could probably implement Branch.read_working_inventory as

statdata = os.stat(inventory_file)
if statdata.st_mtime != self._cached_inv_mtime:
    # only remember mtimes in the past, so we handle multiple changes
    # per second.
    if statdata.st_mtime != time():
        self._cached_inv_mtime = statdata.st_mtime
    else:
        self._cached_inv_mtime = None
    self._cached_inventory = self._read_working_inventory()
return self._cached_inventory

This would handle more cases than just the root id.  It would allow for
two processes modifying the same branch.  It would re-read when it was
uncertain whether the inventory had changed, as well as when it
definitely had changed.  It does cost one os.stat system call, but
nothing's perfect.

> and singletons were frowned upon in the past.

Singletons definitely have value.  Sorry if I gave the impression I they
were bad in principle.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFC2nzT0F+nu1YWqI0RAhBmAJ9uJWEYf5T74FJQANaaKF5i74kpCwCfUWq6
qnKbF56NTgrDP+TfWJ3JmDc=
=zgft
-----END PGP SIGNATURE-----