[MERGE/RFC] MemoryTree and a TreeBuilder helper

John Arbash Meinel john at arbash-meinel.com
Thu Sep 7 21:32:26 BST 2006


Robert Collins wrote:
> This is the start of another pass at test-suite performance, its been a
> year since my last such pass.
> 
> This time round my goal is to get closer to, or even reach, having all
> tests that are not testing workingtree specifically stop hitting local
> disk.
> 
> I've introduced MemoryTree which is a MutableTree that caches no data
> and is backed by memory. Common code with WorkingTree has been pulled up
> into the new MutableTree common parent.
> 
> TreeBuilder is where I plan to move a number of test tweaking facilities
> - different styles of building trees etc. I considered doing it as a
> decorator, but decided that for the moment having it just a separate
> object was cleaner. Its probable that benchmarks.tree_create.TreeCreator
> should be a subclass of TreeBuilder, or something like that.
> 
> As to what performance changes I expect - I've only converted one test
> so far, as proof of concept - and that improved by 25%.
> 
> 
> I'm interested in working on this incrementally, I'm going to convert a
> couple of tests a day, extending the support api as needed.
> 
> For now - I'd love to get a +1 on the current code, or feedback
> sufficient to get that.

I think the concept is very good. It would be nice to have the tests run
faster.

> 
> Cheers,
> Rob
> 

...

v- Because we aren't testing it as an API, shouldn't it be private
_MutableTree, which is just an implementation detail?

> +class MutableTree(tree.Tree):
> +    """A MutableTree is a specialisation of Tree which is able to be mutated.
> +
> +    Generally speaking these mutations are only possible within a lock_write
> +    context, and will revert if the lock is broken abnormally - but this cannot
> +    be guaranteed - depending on the exact implementation of the mutable state.
> +
> +    The most common form of Mutable Tree is WorkingTree, see bzrlib.workingtree.
> +    For tests we also have MemoryTree which is a MutableTree whose contents are
> +    entirely in memory.
> +
> +    For now, we are not treating MutableTree as an interface to provide
> +    conformance tests for - rather we are MemoryTree specifically, and 
> +    interface testing implementations of WorkingTree.

^- probably you need "rather we test MemoryTree specifically" ...
...

v- I know in some ways it would be nice to not care about the file kind
until you commit. However, don't you need something so that you can
handle directories versus files, wrt one can contain more files, and the
other can't?
Also, it would change our in-memory Inventory layout, because we would
need a generic InventoryEntry with a parameterized 'kind', rather than
the inventory hierarchy that we have. I believe we want
InventoryDirectory, because we have different ways of doing a snapshot()
etc based on the file kind.

> +        # fill out file kinds for all files [not needed when we stop 
> +        # caring about the instantaneous file kind within a uncommmitted tree
> +        #
> +        self._gather_kinds(files, kinds)
> +        self._add(files, ids, kinds)
> +
> +    def _add(self, files, ids, kinds):
> +        """Helper function for add - updates the inventory."""
> +        raise NotImplementedError(self._add)

...

>      def make_bzrdir(self, relpath, format=None):
>          try:
> -            url = self.get_url(relpath)
> -            mutter('relpath %r => url %r', relpath, url)
> -            segments = url.split('/')
> +            # might be a relative or absolute path
> +            maybe_a_url = self.get_url(relpath)
> +            segments = maybe_a_url.split('/')
> +            t = get_transport(maybe_a_url)

If you are only comparing against the last one, it seems like doing
"base, last_segment = maybe_a_url.rsplit('/', 1)"
Is more what you want.

The rest, I like.

>              if segments and segments[-1] not in ('', '.'):
> -                parent = '/'.join(segments[:-1])
> -                t = get_transport(parent)
>                  try:
> -                    t.mkdir(segments[-1])
> +                    t.mkdir('.')
>                  except errors.FileExists:
>                      pass
>              if format is None:

...

> === modified file 'bzrlib/tree.py'
> --- bzrlib/tree.py	2006-08-30 13:54:51 +0000
> +++ bzrlib/tree.py	2006-09-06 04:37:53 +0000
> @@ -119,6 +119,18 @@
>      def id2path(self, file_id):
>          return self.inventory.id2path(file_id)
>  
> +    def is_control_filename(self, filename):
> +        """True if filename is the name of a control file in this tree.
> +        
> +        :param filename: A filename within the tree. This is a relative path
> +        from the root of this tree.
> +
> +        This is true IF and ONLY IF the filename is part of the meta data
> +        that bzr controls in this tree. I.E. a random .bzr directory placed
> +        on disk will not be a control file for this tree.
> +        """
> +        return self.bzrdir.is_control_filename(filename)
> +

^- Why did you need this here? RevisionTree doesn't have control files.
Maybe you would want it for MutableTree, it just doesn't seem quite
relevant to all Tree objects.

Otherwise is seems good enough to me.

John
=:->



-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 254 bytes
Desc: OpenPGP digital signature
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20060907/4113dfe3/attachment.pgp 


More information about the bazaar mailing list