[MERGE] first round of use case benchmarks

Mon Aug 21 21:50:05 BST 2006

Carl Friedrich Bolz wrote:
> Hi all!
> 
> John Arbash Meinel wrote:
>> Carl Friedrich Bolz wrote:
>>> We updated the merge request (see attachment) for use case benchmarks to
>>> also include lower level bundle benchmarks (so far reading and writing
>>> only). We also implemented a caching mechanism, which caches trees
>>> across test runs in ~/.bazaar/devtemp. This is meant for developers only
>>> and is disabled by default.
>>>
>>> Cheers,
>>>
>>> Carl Friedrich
>>>
>> I'm not sure why you implemented a caching mechanism, after I mentioned
>> that I was implementing one... For sanity, though, I might just merge
>> yours instead of mine.
>>
>> There are still a bunch of PEP8 violations (2 spaces between top-level
>> functions/classes is the big one).
> 
> I refactored the benchmarks to use the new caching mechanisms and to use
> the tree creation functions that were merged into bzr.dev today (the
> slow-socket ones).
> 
> Cheers,
> 
> Carl Friedrich
> 
> 

...

v- 'c' comes before 'o'
> +
> +import os
> +import shutil
> +from cStringIO import StringIO
> +

v- 'from bzrlib import' comes before 'from bzrlib.add'
> +from bzrlib.add import smart_add
> +from bzrlib import bzrdir
> +from bzrlib.benchmarks import Benchmark
> +from bzrlib.branch import Branch
> +from bzrlib.bundle import read_bundle
> +from bzrlib.bundle.serializer import write_bundle
> +from bzrlib.revisionspec import RevisionSpec
> +from bzrlib.workingtree import WorkingTree
> +
> +

v- Technically, you should have a 1-line sentence as the first part of
the docstring. But I know how hard it is to summarize in only 80 chars.
But more importantly, the docstring is a comment, not a discussion of
this class. It is especially bogus, because you actually implement the
comment later on.

> +class BundleBenchmark(Benchmark):
> +    """The bundle tests should (also) be done at a lower level with
> +    direct call to the bzrlib.
> +    """
> +   

...

v- You shouldn't create anything above '.'. Also, reading in the string
just to read it in and write it to a file is a little bit ugly. I think
this would be better:

from bzrlib.bundle.serializer import write_bundle

tree = self.make_kernel_like_tree_committed('tree')

f = open('bundle', 'wb')
try:
    write_bundle(tree.branch.repository, tree.last_revision(),
		 NULL_REVISION, f)
finally:
    f.close()

tree2 = self.make_branch_and_tree('branch_a')
os.chdir('branch_a')
self.time(self.run_bzr, 'merge', '../bundle')

> +    def test_apply_bundle_known_kernel_like_tree(self):
> +        """Create a bundle for a kernel sized tree with no ignored, unknowns,
> +        or added and one commit.
> +        """ 
> +        self.make_kernel_like_tree_committed()
> +        f = file('../bundle', 'wb')
> +        try:
> +            f.write(self.run_bzr('bundle', '--revision', '..-1')[0])
> +        finally:
> +            f.close()
> +        self.run_bzr("init", "../branch_a")
> +        os.chdir('../branch_a')
> +        self.time(self.run_bzr, 'merge', '../bundle')
> +
> + 

v- It is actually quite fast to 'read' a bundle. The expensive part is
adding it to the local repository. So I would probably change this to:

from bzrlib.bundle.apply_bundle import install_bundle

def _time_read_write(self, branch):
  bundle_text = StringIO()
  self.time(write_bundle, branch.repository, branch.last_revision(),
	    NULL_REVISION, bundle_text)
  bundle_text.seek(0)
  target_tree = self.make_branch_and_tree('b')
  bundle = self.time(read_bundle, bundle_text)
  self.time(install_bundle, target_tree.branch.repository, bundle)

> +class BundleLibraryLevelBenchmark(Benchmark):
> +
> +    def _time_read_write(self):
> +        branch, relpath = Branch.open_containing("a")
> +        revision_history = branch.revision_history()
> +        bundle_text = StringIO()
> +        self.time(write_bundle, branch.repository, revision_history[-1],
> +                  None, bundle_text)
> +        bundle_text.seek(0)
> +        self.time(read_bundle, bundle_text)

...
The permutations seem reasonable, though.
I wonder if it would be better to separate the timing. 1 time evaluated
for generating the bundle, and another time for reading and installing
the bundle. Though that ends up doubling the number of tests you write.

...

v- Seems okay. Ultimately we probably would want to test one of your
'make_many_commit_tree()' trees as well.

> +class InfoBenchmark(Benchmark):
> +    """This is a stub. Use this benchmark with a network transport.
> +    Currently "bzr info sftp://..." takes > 4 min"""
> +
> +    def test_no_ignored_unknown_kernel_like_tree(self):
> +        """Info in a kernel sized tree with no ignored or unknowns. """
> +        self.make_kernel_like_added_tree()
> +        self.time(self.run_bzr, 'info')
> +
> +    def test_no_changes_known_kernel_like_tree(self):
> +        """Info in a kernel sized tree with no ignored, unknowns, or added.""" 
> +        self.make_kernel_like_committed_tree()
> +        self.time(self.run_bzr, 'info')

...

v-- You ned to set the tree name, so that it includes the
parameterization. Otherwise you will end up using the same directory for
each variation of the many commit tree.

So you need:

tree_name='many_files_many_commit_tree_%d_%d' % (num_files, num_commits)

> +class ManyCommitTreeCreator(TreeCreator):
> +    """Create an tree many files and many commits."""
> +
> +    def __init__(self, test, link_bzr=False, num_files=10, num_commits=10):
> +        super(ManyCommitTreeCreator, self).__init__(test,
> +            tree_name='many_files_many_commit_tree',
> +            link_bzr=link_bzr,
> +            link_working=False,
> +            hot_cache=True)

v- Why do you create this here, and then create it again later on. And
later on you create it 2 times one with a prefix, and one without.

> +        self.files = ["%s" % (i, ) for i in range(num_files)]
> +        self.num_files = num_files
> +        self.num_commits = num_commits
> +
> +    def _create_tree(self, root, in_cache=False):
> +        num_files = self.num_files
> +        num_commits = self.num_commits

If you already have 'self.files' I think it would work better to do:

files = [root + '/' + fn for fn in self.files]

Either that or:
files = ['%s/%s' % (root, fn) for fn in self.files]

> +        files = ["%s/%s" % (root, i) for i in range(num_files)]
> +        for fn in files:
> +            f = open(fn, "wb")
> +            try:
> +                f.write("some content\n")
> +            finally:
> +                f.close()
> +        tree = bzrdir.BzrDir.create_standalone_workingtree(root)
> +        tree.add([str(i) for i in range(num_files)])
> +        tree.lock_write()
> +        try:
> +            tree.commit('initial commit')
> +            for i in range(num_commits):
> +                fn = files[i % len(files)]
> +                content = range(i) + [i, i, i, ""]
> +                f = open(fn, "wb")
> +                try:
> +                    f.write("\n".join([str(i) for i in content]))
> +                finally:
> +                    f.close()
> +                tree.commit("changing file %s" % fn)
> +        finally:
> +            tree.unlock()
> +        self.files = ["%s" % (i, ) for i in range(num_files)]
> +        return tree
> +
> 

...

The rest seems pretty good.

John
=:->

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 254 bytes
Desc: OpenPGP digital signature
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20060821/b9af211e/attachment.pgp