[Merge] Slow socket

Wed Aug 16 00:37:53 BST 2006

Overall it looks rather good. There are a few more cleanups that could
happen. But this could be merged without another review. So if anyone
gets to it, this has my +1 to do a few small cleanups and merge.
Otherwise, if someone gives a +1, I can do the cleanup tomorrow and merge.

Carl Friedrich Bolz wrote:
> I updated the slow socket patch to fix the problems that John and Robert
> pointed out, see attachement.
> 
> John Arbash Meinel wrote:
>> Out of curiosity, what bzr did you use to generate the diff? We haven't
>> used '/dev/null' for a newly added file in a while.
> 
> I used the one in dapper, whichever that is. From now on I am properly
> dogfooding on bzr.dev, though :-).

That should be 0.8.2, I thought it had already changed there. (If you
were on breezy with 0.7, I would understand more). Anyway,....

> 
> [snip]
>>> +        try:
>>> +            tree.commit('initial commit')
>>> +            for i in range(num_commits):
>>> +                fn = files[i % len(files)]
>>> +                f = open(fn, "w")
>>> +                f.write("\n".join([str(j) for j in (range(i) + [i,
> i, i])]))
>>> +                f.close()
>>> +                tree.commit("changing file %s" % fn)
>> ^- We almost always use "wb" to ensure the contents of the file are
>> consistent across platforms.
>> I also don't understand what you are accomplishing with (range(i) +
>> [i,i,i]). I suppose you just want to always create a file with 'i+3'
> lines.
> 
> The intention was to have at least _some_ lines which get removed
> when the file is changed again, and not just more and more added lines.

Okay. I think it is a little odd, but probably fine for manually
generated changes.

...

> Moved this. I was not so sure whether it is of general use, I guess in
> the end more and more benchmarks will use the same convenience functions
> to create trees with certain properties.
> 
> In addition I think the tree creation function should be updated to use
> John's new caching mechanism. Should I update the patch again or do it
> with an extra merge? These particular trees are not very big, so maybe
> it is ok not cache.

100x100 is probably pretty small how long does it take for you to generate?

I think having 'create_with_commits' be cached would be reasonable. But
obviously commit_some_revisions should not.

I can see us wanting to scale up to 1k x 1k, or even 10k x 10k, which we
really would want to cache.

I don't think it is a requirement to get this merged, though.
(Especially since you aren't caching them in some other way, so there
isn't a conflict.)

> 
>> Did we decide that simulated time was better than using real time? (For
>> one thing, async improvements would not show up properly here).
> 
> Right. Although the benchmarks are now even slower, of course.
> 
> [snip]

I'm curious about the accuracy of this. Did you ever try using 'tc' and
comparing the results? Were they somewhat related?

...

>> So is this setting the bandwidth to 8 millibytes/sec?
>>
> 
> I hope it sets the bandwidth to 1 byte/sec, but I might have
> mis-calculated :-). The bandwidth parameter accepts Mbit, which is
> commonly used for internet access and such.

Ok. I didn't know the units. bytes or bits per second seem the most
straightforward since you don't have to do any mental conversions. But
seeing 1.0 and having it mean 1Mbit is nice too.

...

> A __getattr__() is indeed only called when the normal way to lookup a an
> attribute fails. I decided to use a __getattr__ but to route only
> certain attributes through to the underlying socket (see patch). I think
> it is a compromise between having repeated code and the risk of routing
> too much.

Sounds like a good compromise to me.

...

v- This doc string isn't really helpful or accurate. It should be
something more like "Benchmark branch, push and pull across a local sftp
connection"

> +class SFTPBenchmark(Benchmark):
> +    """A benchmark base class that provides a sftp server on localhost."""
> +

v- This is a small thing, but it is probably better to use 'tree' or
'wt' (for working tree) rather than just 't'. Because 't' is also too
often used as Transport or Tree. This is not a blocker, just a
cleanliness and future recommendation thing.

> +    def test_branch(self):
> +        os.mkdir("a")
> +        t, files = self.create_with_commits(100, 100, "a")
> +        self.time(bzrdir.BzrDir.open(self.get_url('a')).sprout, "b")
> +

I think you want to do:
b2 = t.bzrdir.sprout('b')

rather than
rbzrdir.sprout('b')

They should be the same thing in the end, but rbzrdir should have to
sprout over the SFTP connection, while t.bzrdir just uses 2 local
branches. Especially if your SFTP connection has high latency at this
point, there can be a huge difference between the two. And it is test
setup time, not actual benchmark time.

> +    def test_pull_1(self):
> +        os.mkdir("a")
> +        t, files = self.create_with_commits(100, 100, "a")
> +        rbzrdir = bzrdir.BzrDir.open(self.get_url('a'))
> +        b2 = rbzrdir.sprout("b") # branch
> +        # change a few files and commit
> +        self.commit_some_revisions(t, files, 1, 20)
> +        self.time(b2.open_branch().pull, rbzrdir.open_branch())

...

> === modified file 'bzrlib/benchmarks/__init__.py'
> --- bzrlib/benchmarks/__init__.py	2006-08-15 16:25:59 +0000
> +++ bzrlib/benchmarks/__init__.py	2006-08-15 22:26:38 +0000
> @@ -16,7 +16,9 @@
>  
>  """Benchmark test suite for bzr."""
>  
> +import bzrlib.branch
>  from bzrlib import (
> +    bzrdir,
>      plugin,
>      )
>  from bzrlib.tests.TestUtil import TestLoader

I know what Robert said, though I disagree with him.
To me 'from bzrlib import' is (bzrlib, '') not (bzrlib, import). While
and the other is (bzrlib, branch).
Now, your original form was still wrong, because the last line is
(bzrlib, tests, TestUtil). So *I* think it needs to go inbetween:

 from bzrlib import (
+    bzrdir,
     plugin,
     )
+import bzrlib.branch
 from bzrlib.tests.TestUtil import TestLoader

This is something we need to work out among ourselves, so we shouldn't
block on this.

> @@ -113,6 +115,54 @@
>          creator = HeavilyMergedTreeCreator(self, link_bzr=hardlink)
>          return creator.create(root=directory_name)
>  

I think 'directory_name' is a little bit long... I use 'root' or 'dest'
in other locations. Some might not like my variable names either,
though. The only strict requirement is that they be meaningful.

> +    def create_with_commits(self, num_files, num_commits, directory_name='.'):
> +        """Create a tree with many files and many commits.
> +        
> +        :param num_files: number of files to be created
> +        :param num_commits: number of commits in the newly created tree
> +        """
> +        files = ["%s/%s" % (directory_name, i) for i in range(num_files)]
> +        for fn in files:
> +            f = open(fn, "wb")
> +            f.write("some content\n")
> +            f.close()

^- we do have 'self.built_contents()' which you can use instead of
writing your own creation function. Though if you use it, you probably
want to switch to a specific directory first, because it adds the full
path as part of the file content.

If you do keep this, you should change it to:

f = open(fn, 'wb')
try:
  f.write('some content\n')
finally:
  f.close()

If for some strange reason f.write() would raise an exception, it is
possible that f would not be closed, and on Win32 an open file prevents
us from cleaning up the test directory.

> +        tree = bzrdir.BzrDir.create_standalone_workingtree(directory_name)

^- we also have self.make_branch_and_tree(directory_name) as a helper
function for the Test suite.

Most of the helpers are part of TestCaseWithTransport, in
bzrlib/tests/__init__.py, so you might look there for other useful
functions.

> +        for i in range(num_files):
> +            tree.add(str(i))

tree.add() can take a list of files, which means it doesn't have to
regenerate a new inventory for every file, it can do it all at once. So
you really want:

tree.add([str(i) for i in xrange(num_files)])

Is there a reason you put all the files in 1 directory? I think the
common numbers I've heard is ~10 files per directory (at least for
kernel trees). I would say something between 10 and 100 myself. So if
your max size is 100, obviously no issue. But if we scale to 1k or 10k
files, we might want to do something else.

> +        tree.lock_write()
> +        try:
> +            tree.commit('initial commit')
> +            for i in range(num_commits):
> +                fn = files[i % len(files)]
> +                f = open(fn, "wb")
> +                content = range(i) + [i, i, i, ""]
> +                f.write("\n".join([str(i) for i in content]))
> +                f.close()
> +                tree.commit("changing file %s" % fn)
> +        finally:
> +            tree.unlock()
> +        return tree, files

^- Do I understand correctly that you modify exactly 1 file per commit?
Probably fine, but should probably be part of the doc string for this
function. You also need another try/finally here.

> +
> +    def commit_some_revisions(self, tree, files, num_commits,
> +                              changes_per_commit):
> +        """Commit a specified number of revisions to some files in a tree,
> +        makeing a specified number of changes per commit.
> +
> +        :param tree: The tree in which the changes happen.
> +        :param files: The list of files where changes should occur.
> +        :param num_commits: The number of commits
> +        :param changes_per_commit: The number of files that are touched in 
> +        each commit.
> +        """
> +        for j in range(num_commits):
> +            for i in range(changes_per_commit):
> +                fn = files[(i + j) % changes_per_commit]
> +                f = open(fn, "w")
> +                content = range(i) + [i, i, i, '']
> +                f.write("\n".join([str(k) for k in content]))
> +                f.close()
> +            tree.commit("new revision")
> +
>  
>  def test_suite():
>      """Build and return a TestSuite which contains benchmark tests only."""
> @@ -129,6 +179,7 @@
>                     'bzrlib.benchmarks.bench_status',
>                     'bzrlib.benchmarks.bench_transform',
>                     'bzrlib.benchmarks.bench_workingtree',
> +                   'bzrlib.benchmarks.bench_sftp',
>                     ]
>      suite = TestLoader().loadTestsFromModuleNames(testmod_names) 
>  
> 
> === modified file 'bzrlib/tests/test_sftp_transport.py'
> --- bzrlib/tests/test_sftp_transport.py	2006-08-11 15:00:12 +0000
> +++ bzrlib/tests/test_sftp_transport.py	2006-08-15 22:04:49 +0000
> @@ -17,6 +17,7 @@
>  import os
>  import socket
>  import threading
> +import time
>  
>  import bzrlib.bzrdir as bzrdir
>  import bzrlib.errors as errors
> @@ -31,22 +32,26 @@
>  except ImportError:
>      paramiko_loaded = False

-- still need an extra blank line here
>  
> +def set_transport(testcase):
> +    """A helper to set transports on test case instances."""
> +    from bzrlib.transport.sftp import SFTPAbsoluteServer, SFTPHomeDirServer
> +    if getattr(testcase, '_get_remote_is_absolute', None) is None:
> +        testcase._get_remote_is_absolute = True
> +    if testcase._get_remote_is_absolute:
> +        testcase.transport_server = SFTPAbsoluteServer
> +    else:
> +        testcase.transport_server = SFTPHomeDirServer
> +    testcase.transport_readonly_server = bzrlib.transport.http.HttpServer
> +

John
=:->

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 254 bytes
Desc: OpenPGP digital signature
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20060815/bd363c49/attachment.pgp