[MERGE] Make pulling bundle 6x faster

Aaron Bentley aaron.bentley at utoronto.ca
Thu Jul 26 22:34:14 BST 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin Pool wrote:
> Martin Pool has voted +0.
> Status is now: Semi-approved
> Comment:
> This deserves a NEWS entry.
> 
> Does this improvement show up in the bundle benchmarks in the benchmark
> suite?

Yes.  Here's bzr.dev:
...allBenchmark.test_few_files_big_tree_1_revision   OK     6302ms/
10176ms
...hmark.test_few_files_moderate_tree_100_revision   OK      931ms/
6635ms
...nchmark.test_few_files_moderate_tree_1_revision   OK      169ms/
585ms
...enchmark.test_few_files_small_tree_100_revision   OK      359ms/
2059ms
...lBenchmark.test_few_files_small_tree_1_revision   OK       17ms/
112ms
...llBenchmark.test_some_files_big_tree_1_revision   OK     6303ms/
10024ms
...chmark.test_some_files_moderate_tree_1_revision   OK      169ms/
590ms
...iteBenchmark.test_few_files_big_tree_1_revision   OK     7583ms/
10250ms
...hmark.test_few_files_moderate_tree_100_revision   OK     1237ms/
6467ms
...nchmark.test_few_files_moderate_tree_1_revision   OK      269ms/
586ms
...enchmark.test_few_files_small_tree_100_revision   OK      617ms/
2074ms
...eBenchmark.test_few_files_small_tree_1_revision   OK       31ms/
113ms
...teBenchmark.test_some_files_big_tree_1_revision   OK     7846ms/
10487ms
...chmark.test_some_files_moderate_tree_1_revision   OK      271ms/
588ms

Here's bzr.mpbundle:
...allBenchmark.test_few_files_big_tree_1_revision   OK     5343ms/
9188ms
...hmark.test_few_files_moderate_tree_100_revision   OK      483ms/
6078ms
...nchmark.test_few_files_moderate_tree_1_revision   OK      163ms/
584ms
...enchmark.test_few_files_small_tree_100_revision   OK      210ms/
1934ms
...lBenchmark.test_few_files_small_tree_1_revision   OK       14ms/
110ms
...llBenchmark.test_some_files_big_tree_1_revision   OK     6389ms/
10102ms
...chmark.test_some_files_moderate_tree_1_revision   OK      163ms/
578ms
...iteBenchmark.test_few_files_big_tree_1_revision   OK     7350ms/
10180ms
...hmark.test_few_files_moderate_tree_100_revision   OK      860ms/
6029ms
...nchmark.test_few_files_moderate_tree_1_revision   OK      266ms/
584ms
...enchmark.test_few_files_small_tree_100_revision   OK      466ms/
1940ms
...eBenchmark.test_few_files_small_tree_1_revision   OK       28ms/
115ms
...teBenchmark.test_some_files_big_tree_1_revision   OK     7279ms/
9969ms
...chmark.test_some_files_moderate_tree_1_revision   OK      264ms/
581ms

The improvements are most dramatic in the 100-revision cases, but even
the 1-revision cases tend to be slightly improved.

My 6x figure came from a 1000-revision case.

> 'memory_friendly' is not an ideal name because reading this call I think
> 'well why not be friendly?'

Well, I did want to suggest that it was the best default.

> Maybe it should be called 'stream_input'
> or 'decode_entire_file' or 'stream_decompress' etc.

I had the idea that it could be more general than that.  There are lots
of places we can make a time/space tradeoff.  But perhaps stream_input
makes sense.

> @@ -106,7 +106,7 @@
>                  if block is None:
>                      continue
>                  i, j, n = block
> -                while j + n < cur_line:
> +                while j + n <= cur_line:
>                      block = cur_block[p] = next_block(p)
>                      if block is None:
>                          break
> 
> I guess this is a bug fix?  I don't think you mentioned it in your merge
> request?  I see you do have the commit message
> 
>> Fix benign off-by-one error generating mpdiffs

That's the one.

> Is it really harmless?  Maybe it just makes the matched regions too small?

That's right.  It prevented us from advancing to the next potential
match, so we would do an insertion instead.

> Can you please add a docstring for left_matching_blocks at least in the
> base class?  Also it doesn't seem to be directly tested - it should be,
> since it's in a public interface.  Why are they _left_ matching blocks -
> left parent?

They are matching blocks calculated against the left parent.

> If this is just a hint then it can be a bit tricky to test: maybe you
> should pass a value and then hook in to observe that it doesn't redo the
> diff?

I guess I can do that.

>  Maybe this shouldn't be a public parameter?

I think that we need to propagate match information more broadly.
Sequence matching has poor scaling behavior.  While it can be
accelerated, it's better to avoid it.  So I think I'll bite the bullet
and test it.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGqRNW0F+nu1YWqI0RAmoOAJ9tKflC7LLC8HADtwRHKanq5uPr/ACggEIu
ZjAdM2avvjEad5vH7PLtBjY=
=T3YM
-----END PGP SIGNATURE-----



More information about the bazaar mailing list