[MERGE] Make pulling bundle 6x faster
Aaron Bentley
aaron.bentley at utoronto.ca
Thu Jul 26 22:34:14 BST 2007
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Martin Pool wrote:
> Martin Pool has voted +0.
> Status is now: Semi-approved
> Comment:
> This deserves a NEWS entry.
>
> Does this improvement show up in the bundle benchmarks in the benchmark
> suite?
Yes. Here's bzr.dev:
...allBenchmark.test_few_files_big_tree_1_revision OK 6302ms/
10176ms
...hmark.test_few_files_moderate_tree_100_revision OK 931ms/
6635ms
...nchmark.test_few_files_moderate_tree_1_revision OK 169ms/
585ms
...enchmark.test_few_files_small_tree_100_revision OK 359ms/
2059ms
...lBenchmark.test_few_files_small_tree_1_revision OK 17ms/
112ms
...llBenchmark.test_some_files_big_tree_1_revision OK 6303ms/
10024ms
...chmark.test_some_files_moderate_tree_1_revision OK 169ms/
590ms
...iteBenchmark.test_few_files_big_tree_1_revision OK 7583ms/
10250ms
...hmark.test_few_files_moderate_tree_100_revision OK 1237ms/
6467ms
...nchmark.test_few_files_moderate_tree_1_revision OK 269ms/
586ms
...enchmark.test_few_files_small_tree_100_revision OK 617ms/
2074ms
...eBenchmark.test_few_files_small_tree_1_revision OK 31ms/
113ms
...teBenchmark.test_some_files_big_tree_1_revision OK 7846ms/
10487ms
...chmark.test_some_files_moderate_tree_1_revision OK 271ms/
588ms
Here's bzr.mpbundle:
...allBenchmark.test_few_files_big_tree_1_revision OK 5343ms/
9188ms
...hmark.test_few_files_moderate_tree_100_revision OK 483ms/
6078ms
...nchmark.test_few_files_moderate_tree_1_revision OK 163ms/
584ms
...enchmark.test_few_files_small_tree_100_revision OK 210ms/
1934ms
...lBenchmark.test_few_files_small_tree_1_revision OK 14ms/
110ms
...llBenchmark.test_some_files_big_tree_1_revision OK 6389ms/
10102ms
...chmark.test_some_files_moderate_tree_1_revision OK 163ms/
578ms
...iteBenchmark.test_few_files_big_tree_1_revision OK 7350ms/
10180ms
...hmark.test_few_files_moderate_tree_100_revision OK 860ms/
6029ms
...nchmark.test_few_files_moderate_tree_1_revision OK 266ms/
584ms
...enchmark.test_few_files_small_tree_100_revision OK 466ms/
1940ms
...eBenchmark.test_few_files_small_tree_1_revision OK 28ms/
115ms
...teBenchmark.test_some_files_big_tree_1_revision OK 7279ms/
9969ms
...chmark.test_some_files_moderate_tree_1_revision OK 264ms/
581ms
The improvements are most dramatic in the 100-revision cases, but even
the 1-revision cases tend to be slightly improved.
My 6x figure came from a 1000-revision case.
> 'memory_friendly' is not an ideal name because reading this call I think
> 'well why not be friendly?'
Well, I did want to suggest that it was the best default.
> Maybe it should be called 'stream_input'
> or 'decode_entire_file' or 'stream_decompress' etc.
I had the idea that it could be more general than that. There are lots
of places we can make a time/space tradeoff. But perhaps stream_input
makes sense.
> @@ -106,7 +106,7 @@
> if block is None:
> continue
> i, j, n = block
> - while j + n < cur_line:
> + while j + n <= cur_line:
> block = cur_block[p] = next_block(p)
> if block is None:
> break
>
> I guess this is a bug fix? I don't think you mentioned it in your merge
> request? I see you do have the commit message
>
>> Fix benign off-by-one error generating mpdiffs
That's the one.
> Is it really harmless? Maybe it just makes the matched regions too small?
That's right. It prevented us from advancing to the next potential
match, so we would do an insertion instead.
> Can you please add a docstring for left_matching_blocks at least in the
> base class? Also it doesn't seem to be directly tested - it should be,
> since it's in a public interface. Why are they _left_ matching blocks -
> left parent?
They are matching blocks calculated against the left parent.
> If this is just a hint then it can be a bit tricky to test: maybe you
> should pass a value and then hook in to observe that it doesn't redo the
> diff?
I guess I can do that.
> Maybe this shouldn't be a public parameter?
I think that we need to propagate match information more broadly.
Sequence matching has poor scaling behavior. While it can be
accelerated, it's better to avoid it. So I think I'll bite the bullet
and test it.
Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFGqRNW0F+nu1YWqI0RAmoOAJ9tKflC7LLC8HADtwRHKanq5uPr/ACggEIu
ZjAdM2avvjEad5vH7PLtBjY=
=T3YM
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list