deleting InterDifferingSerializer

John Arbash Meinel john at arbash-meinel.com
Mon May 4 16:17:00 BST 2009



Martin Pool wrote:
> 2009/5/4 Robert Collins <robert.collins at canonical.com>:
>> I'd like to delete this class? Any objections?
>>
>> @John, I know you want fast conversions to dev6, so do I. I think we
>> have to simply put the work into doing that in streaming fetch, or they
>> will only be fast locally.
>>
>> It's caused bugs in the past, and continues to do so - most recently bug
>> 368921. It's not invoked when converting from a smart server; and its
>> not needed to perform conversions anymore, as we have that built into
>> the streaming fetch support.
>>
>> bug 368921 *may* have additional issues in the streaming fetch, but
>> right now we have two implementations of the same problem that don't
>> agree with each other about the outputs.
> 
> +1 to deleting it if John thinks it's reasonable.
> 

There are many things that InterDifferingSerializer gives us that
streaming fetch doesn't yet. If we can get streaming fetch to do it,
then I'm fine getting rid of it.

1) Incremental updates. IDS converts batches of 100 revs at a time,
which also triggers autopacks at every 1k revs. Streaming fetch is
currently all-or-nothing, which isn't appropriate (IMO) for conversions.
Given that a conversion can take *days*, it is important to have
something that can be stopped and resumed.
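To make the batching concrete, here is a minimal sketch of that loop. The
function names (convert_batch, autopack) and the resumability-via-committed-
batches behaviour are illustrative assumptions, not actual bzrlib APIs:

```python
# Hypothetical sketch of IDS-style batched conversion. BATCH_SIZE and
# AUTOPACK_EVERY mirror the 100-rev batches / 1k-rev autopacks described
# above; convert_batch and autopack are placeholder callables.
BATCH_SIZE = 100       # revisions converted per batch
AUTOPACK_EVERY = 1000  # repack the target after this many revisions

def convert_in_batches(revision_ids, convert_batch, autopack):
    """Convert revisions in small batches so the work can be resumed.

    convert_batch(batch) converts and commits one batch of revisions;
    autopack() repacks the target repository to keep it compact.
    """
    converted = 0
    for start in range(0, len(revision_ids), BATCH_SIZE):
        batch = revision_ids[start:start + BATCH_SIZE]
        convert_batch(batch)   # each committed batch is durable progress
        converted += len(batch)
        if converted % AUTOPACK_EVERY == 0:
            autopack()         # keep the target from bloating mid-way
    return converted
```

Because each batch is committed before the next one starts, an interrupted
conversion can pick up again from the last committed batch instead of
starting over.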

2) Also, auto-packing as you go avoids the case you ran into, where bzr
bloats to 2.4GB before packing back down to 25MB. We know the new format
is even more sensitive to packing efficiency. Not to mention that
because a single big stream generates a single large pack, it isn't
immediately obvious that we are being so inefficient.

3) "delta based", this is another rather big issue. IIRC Streaming Fetch
will send whole-inventories. IDS also spends a reasonable amount of time
computing multiple deltas to find the most optimal one. This was
important for conversions, as it generates the least amount of churn on
chk pages. Sending whole-inventories is pretty much the worst-case
scenario, as it causes us to generate *every* chk page from scratch,
serialize it, sha1 sum it, to end up determining it is a duplicate.
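A toy model shows why the whole-inventory path is so wasteful. The PageStore
below is an illustrative stand-in for a sha1-keyed chk page store, not bzrlib
code: a delta only touches the pages that actually changed, while resending a
full inventory forces every page to be re-serialized and sha1-summed just to
discover it is already present.

```python
# Illustrative model of a content-addressed (sha1-keyed) page store.
from hashlib import sha1

class PageStore:
    def __init__(self):
        self.pages = {}   # sha1 hex -> serialized page
        self.hashed = 0   # how many pages we had to sha1-sum

    def add_page(self, payload):
        self.hashed += 1                     # the unavoidable cost per page
        key = sha1(payload).hexdigest()
        self.pages.setdefault(key, payload)  # a duplicate is pure wasted work
        return key

def store_full(store, inventory_pages):
    # Worst case: serialize and hash every page of the inventory.
    return [store.add_page(p) for p in inventory_pages]

def store_delta(store, changed_pages):
    # Best case: only the pages the delta actually touched.
    return [store.add_page(p) for p in changed_pages]
```

With a 1000-page inventory, a 3-page delta costs 3 hashes on top of the
initial conversion; resending the whole inventory costs another 1000 hashes
and adds nothing to the store.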

4) Computing rich-root information from the 'stream'. IIRC the
"Streaming" code does yet another pass over the inventory data (I think
that makes three) to determine the root-id and whether it has changed or
not.



So if you have the time to implement all of this for the streaming code,
by all means get rid of IDS. But it isn't as though it is a trivial fix
to the smart fetch code to get it to provide all the benefits that we
have IDS for *today*. One bug is easily traded off against the four
benefits above, IMO.

I fully agree that maintaining multiple code paths is crummy. But the
streaming code doesn't lend itself easily to most of the things I
described above.

So -1 for just removing IDS, without addressing at least some of the above.

John
=:->
