deleting InterDifferingSerializer
John Arbash Meinel
john at arbash-meinel.com
Tue May 5 04:31:35 BST 2009
Robert Collins wrote:
> On Mon, 2009-05-04 at 10:17 -0500, John Arbash Meinel wrote:
>
>> There are many things that InterDifferingSerializer gives us that
>> streaming fetch doesn't yet. If we can get streaming fetch to do it,
>> then I'm fine getting rid of it.
>>
>> 1) Incremental updates. IDS converts batches of 100 revs at a time,
>> which also triggers autopacks at 1k revs. Streaming fetch is currently
>> an all-or-nothing, which isn't appropriate (IMO) for conversions.
>> Given that a conversion can take *days*, it is important to have
>> something that can be stopped and resumed.
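For illustration, the batch-and-autopack behaviour described above could be sketched roughly like this (`convert_in_batches`, `copy_batch`, and `autopack` are hypothetical names for this sketch, not the real bzrlib API):

```python
# Hypothetical sketch of IDS-style batched conversion; all names here are
# illustrative, not the actual bzrlib API.
def convert_in_batches(revision_ids, copy_batch, autopack,
                       batch_size=100, pack_every=1000):
    """Copy revisions in small batches so an interrupted conversion
    loses at most one batch and can be resumed."""
    done = 0
    for start in range(0, len(revision_ids), batch_size):
        batch = revision_ids[start:start + batch_size]
        copy_batch(batch)           # commit this batch to the target repo
        done += len(batch)
        if done % pack_every == 0:  # periodic autopack, as IDS does at 1k revs
            autopack()
    return done
```

Because each batch is committed before the next begins, killing the process mid-conversion leaves a repository that is behind, but consistent.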
>
> OTOH the way we convert is:
> do many groups
> do a full repack at the end
>
> If streaming issued data in the right order we could go straight into
> the final pack.
I don't think we can predict what the "right order" will be for the CHK
pages without actually creating them. You are probably correct for the
file texts case.
...
>> 3) "delta based", this is another rather big issue. IIRC Streaming Fetch
>> will send whole-inventories.
>
> Andrew and I have a patch to send inventory deltas; it's incomplete
> pending fixing the rather severe stacking-streaming issues.
"incomplete pending fixing" is a reason not to delete IDS yet. I'm happy
to re-evaluate that when you land this.
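For what it's worth, the inventory-delta idea can be sketched like this (the `(old_path, new_path, file_id, new_entry)` tuple layout and the plain-dict inventory are simplifications for illustration, not the pending patch):

```python
# Illustrative sketch, not the actual bzrlib patch: a delta lists only the
# changed entries instead of shipping the whole inventory over the stream.
def apply_inventory_delta(inventory, delta):
    """Apply (old_path, new_path, file_id, new_entry) tuples to a dict
    mapping file_id -> (path, entry)."""
    for old_path, new_path, file_id, new_entry in delta:
        if new_path is None:
            inventory.pop(file_id, None)                # entry was deleted
        else:
            inventory[file_id] = (new_path, new_entry)  # added or modified
    return inventory
```

The point is that the wire (and the local conversion) only ever touches the handful of entries that changed between two revisions.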
>
>> 4) Computing rich root information from the 'stream'. IIRC the
>> "Streaming" code does yet another pass over the inventory data (I think
>> that makes 3 passes) to determine the root-id and whether it has changed.
>
> IDS does this too, and the delta-based stream won't need to do that.
>
IDS does not. The streaming code does:
    def _generate_root_texts(self, revs):
        ...
        if self._rich_root_upgrade():
            import bzrlib.fetch
            return bzrlib.fetch.Inter1and2Helper(
                self.from_repository).generate_root_texts(revs)
And inside of Inter1and2Helper there are things like:
    while revs:
        for tree in self.source.revision_trees(revs[:100]):
            if tree.inventory.revision_id is None:
                tree.inventory.revision_id = tree.get_revision_id()
            yield tree
Inter1and2Helper extracts *every* revision tree an extra time, and thus
extracts the Inventory information for every revision, which is already
the #1 bottleneck for conversions.
...
This is what I explicitly avoided in InterDifferingSerializer, by building
up the root map while iterating the revision trees the first time.
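The idea is roughly the following (a simplified sketch, not the actual IDS code; `iter_trees_collecting_roots` is a made-up name):

```python
# Simplified sketch (not the real IDS code): record each tree's root id
# during the single pass we already make over the revision trees, so no
# second extraction pass is needed for rich-root information.
def iter_trees_collecting_roots(revision_trees, root_map):
    """Yield each tree unchanged, filling root_map as a side effect."""
    for tree in revision_trees:
        root_map[tree.get_revision_id()] = tree.get_root_id()
        yield tree
```

root_map gets populated while the conversion consumes the trees, instead of calling revision_trees() a second time the way Inter1and2Helper does.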
> I think the key one is being able to stop and restart. While you say it's
> important, I don't think it's at all obvious to users how to do this: bzr
> branch old new preserves format unless they init a repo first; and if they
> have done that, hitting ctrl-C will abort with no indication that the
> result is partial.
Making it 2x slower because of Inter1and2Helper isn't particularly
advantageous either.
>
> And our builtin 'upgrade' command is definitely not incremental.
>
> Anyhow, I'll leave it alone at least until we have the deltas and other
> stuff in place; I'll bring up the issue again later.
>
> -Rob
Sure. I honestly would like to have one code path. It's just that the
"stream" code is very inefficient for a local operation. It currently
passes over the inventory twice (three times with a rich-root conversion),
deals in full inventories rather than deltas, etc. I'm not entirely sure
that the 'optimal delta' generation in IDS is suitable for the stream
code. I suppose it would make the stream smaller over the wire as well...
John
=:->