RFC: get_record_stream and groupcompress
Robert Collins
robertc at robertcollins.net
Thu Aug 14 22:45:52 BST 2008
On Thu, 2008-08-14 at 15:34 -0500, John Arbash Meinel wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Robert Collins wrote:
> > So I've a thorny little tradeoff to encapsulate in code.
> >
> > The best approach I've thought of to date is a parameter to
> > VF.get_record_stream called 'sort_is_hint'. If you think that that
> > sounds fine and dandy, say so now and skip the detail below.
> >
>
> It is okay, but I'm not sure it is the best way. Why can't gc and packs
> just pass 'unordered' if the sort is a hint anyway.
It's not a hint for knits or weaves; it's a correctness requirement for
them. Knit-based packs don't strictly care [though they do remember what
is being inserted in case its compression basis is not satisfied]. If
it's a hint, that doesn't stop it being _useful_.
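To make the tradeoff concrete, here is a hedged sketch of the dispatch
logic a 'sort_is_hint' parameter implies. The function name,
`effective_ordering`, and its arguments are illustrative inventions,
not bzrlib's actual API; the point is only that a hint may be
downgraded while a correctness requirement must be honoured.

```python
# Hypothetical sketch (not bzrlib API): deciding what ordering a
# backend should actually produce when the caller's requested sort
# may be either a hint or a correctness requirement.

def effective_ordering(requested, sort_is_hint, cheap_orderings):
    """Pick the ordering a backend should actually use.

    requested       -- ordering the caller asked for, e.g. 'topological'
    sort_is_hint    -- True if the caller can cope with any order
    cheap_orderings -- orderings this backend can produce cheaply
    """
    if requested in cheap_orderings:
        return requested      # backend produces it cheaply anyway
    if sort_is_hint:
        return 'unordered'    # hint only: free to ignore for speed
    return requested          # correctness: must buffer/sort to honour


# A knit target needs topological order as a hard requirement:
assert effective_ordering('topological', False, {'unordered'}) == 'topological'
# A groupcompress source may treat the same request as a hint:
assert effective_ordering('topological', True, {'unordered'}) == 'unordered'
```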
> > We could insert in any order from remote repositories, this would tend
> > to convert from packs poorly, but a 'pack' operation would fix things
> > up. Fetching from a gc repository over SFTP or the smart server would
> > tend to behave well because they would preserve as much ordering as the
> > base repository has.
>
> That is the route I would take.
This feels like it's going to end up with a LOT of users going 'my
repository got bigger; I thought this was an upgrade'. That will
confuse people, and also be rather nasty for interoperation with knits,
knitpacks etc.
> > We could buffer to a local store - for instance we could do like git
> > does some form of file-per-object store [but still using atomic
> > insertion to make data visible to other readers] and then compress that
> > store.
>
> Sounds like a lot of work to avoid doing "bzr pack".
The difference is that pack has to consider the whole repo; doing a
short-term buffer operation (heck, even a totally uncompressed temporary
pack) would only consider the transmitted data, and so be O(change), not
O(history).
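A minimal sketch of that buffering idea, with illustrative names
(`Record`, `Target`, `fetch_with_buffer` are assumptions for the
example, not real bzrlib classes): spool the incoming records
uncompressed, then reorder and insert only the buffered data, so the
cost tracks what was transmitted rather than the whole repository.

```python
# Hedged sketch: buffer incoming records, then reorder/compress only
# the transmitted data (O(change)), instead of repacking the whole
# repository (O(history)). Names are illustrative, not bzrlib's.

class Record(object):
    def __init__(self, key, bytes):
        self.key = key
        self.bytes = bytes


class Target(object):
    def __init__(self):
        self.inserted = []

    def insert_records(self, records):
        # stand-in for an atomic insertion making data visible
        self.inserted.extend(r.key for r in records)


def fetch_with_buffer(record_stream, target):
    # Accept records in whatever order they arrive; the spool is
    # proportional to the data transmitted, not to history.
    buffered = list(record_stream)
    # Reorder just the new data into a compression-friendly order
    # before inserting it.
    buffered.sort(key=lambda r: r.key)
    target.insert_records(buffered)
```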
> My feeling is to cross the bridge when we get there. Let's get basic
> groupcompress working.
It is.
> Get fetch working such that it doesn't transmit
> all full-texts and require a full recompression of everything.
Half done. In fact this is why I'm working on the ordering issue.
> And then get a "bzr pack" that can figure out how to put everything
> "optimally".
Sure.
> I realized the other day that "topo_sort" *doesn't* guarantee grouping
> across the whole file_id.
Yes, that's why the new sort order is called reverse_topo_grouped:
because it wants to be grouped by file id.
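A hedged sketch of what such an order could look like: group every
version of one file id together, topologically sort within the group,
and reverse so newer texts come first. The name `reverse_topo_grouped`
is from this thread; the implementation below is my illustration, not
bzrlib's.

```python
# Illustrative sketch of a reverse-topological, file-id-grouped sort.
# keys are (file_id, revision_id) tuples; parent_map maps a key to its
# parent keys.

from collections import defaultdict


def _topo_sort(keys, parent_map):
    """Simple depth-first topological sort, parents before children."""
    keyset = set(keys)
    done, order = set(), []

    def visit(key):
        if key in done:
            return
        done.add(key)
        for parent in parent_map.get(key, ()):
            if parent in keyset:
                visit(parent)
        order.append(key)

    for key in keys:
        visit(key)
    return order


def reverse_topo_grouped(keys, parent_map):
    # group all versions of one file id together
    by_file = defaultdict(list)
    for key in keys:
        by_file[key[0]].append(key)
    result = []
    for file_id in sorted(by_file):
        ordered = _topo_sort(by_file[file_id], parent_map)
        # reverse: newest texts first within each file-id group
        result.extend(reversed(ordered))
    return result
```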
> It would also be really neat if we could find a way to do appropriate
> cross-file grouping. I can't really think of much from just a file-id
> stand-point, though. Given file size, we might try to insert large ones
> first. Though I would be careful to not insert all the large files into
> one chunk, and have only tiny ones for the next.
Basename is likely the best hint to use for that; I would use it to
group files named the same, emitting all texts of a given id and then
all of the next, etc. But we don't currently have that pushed down to
the text layer.
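The basename heuristic is simple enough to sketch. Everything here is
an assumption for illustration: the id-to-basename mapping would have
to come from the inventory layer, which (as noted above) is not
currently pushed down to the text layer.

```python
# Hedged sketch: order file ids so files sharing a basename (e.g.
# every 'Makefile' in the tree) become adjacent and compress against
# each other, while each file id's texts stay contiguous.

def basename_grouped_ids(basename_of, file_ids):
    # Sort by (basename, file_id): same-named files are grouped, and
    # within a basename each file id is emitted as one run.
    return sorted(file_ids, key=lambda fid: (basename_of[fid], fid))
```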
> So I think the hint is a nice thing, but I feel it is a bit premature.
I don't :).
-Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.