[MERGE] Fetch tweaks
John Arbash Meinel
john at arbash-meinel.com
Tue Jul 29 03:18:22 BST 2008
Robert Collins wrote:
| On Mon, 2008-07-28 at 10:48 -0500, John Arbash Meinel wrote:
|>
|> Robert Collins wrote:
|> | This allows repositories more control over their fetch operations in the
|> | generic fetching code. Doing so allows the groupcompress format to avoid
|> | having to figure out full text representations, rather getting
|> | everything as fulltext in the first place; and eliminates an unnecessary
|> | reconcile post-fetch.
|> |
|> | -Rob
|> |
|>
|> BB:approve
|>
|> I like this patch as it stands, though with one caveat. Specifically,
|> during an initial branch, passing _fetch_uses_deltas = False will read
|> the entire repository into memory. It will be somewhat efficient, in
|> that it will share strings in the in-memory lists, up until the point
|> that you actually fetch a bit of text.
|
| Yes indeed.
|
|> Then it does ''.join(lines), which doubles memory consumption for that
|> text (while still caching the original lines). If the caller doesn't
|> hang onto the text it will probably be ok.
|
| If we had the unpacked sizes in the index we could do something clever
| and simple :). We don't though. (And I think it would be a loss overall
| due to index size increasing.)
|
|> To truly scale up, we need to change the 'get_record_stream()' code
|> that blindly unpacks all of the requested keys, so that we only unpack
|> a few at a time. I don't have a good answer for how to do that, since
|> it means trading off how much we unpack for efficiency against how
|> much memory we consume.
|>
|> Anyway, this is still better than what we have (as it lets us experiment
|> with it), and it shouldn't change the behavior of anything *today*.
|
| I plan to audit the versioned file code to make sure it will do
| something nice for group compress - or can be tweaked to do so. Roughly
| that's:
| - adding reverse-topological ordering
| - checking that knits group by fileid when sorting
| - making the full text assembly a little bit more lazy (I'm thinking
| just 100-text batches). Excluding ISOs and so on, most texts are <
| 1MB, so that should be less than 100MB worst case.
|
| Another thing we should do in knits is discard the raw content once it
| is no longer referenced; but perhaps gc will make that irrelevant.
|
| -Rob
|
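To make the 100-text batching concrete, here is a rough sketch of the
shape I have in mind. This is pure illustration: 'iter_batches' is a
hypothetical helper, and I'm approximating the VersionedFiles
get_record_stream() API from memory:

    from itertools import islice

    def iter_batches(keys, batch_size=100):
        """Yield successive lists of at most batch_size keys."""
        keys = iter(keys)
        while True:
            batch = list(islice(keys, batch_size))
            if not batch:
                return
            yield batch

    def iter_full_texts(vf, keys):
        # Materialize at most ~100 full texts at a time, so peak memory
        # stays near 100 * max_text_size instead of the whole request.
        for batch in iter_batches(keys):
            for record in vf.get_record_stream(batch, 'unordered', True):
                yield record.key, record.get_bytes_as('fulltext')

Requesting in small batches does mean the stream can't optimize ordering
globally, which is why the sorting points above matter.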
As for discarding the raw content once it is no longer referenced:
simply doing

    text_map.pop(key)

rather than

    text_map[key]

will at least let those lines be reclaimed.
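In the extraction loop that would look something like this (a sketch,
assuming text_map is a plain dict of key => list of lines):

    def iter_texts(text_map, keys):
        # pop() drops the cached line list as soon as the full text has
        # been built, so each text becomes reclaimable as soon as the
        # caller is done with it.
        for key in keys:
            lines = text_map.pop(key)
            yield key, ''.join(lines)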
I still think getting the streaming interface to work in "chunks" would
be best. A full text held in a single string, [full_text], is a valid
chunk list, as is [lines]. I know it complicates some things, but if we
already have to do 'split_lines(text)', we can just write a C
implementation to convert chunks => lines for anyone that needs to do so.
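The pure-Python fallback for that conversion is straightforward; a
sketch (using split_lines from bzrlib.osutils, with a C version only
needing to optimize the slow path):

    from bzrlib.osutils import split_lines

    def chunks_to_lines(chunks):
        # Fast path: each chunk is already a single newline-terminated
        # line, so no copying is needed.
        for chunk in chunks:
            if chunk.count('\n') != 1 or not chunk.endswith('\n'):
                break
        else:
            return list(chunks)
        # Slow path: flatten and re-split; the join is the extra copy a
        # C implementation could avoid.
        return split_lines(''.join(chunks))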
John
=:->