brisbane:CHKMap.iteritems() tweaks
John Arbash Meinel
john at arbash-meinel.com
Wed Mar 25 02:33:56 GMT 2009
Robert Collins wrote:
> So in short - the recent content we need is not at the front of the
> groups?
>
> -Rob
Correct. Consider this chk layout, over revisions 1-6:
    root
   / | \
  A  B  C
Those pages then get packed into a group as:
R1 R2 R3 R4 R5 R6 # texts at a different level are in their own group
A1 A2 A3 A4 A5 A6 B1 B2 B3 B4 B5 B6 C1 C2 C3 C4 C5
If that is all one group, then we have to decompress() all of A2-A6 to
get the content of B1.
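
To make that cost concrete, here is a minimal sketch. It assumes a group
behaves like a sequential stream, so reaching an entry means
decompressing everything stored before it. The names
wasted_decompression and single_group are hypothetical, made up for this
illustration; this is not the groupcompress code.

```python
def wasted_decompression(group, wanted):
    """Return the entries we must decompress but did not ask for.

    Assumes (for illustration only) that a group is read front to back,
    so everything up to the last wanted entry gets decompressed.
    """
    wanted = set(wanted)
    last = max(i for i, key in enumerate(group) if key in wanted)
    return [key for key in group[:last + 1] if key not in wanted]


# The chk pages from the example: A and B have 6 versions, C has 5.
pages = {"A": 6, "B": 6, "C": 5}
single_group = [
    f"{page}{rev}"
    for page, count in pages.items()
    for rev in range(1, count + 1)
]

# Fetching A1 and B1 drags A2-A6 along with it.
print(wasted_decompression(single_group, {"A1", "B1"}))
```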
The 'recent' patch that I put together changes this to
R1 R2
A1 A2 B1 B2 C1 C2
R3 R4 R5 R6
A3 A4 A5 A6 B3 B4 B5 B6 C3 C4 C5
So, 2 groups.
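
As a rough sketch of that split (not the actual patch), the function
below partitions the chk keys by revision number. It reuses the same
assumption that a group is read front to back, and the helper names
split_recent and wasted are hypothetical.

```python
def split_recent(keys, recent_revs):
    """Partition keys like 'A3' into (recent, older), keeping their order."""
    recent = [k for k in keys if int(k[1:]) in recent_revs]
    older = [k for k in keys if int(k[1:]) not in recent_revs]
    return recent, older


def wasted(group, wanted):
    """Entries decompressed but not wanted, assuming front-to-back reads."""
    wanted = set(wanted)
    last = max(i for i, key in enumerate(group) if key in wanted)
    return [key for key in group[:last + 1] if key not in wanted]


pages = {"A": 6, "B": 6, "C": 5}
keys = [f"{p}{r}" for p, n in pages.items() for r in range(1, n + 1)]

recent, older = split_recent(keys, {1, 2})
wanted = {"A1", "B1", "C1"}

print(recent)                      # the first chk group
print(len(wasted(keys, wanted)))   # unwanted texts read from one big group
print(wasted(recent, wanted))      # unwanted texts read after the split
```

Under those assumptions, reading A1, B1 and C1 passes over 10 unwanted
texts in the single group, but only A2 and B2 once the recent revisions
have their own group.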
I did propose changing the chk grouping to be pure
R1 R2 R3 R4 R5 R6
A1 A2 A3 A4 A5 A6
B1 B2 B3 B4 B5 B6
C1 C2 C3 C4 C5
But to do that easily, I would need to create more streams. Also, when I
was trying to tweak this, the initial results showed things being worse
for file texts. So it may need some other work to make the compress
code aware of what it is compressing, so that it knows when it is
dealing with file texts, which would have a different expected behavior
than chk texts, etc.
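
For comparison, the 'pure' layout can be sketched as one group per chk
page. This is only an illustration of the grouping; group_by_page is a
hypothetical name, and the one-stream-per-group reading of "more
streams" is my assumption.

```python
def group_by_page(keys):
    """Group keys like 'A3' by their page letter, keeping insertion order."""
    groups = {}
    for key in keys:
        groups.setdefault(key[0], []).append(key)
    return groups


pages = {"A": 6, "B": 6, "C": 5}
keys = [f"{p}{r}" for p, n in pages.items() for r in range(1, n + 1)]

groups = group_by_page(keys)
for page, members in groups.items():
    print(page, members)

# Each page's first entry sits at the front of its own group, so
# fetching A1, B1 and C1 reads nothing extra, at the cost of one group
# (and, by assumption, one stream) per page.
print(len(groups))
```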
File texts are sometimes 'similar but not quite exact', while chk texts
are pretty much always either very similar or very different. I guess I
could see getting a small amount of 'revision_id' correspondence between
unmatched pages, but all the sha1 sums and file_ids would be different.
I'll poke at it a bit. But anyway, the same thing holds true for the
file texts, because grouping them together was one of the ways that we
got better compression. (I'm guessing it is cross-file compression of
stuff like copyright headers.)
John
=:->