Experience upgrading to 1.9

Thu Jan 8 17:45:42 GMT 2009

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I also just upgraded my local and server repositories to format 1.9.
Mostly I was waiting for LP to finally support it.

Martin Albisetti wrote:
> Hello hello,
> 
> I wanted to drop a quick email with my experience upgrading 170+
> branches from 0.92 to 1.9 format:
> 
> - Disk space usage is down by ~15-20%

This may just be due to clearing out the obsolete_packs/* files and the
effective packing done by converting. (It isn't the same as issuing 'bzr
pack', but it does leave you with a single pack file.)

That said, the index files do get smaller. You also save space if you are
upgrading from a Knits format. (My bzr knits repo was 125MB, down to 110MB
in 1.9 format.)

> - Most operations seem faster, but I don't have any hard numbers yet

Again, packing helps a lot here. The 1.9 format is generally about
*remote* performance. It may or may not be much faster for local
performance. (It has to decompress index content rather than just
reading text, but that can be beneficial when things have not been paged
in yet by the OS because less needs to be paged in. Once in mem, though,
raw text is faster.)

> - Upgrading was *very* fast in comparison to knits -> packs

It could have been even faster if you used the
"contrib/convert_to_1.9.py" script on the repository first. Not a big
deal, but it makes a difference when dealing with *huge* repositories.
(Basically, 0.92 and 1.9 differ only in the index format. The generic
'upgrade' does a fetch of all the data, while convert_to_1.9 only
changes the indexes.) On the flip side, convert_to_1.9 doesn't trigger
a pack.

> - I used a plugin[1] written by jelmer which adds a --recurse option
> to "bzr upgrade", which I tweaked[2] to add a flag to delete the
> backup.bzr dir right after upgrade (didn't have enough disk space to
> have all repos twice)

Again, 'convert_to_1.9.py' doesn't copy the repository data, just the
index. Though it doesn't have an option to clean up later.

That said, I converted the repo and then did:

find . -path '*.bzr/branch/format' | sed -e 's#/.bzr/branch/format##' \
  | xargs -n1 bzr upgrade --1.9

I could have used "bzr branches" instead of the find, but it is rather
slow, and bzrtools on my server was out-of-date so it just decided to
fail on me rather than trying, and I didn't feel like upgrading just yet.

It took a bit longer to upgrade the 478 branches than to convert the
repo. Though I'm pretty sure that is because I was spawning bzr each time.

I should also mention that this fails for any 'loom' format branches,
though for my work I was ok just skipping them.
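
For anyone who would rather not rely on find/sed/xargs, the pipeline above
can be roughly sketched in Python. This is only an illustration: the
function names (find_branch_roots, upgrade_command, upgrade_all) are made
up here, and it assumes that bzr exits non-zero when an upgrade fails
(for example on a loom branch), which I have not checked.

```python
import os
import subprocess


def find_branch_roots(top):
    """Yield every directory under `top` that has a .bzr/branch/format file."""
    for dirpath, dirnames, _filenames in os.walk(top):
        is_branch = os.path.isfile(
            os.path.join(dirpath, ".bzr", "branch", "format"))
        # Do not descend into the control directory itself.
        if ".bzr" in dirnames:
            dirnames.remove(".bzr")
        if is_branch:
            yield dirpath


def upgrade_command(branch_root):
    """Build the same command line the xargs call would run."""
    return ["bzr", "upgrade", "--1.9", branch_root]


def upgrade_all(top):
    """Upgrade each branch in turn and return the ones that failed.

    Assumption: a failed upgrade shows up as a non-zero exit code.
    """
    failed = []
    for root in find_branch_roots(top):
        if subprocess.call(upgrade_command(root)) != 0:
            failed.append(root)
    return failed
```

Like 'xargs -n1', this still spawns one bzr process per branch, so it
will not be any faster. The only gain is that it keeps going after a
failure and hands back the list of branches that were skipped.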

> - Some of the branches are over 2 years old, and have gone through
> quite a few upgrades and operating systems. I found that some of them
> had files that blew up in bzr because of encoding issues on the
> filenames, so I made a dirty hack to ignore them so I could do the
> mass-upgrade. I haven't followed up on what's causing the encoding
> problem, but I do know that the files are on the working tree, that
> most of them were committed at some point and now ignored, and that
> they have letters with accents.
> 

Do you happen to be on Mac? My first guess would be a change in the
normalization logic. (Whether we allow/disallow/auto-convert/etc Unicode
normalization following different rules.) Mac OS forces filenames to
conform to its normalization (NFD), which is generally not like the
default of other systems (NFC). NFD uses combining chars, so å on Mac is
actually 2 characters, a plain 'a' and then a 'circle on top'.
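
To make the NFC/NFD difference concrete, here is a minimal sketch using
Python's standard unicodedata module. The variable and helper names are
only for illustration, and this says nothing about what bzr itself does
internally.

```python
import unicodedata

composed = "\u00e5"      # 'å' as one code point (the NFC form)
decomposed = "a\u030a"   # plain 'a' plus COMBINING RING ABOVE (the NFD form)


def to_nfc(name):
    """Return the composed (NFC) form of a name."""
    return unicodedata.normalize("NFC", name)


def to_nfd(name):
    """Return the decomposed (NFD) form of a name."""
    return unicodedata.normalize("NFD", name)


print(len(composed), len(decomposed))    # 1 2
print(composed == decomposed)            # False: different code points
print(to_nfd(composed) == decomposed)    # True
print(to_nfc(decomposed) == composed)    # True
```

The two strings render the same but compare unequal, which is why a
filename recorded in one form may not match the name the filesystem
hands back in the other.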

If you use that same sequence on Windows, you end up with "a[]", because
Explorer doesn't understand combining chars. (It allows them, it just
doesn't have a way to display them.) On Linux, you get a slightly
different visual between the two.

Further, in the same vein as case insensitivity, on Mac you can't have
both u'a\u030a' and u'\xe5' in the same directory. (Accessing u'\xe5'
will actually give you the u'a\u030a' file, just like accessing 'Foo'
would give you 'foo' if it existed.) Both Windows and Linux will let you
have both files present (though I seem to remember that depending on
your OEM [locale] settings Windows will do weird things, similar to
case-insensitivity).
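
One way to spot names that would clash on a Mac is to group a listing by
NFC form, sketched below. The function name normalization_collisions and
the sample listing are hypothetical, and comparing NFC forms is only an
approximation of what the Mac filesystem actually does.

```python
import unicodedata
from collections import defaultdict


def normalization_collisions(names):
    """Group names that share one NFC form; keep groups with 2+ members."""
    groups = defaultdict(list)
    for name in names:
        groups[unicodedata.normalize("NFC", name)].append(name)
    return {key: found for key, found in groups.items() if len(found) > 1}


# Two spellings of the same visible name, plus an unrelated file.
listing = ["a\u030a.txt", "\u00e5.txt", "plain.txt"]
collisions = normalization_collisions(listing)
for nfc_name, variants in collisions.items():
    print(repr(nfc_name), [ascii(v) for v in variants])
```

Running this over os.listdir() output on Windows or Linux would show
which pairs of files could not coexist once the tree is checked out on a
Mac.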

Anyway, at one point we tried to 'play nice' and cover over how Mac did
things, and translate things under the covers into the NFC forms. Once
dirstate landed, I lost the will to try and maintain it, because the
overall Unicode universe is complex and implementations are not very
consistent, and it wasn't worth my time anymore trying to recover from
all the various edge cases.

> 
> [1] http://people.samba.org/bzr/jelmer/bzr-recursive-upgrade/trunk/
> [2] https://code.launchpad.net/~beuno/bzr/upgrade-recurse
> [3] https://code.launchpad.net/~beuno/+junk/bzr-recurse
> 

Another interesting point. After converting my public server, I went to
Launchpad and had it resume mirroring a bunch of branches that it had
"given up" on. Mostly because of network timeouts, IIRC. (It takes a
while to copy all of that data each time.)

However, because it is now in 1.9 format (>=1.6), Launchpad actually
mirrors them as stacked branches. And since many of them have landed in
bzr.dev, it has no repository data that actually needs to be transmitted.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAklmO8YACgkQJdeBCYSNAANVOACglDTxqUNdwieoHvSXmhC24+xW
h4IAnjFNPqinF1tFewjMoZAcuNQzx2oS
=JVpK
-----END PGP SIGNATURE-----