[MERGE][1.8?] Make 'bzr co' more memory sensitive

John Arbash Meinel john at arbash-meinel.com
Fri Oct 10 21:17:58 BST 2008


John Arbash Meinel wrote:

...

> 
> I don't really know how to write a correct test (that maximum memory is
> constrained).
> 
> We could ....
> 
> 1) Use a feature to see if /proc/PID/status is available (which is what
> -Dmemory uses).
> 
> 2) Create a 'sparse file', either on disk or in a fake MemoryTransport,
> which could return a file with 10MB of uninteresting lines (all the
> letter 'a', etc.), and then commit many of them.
> 
> 3) Then do a checkout of 100 10MB files and assert that peak memory
> consumption is < 100MB. (If we unpacked everything at once, it would
> peak at 1GB.) (run_bzr_subprocess('bzr co -Dmemory') and assert that the
> output has "VmPeak XXXX" with a value less than 100MB.)
> 
> By using trivially compressible files, they shouldn't take up a lot of
> disk space, and the in-memory constraint is also trivial, as you can
> re-use the same string so it doesn't take much to *create* the file.
> 
> It feels a bit... ugly, but it isn't the worst way to do it.

So there is a specific problem with using 'bzr co' to do this: it
requires 1GB of disk space when we are done. It also takes a rather
large amount of time to compress 1GB of data (even though it compresses
very well), and a similarly long time to compute the sha1sum of 1GB of
data.

So the test is fairly slow (30+s for a single test is already a bit
much, and here we are probably talking minutes).

The other problem is that there is a considerable amount of overhead in
just loading bzr itself. If I do "bzr co --lightweight bzrtools", it is
18.5MB *with* my patch, and 22MB without it, which is only about a 4MB
difference for a 1MB source tree.

It takes 100+ MB to check out 'jam-integration'. I think the bulk of
this is in the index layer, because we probably end up reading the whole
index when we touch all 800 files and their relevant histories. With the
patch, it drops down to 68MB.

Also, even though I re-use the same string, we aren't very efficient in
the various layers: we end up adding a plain text, but then splitting it
into a list of lines in multiple places, etc.
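
(A toy illustration of the point, nothing bzr-specific: even when a
single string is re-used to build the content, any layer that splits it
gets its own fresh copies of the lines, so the data ends up in memory
more than once.)

a_line = 'a' * 999 + '\n'
text = a_line * 1000                  # one ~1MB string, built from one reused line

lines = text.splitlines(True)         # splitting allocates 1000 brand-new strings
assert lines[0] is not a_line         # the original line object is not shared
assert sum(map(len, lines)) == len(text)  # so the text now exists twice in memory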

In the end, though, I did put together this test:


def test_maximum_memory_consumption(self):
    self.requireFeature(tests.ProcPidStatusFeature)
    self.make_repository('.', shared=True)
    builder = self.make_branch_builder('bigfiles')
    a_line = 'a'*999 + '\n'
    lines = [a_line] * 1000
    text = ''.join(lines)
    del lines

    # 'text' is now ~1MB, and trivially compressible
    builder.start_series()
    builder.build_snapshot('rev001', None,
        [('add', ('', 'tree-root', 'directory', None)),
         ('add', ('file001', 'file001-id', 'file', text))])
    last_rev = 'rev001'
    for i in xrange(2, 101):
        next_rev = 'rev%03d' % (i,)
        fname = 'file%03d' % (i,)
        fid = 'file%03d-id' % (i,)
        builder.build_snapshot(next_rev, [last_rev],
            [('add', (fname, fid, 'file', text))])
        last_rev = next_rev
    # We now have 100 * 1MB == 100MB of stuff to check out
    builder.finish_series()

    out, err = self.run_bzr_subprocess('co . -Dmemory',
                                       working_dir='bigfiles')
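    # -Dmemory reports /proc/PID/status fields (VmPeak, etc.) for the
    # subprocess; look for the peak value in its stderr output.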
    peak = None
    for line in err.splitlines():
        if line.startswith('VmPeak'):
            peak = line.split(' ', 1)[1].strip()
            break
    self.assertIsNot(None, peak, "Failed to find the peak working memory.")
    # So we know how to parse it
    self.assertEndsWith(peak, 'kB')
    mem_consumed = int(peak[:-2]) * 1024
    self.assertTrue(mem_consumed < 50*1024*1024,
                    "We used more than 50MB to checkout a working tree."
                    " This should be more memory constrained than that.")

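For reference, the ProcPidStatusFeature guard at the top only needs to
probe for /proc/PID/status, since that is what -Dmemory reads. A minimal
sketch, assuming the usual bzrlib Feature API (_probe/feature_name) and
that probing the current process is good enough:

import os

from bzrlib import tests


class _ProcPidStatusFeature(tests.Feature):
    """Is /proc/<pid>/status readable on this platform?"""

    def _probe(self):
        # -Dmemory parses this file for VmPeak/VmSize, so the test is only
        # meaningful where a Linux-style procfs is mounted.
        return os.path.exists('/proc/%d/status' % os.getpid())

    def feature_name(self):
        return '/proc/PID/status'

ProcPidStatusFeature = _ProcPidStatusFeature()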

On the system I tested on, the checkout only takes 20MB with the patch.
The test takes about 60s to run on my fast machine, which is a bit too
long. My biggest complaint is that it takes 100MB of disk space for the
'checkout', and approx 120MB of memory while it runs.

Without the patch, it consumes 120MB to do the checkout, so I feel that
there is a good safety margin between the 20MB actual usage and the
120MB without the patch (and picking a threshold of 50MB is generous).

John
=:->


