BigString (reducing peak memory)

John Arbash Meinel john at arbash-meinel.com
Mon Nov 14 12:23:26 UTC 2011



Hi Martin-

I saw your BigString branch, and I think it is a neat idea, but I
don't think it will actually get you what you want. I started writing
a really long email, but I've spent too much time already and it isn't
very cohesive. I have some good ideas about how we can restructure the
code to decrease peak memory if you want to chat about it, but maybe a
phone call would be better.

If the goal is just peak memory reduction, our current peak memory
during commit is: 1 fulltext + 2 zlib compressed texts. We have 2
copies of compressed content due to api friction. Specifically:
  1) bzrlib.repofmt.pack_repo._DirectPackAccess.add_raw_records and
  2) bzrlib.pack.ContainerWriter.add_bytes_record
  3) bzrlib.pack.ContainerSerializer.bytes_record

These all take 'bytes' rather than 'chunks'. Which means we take the
raw content, zlib.compress it into a series of zlib chunks (usually one
small and one very large), add another header string, and then have to
''.join() everything into a single 'bytes' string, so for a while we
hold both the chunk list and the joined copy of the compressed content.

bzrlib.groupcompress.GroupCompressVersionedFiles._insert_record_stream
has this code:
    # TODO: Push 'chunks' down into the _access api, so that we don't
    #       have to double compressed memory here
...
    bytes = ''.join(chunks)
    del chunks
    index, start, length = self._access.add_raw_records(
        [(None, len(bytes))], bytes)[0]

We could get rid of that peak memory if we had a way to pass the chunks
straight down (the groupcompress block header followed by the
zlib-compressed content chunks) instead.

I didn't bother yet, because ContainerWriter adds its own headers and
joins them again in bytes_record.

We do that because we didn't want to have 5 syscalls to write out
content to disk (or over the network). However, it would be
interesting to have something that would 'join_if_small()' so that all
the header strings could be combined into a single write, followed by
a single write of the actual content.
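
To make that concrete, here is a minimal sketch of what a
'join_if_small()' helper might look like (the name, the 4KB threshold,
and the generator shape are just placeholders, nothing that exists in
bzrlib today):

    def join_if_small(chunks, small=4096):
        """Yield strings ready to be written.

        Runs of small chunks (headers, length prefixes) are combined
        into a single string so they go out in one write; large
        content chunks are passed through untouched.
        """
        pending = []
        for chunk in chunks:
            if len(chunk) < small:
                pending.append(chunk)
            else:
                if pending:
                    yield ''.join(pending)
                    del pending[:]
                yield chunk
        if pending:
            yield ''.join(pending)

The writer would then issue one write() per yielded string, so the
common case collapses to two writes: one for all the combined header
strings, and one for the big compressed content chunk.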


If the goal is to migrate the code so that it never holds a fulltext
in memory at any time, I think it is possible, though pretty tricky.
Some of the abstractions are close to being in place, though.

 1) bzrlib.groupcompress.PyrexGroupCompressor and
    bzrlib._groupcompress_pyx.DeltaIndex

    These create a new bytestream that includes some 'header' bytes and
    some actual content bytes (and delta record bytes, though we don't
    delta during commit.)

    However, they already work in '.chunks'. So you have something like:
    ['f', '\x20', raw_file_content, 'd', '\x15', delta_records, ...]
     ^- this is a fulltext
          ^- length of fulltext in base128
                  ^- raw content
                                    ^- this is a delta
                                         ^- of base128 length
               and the copy/insert instructions -^


     DeltaIndex also tracks multiple 'source_info' structures, pointing
     at things like the 'raw_file_content' mentioned above.

     What you could do next is to break up large single-file content
     into multiple source sections, so you would have:

     ['f', '\x20', first_1MB, second_1MB, ...]

     You would lose the ability to have a 'copy' record that crossed a
     source boundary, but I don't think that is a huge loss, as long as
     the individual sections are still large.
     Also, a single copy instruction can cover at most 64KB anyway, so
     matching exactly 2MB already takes 32 copy instructions; needing a
     33rd because a match happens to cross a boundary doesn't seem bad.
     (A rough sketch of this splitting follows the list below.)

  2) Update the code so that these chunks can be written to disk,
     instead of being held in memory. We still make use of them to find
     matches, though, so we still want access to the raw text. We could
     do our own caching, and play around with a big file and an LRU
     cache of source sections that we've matched against.
     Or we could write all of .chunks to disk and mmap it, allowing the
     OS to do the LRU/readahead/etc for us (see the mmap sketch after
     this list).
     mmap doesn't allow us to break the 32-bit barrier, though it may be
     conceptually simpler.
     I suppose with mmap64 we could still do an LRU cache trick, and
     keep some subset of a >2GB file mapped into memory at any one
     moment.
     However, a given group compress block doesn't support >4GB,
     because our copy records are limited to a 32-bit offset-from-start.
     But as we tend to fail when committing a 1GB file today, pushing
     that boundary up would be an improvement.

  3) Update the ContentFactory code so that we can indicate when we are
     done with a given text, and it can be freed from memory. Writing
     the chunks in (1) to disk doesn't help us if the ContentFactory
     that handed us those chunks is still holding them in memory.

     Care will need to be taken though, as some content streams are
     re-used in a pipeline. Specifically, chk streams get parsed as
     CHKMap entries to give us the referenced file-texts, while they
     also are used as bytes-to-be-delta-compressed. (Because we redelta
     on the fly during fetch for poorly delta'd content.)

  4) Update _LazyGroupCompressFactory so that it can return real chunks
     rather than always a fulltext. Further, we'll want a way to stage
     a GroupCompressBlock's content to disk, and be able to extract and
     apply a delta record without holding the whole block in memory.
     When reading off the network, we'll need a place to write the
     bytes if they get too big in their fully compacted form.
     Then we'll also likely want a way to zlib.decompress() the block
     into a temp file, and then extract 'chunked' content as we apply
     the recipe (a sketch of that also follows this list).

     I fully support the idea of disabling redelta-on-the-fly for
     large blocks, so in theory we could stream the data right off the
     network into the upload/temp.pack file.

  5) We'll also want something like (4) when creating WT contents.

  6) We'll need to audit our code a bit. We support chunks in many
     APIs, but they generally expect them to be a list of strings, not
     an iterable of strings. So we have to be careful about when we can
     stop holding the chunks in memory, otherwise we haven't gained
     anything.
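
Roughly what I have in mind for (1), as a sketch only (the helper name
and the 1MB size are arbitrary, and the real work of feeding each
section to DeltaIndex is left out):

    def split_into_sections(chunks, section_size=1024*1024):
        """Re-chunk an iterable of strings into pieces of at most
        section_size bytes, so each piece can become its own delta
        source section."""
        buf = []
        buf_len = 0
        for chunk in chunks:
            while chunk:
                take = section_size - buf_len
                piece = chunk[:take]
                buf.append(piece)
                buf_len += len(piece)
                chunk = chunk[take:]
                if buf_len == section_size:
                    yield ''.join(buf)
                    buf = []
                    buf_len = 0
        if buf:
            yield ''.join(buf)

Per the layout above, the 'f' marker and the base128 length would still
be emitted once, ahead of the first section.

And the mmap spill from (2), again only a sketch, assuming we are happy
to write the raw chunks to an anonymous temp file:

    import mmap
    import tempfile

    def mmap_chunks(chunks):
        """Spill a sequence of content chunks to a temporary file and
        return a read-only mmap of it, so the match sources live in
        the OS page cache rather than in Python memory."""
        tmp = tempfile.TemporaryFile()
        for chunk in chunks:
            tmp.write(chunk)
        tmp.flush()
        # mmap keeps its own duplicate of the file descriptor, so the
        # mapping stays usable even after 'tmp' is garbage collected.
        return mmap.mmap(tmp.fileno(), 0, access=mmap.ACCESS_READ)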

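Similarly for (4), streaming zlib.decompress() into a temp file, so the
expanded block never has to exist as one big string (again just a
sketch, not a real API):

    import tempfile
    import zlib

    def decompress_block_to_file(compressed_chunks):
        """Stream-decompress a block's zlib content into a temp file,
        piece by piece; the extraction recipe can then seek()/read()
        ranges out of the returned file instead of slicing a string."""
        out = tempfile.TemporaryFile()
        d = zlib.decompressobj()
        for chunk in compressed_chunks:
            out.write(d.decompress(chunk))
        out.write(d.flush())
        out.seek(0)
        return out
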
John
=:->


