MemoryError on commit with large file

John Arbash Meinel john at arbash-meinel.com
Fri Oct 5 15:20:59 BST 2007



Aaron Bentley wrote:
> Robert Collins wrote:
>> This will work for most cases, and will address the number of copies
>> problem substantially, but we may still fall down on merge, which is
>> somewhat trickier to reduce memory usage on.
> 
> The main issue with merge is the sequence matching, and this also
> affects diff.
> 
> If we can substitute shorter values for lines (e.g. hashes), we can
> potentially reduce the memory footprint of sequence matching by an order
> of magnitude.
> 
> Aaron
> 
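(A minimal sketch of the substitution Aaron describes, using stock difflib;
this is illustrative only, not bzrlib code. The trick is to intern each
distinct line to a small integer before matching, so the matcher's index
structures key on ints rather than whole line strings. A real implementation
might use fixed-size content hashes instead, so the full texts need not stay
resident at all:)

import difflib

def interned_matcher(a_lines, b_lines):
    # Intern each distinct line to a small int; the matcher's internal
    # dictionaries then key on ints instead of full line strings.
    table = {}
    def shrink(lines):
        return [table.setdefault(line, len(table)) for line in lines]
    return difflib.SequenceMatcher(None, shrink(a_lines), shrink(b_lines))

# The opcodes still index into the original sequences, so callers can
# recover the actual text:
#   for tag, i1, i2, j1, j2 in interned_matcher(a, b).get_opcodes(): ...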

Just a quick comment: depending on what we are doing, xdelta is also designed
to handle incremental updates (as is zlib, etc.). Its internals are a streaming
interface; we would have to expose that (pyxdelta only exposes the
'compress/extract all at once' interface), but it would be possible to do.
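(To illustrate the streaming style: zlib's existing Python binding already
exposes it, and a pyxdelta equivalent would presumably look much the same:)

import zlib

def compress_chunks(chunks):
    # Feed data incrementally rather than holding the whole text in
    # memory at once; compressobj/compress/flush is zlib's streaming
    # interface.
    compressor = zlib.compressobj()
    for chunk in chunks:
        data = compressor.compress(chunk)
        if data:
            yield data
    yield compressor.flush()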

That doesn't help us with merging, or with showing diffs to the user (xdelta is
not suitable for either). But it would help with commit and text extraction.

In general, though, a lot of our codebase would need to be updated to handle
the streaming concepts. Designing around the concept of a "text iterator"
might do well enough: something that could be a list of lines, a file object,
or a custom object that reads chunks at a time.
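(A rough sketch of what such an adapter might look like; the name and shape
here are hypothetical, not an existing bzrlib API:)

def iter_text(source, chunk_size=64 * 1024):
    # Normalize the three shapes above into one stream of chunks.
    if isinstance(source, list):
        # A list of lines: yield them as-is.
        for line in source:
            yield line
    elif hasattr(source, 'read'):
        # A file-like object: read fixed-size chunks.
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break
            yield chunk
    else:
        # A custom object that yields chunks itself.
        for chunk in source:
            yield chunk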

John
=:->