BigString (reducing peak memory)
Aaron Bentley
aaron at aaronbentley.com
Wed Nov 16 18:43:58 UTC 2011
On 11-11-16 12:03 PM, Marco Pantaleoni wrote:
> On Wed, Nov 16, 2011 at 5:49 PM, Aaron Bentley
> <aaron at aaronbentley.com> wrote:
>> Not really. The point is that the data needs to be dealt with in
>> smaller chunks, whether they're read() from a file directly or
>> iterated through from a generator. Dealing with the interface
>> difference between files and iterables is the easy bit. It's
>> avoiding reading entire files at once that seems to be problematic.
>
>
> Why is the whole file needed in memory at once?
In general, it's not. We just have some code paths that do it even
though they don't need to.
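To make the idea concrete, here is a minimal sketch (not bzr's actual code) of the pattern being described: reading a file in fixed-size chunks so that peak memory is bounded by the chunk size, not the file size. The function name and chunk size are illustrative.

```python
def iter_chunks(path, chunk_size=64 * 1024):
    """Yield successive fixed-size chunks of the file at `path`.

    Only one chunk is ever resident at a time, so memory use stays
    bounded by `chunk_size` regardless of how large the file is.
    """
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return
            yield chunk
```

A consumer then iterates over the chunks rather than calling read() on the whole file at once.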
There are a few places where this is trickier, like comparing two
versions of a file. Diffs can theoretically match any line of a file
against any other line of another version, so they need fast access to
every line of both versions. Even that can be addressed by, say, using
hashes of the lines instead of the actual lines, but that is an
algorithm change.
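As a rough illustration of the hashing idea (using the stdlib's SequenceMatcher as a stand-in for bzr's own diff code): each line is replaced by a fixed-size digest, and the matcher compares digests instead of keeping the full lines in memory.

```python
import hashlib
from difflib import SequenceMatcher

def line_digests(lines):
    # Replace each line with a 20-byte SHA-1 digest; equal digests
    # stand in for equal lines, so the matcher never needs the text.
    return [hashlib.sha1(line).digest() for line in lines]

old = [b'one\n', b'two\n', b'three\n']
new = [b'one\n', b'2\n', b'three\n']
matcher = SequenceMatcher(None, line_digests(old), line_digests(new))
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    print(tag, i1, i2, j1, j2)
```

The opcodes locate matching and differing regions by index, so the actual line text only needs to be fetched for the regions that end up in the diff output.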
> If it is read into a string, it would be quite easy to create a
> "virtual" string handling "paging" of the file. This would provide
> the same functionality as mmap(), but without the 32 bit or
> OS-specific limitations.
That's an interesting thought. It always seems like a shame to
reinvent virtual memory, but perhaps that would be convenient here.
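A sketch of what such a "virtual string" might look like, purely hypothetical (the class name and paging scheme are invented for illustration): a read-only object that answers indexing and slicing by seeking in the underlying file, keeping only one page resident at a time.

```python
class PagedString:
    """Read-only string-like view over a file, served by seeking.

    Only one page is held in memory at a time, giving mmap-like
    behaviour without mmap's 32-bit address-space limits.
    """
    def __init__(self, path, page_size=4096):
        self._f = open(path, 'rb')
        self._f.seek(0, 2)  # seek to end to learn the length
        self._len = self._f.tell()
        self._page_size = page_size
        self._page_no = None  # number of the currently cached page
        self._page = b''

    def __len__(self):
        return self._len

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Serve slices directly by seeking; no caching needed.
            start, stop, step = index.indices(self._len)
            self._f.seek(start)
            return self._f.read(max(0, stop - start))[::step]
        if index < 0:
            index += self._len
        page_no, offset = divmod(index, self._page_size)
        if page_no != self._page_no:
            # Fault in the page containing this index.
            self._f.seek(page_no * self._page_size)
            self._page = self._f.read(self._page_size)
            self._page_no = page_no
        return self._page[offset:offset + 1]
```

This is only the read side; handling writes, concurrent access, or file closing would need more care, as the thread notes.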
> It would be different if it is needed to handle writes, where the
> written file is expected to be handled by an unknown consumer (a
> pipe for example).
I believe all our writes are simple append operations.
Aaron
More information about the bazaar mailing list