On Wed, Nov 16, 2011 at 7:43 PM, Aaron Bentley <span dir="ltr">&lt;<a href="mailto:aaron@aaronbentley.com">aaron@aaronbentley.com</a>&gt;</span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"> <div class="im">On 11-11-16 12:03 PM, Marco Pantaleoni wrote:</div><div class="im"> &gt; Why is the whole file needed in memory at once?<br> <br> </div>In general, it's not. We just have some code paths that do it even<br> though they don't need to.<br> <br> There are a few places where this is trickier, like comparing two<br> versions of a file. Diffs can theoretically match any line of a file<br> against any other line of another version, so they need fast access to<br> every line of both versions. Even that can be addressed by say, using<br> hashes of the lines instead of the actual lines, but that is an<br> algorithm change.<br></blockquote><div><br></div><div>Line comparison should only be needed for text files, which are usually quite small.</div><div>Maybe we could have two separate code paths, one for text files and one for binary files.</div> <div><br></div><div>Marco</div><div><br></div></div>-- <br>Marco Pantaleoni<br><br>