[MERGE] Faster 'build_tree'

Thu Jul 26 18:04:33 BST 2007

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John Arbash Meinel wrote:

>> If it was essentially doing this already, it seems questionable to add
>> extra code.  Saving a function call isn't a very big win for these
>> operations.

> Well, compare:
> 
> lines = rt.get_file(file_id).readlines()
> 
> to
> 
> lines = rt.get_file_lines(file_id)

Sure, I can see how avoiding double-handling is helpful there.
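To make the double-handling concrete, here is a minimal sketch in modern Python 3. It assumes a hypothetical in-memory tree that already stores each file as a list of lines. `InMemoryTree` and its storage layout are illustrative only, not bzrlib's actual RevisionTree.

```python
from io import BytesIO


class InMemoryTree:
    """Hypothetical tree that keeps each file's content as a list of lines."""

    def __init__(self, files):
        # files maps file_id -> list of byte strings, each ending in b"\n"
        self._files = files

    def get_file(self, file_id):
        # Must return a file-like object, so the stored lines are joined
        # into one buffer and wrapped in BytesIO.
        return BytesIO(b"".join(self._files[file_id]))

    def get_file_lines(self, file_id):
        # The caller wants lines and the storage already has them:
        # return a copy of the list without joining or re-splitting.
        return list(self._files[file_id])


tree = InMemoryTree({b"file-id": [b"first\n", b"second\n"]})

# Path 1: join into a buffer, then readlines() splits it apart again.
lines_via_file = tree.get_file(b"file-id").readlines()

# Path 2: no intermediate buffer at all.
lines_direct = tree.get_file_lines(b"file-id")
```

Both paths produce the same list. The second one simply skips a join followed by a re-split.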

But for rt.get_file_text(), you're just choosing where you're doing
''.join().
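A sketch of that point, again assuming a hypothetical line-based backend (`LinesTree` is illustrative, not a bzrlib class). Such a backend can only build the full text by joining its lines. Calling `get_file_text()` therefore does the same work as a caller joining `get_file_lines()` itself. The join moves; it does not disappear.

```python
class LinesTree:
    """Hypothetical tree whose storage is a list of lines per file."""

    def __init__(self, files):
        # files maps file_id -> list of byte strings
        self._files = files

    def get_file_lines(self, file_id):
        return list(self._files[file_id])

    def get_file_text(self, file_id):
        # A line-based backend can only produce the text by joining lines,
        # which is exactly what the caller could have done itself.
        return b"".join(self.get_file_lines(file_id))


tree = LinesTree({b"id": [b"a\n", b"b\n"]})
text_from_tree = tree.get_file_text(b"id")
text_from_caller = b"".join(tree.get_file_lines(b"id"))
```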

> So propagating the caller's needs down the stack means that the low-level
> implementation can do whatever it needs to in order to return it, rather than
> having all the higher-level APIs massage the data repeatedly.

That's as may be.  To me, it seems pretty silly to have three different
methods to get the content of a file.

And frankly, all of them are wrong:
 - get_file() requires you to return a file-like object, which is
   frequently pointless overhead.
 - get_file_lines() can exhaust memory when dealing with large binary
   files, because such files may contain no \n at all, so a single "line"
   can be the entire file.
 - get_file_text() can exhaust memory when dealing with any large file,
   because it holds the whole content in one string.
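A small Python 3 demonstration of the binary-file problem. The 1 MiB payload is made up for illustration. `readlines()` on newline-free data returns one element as large as the file. Reading fixed-size blocks with the two-argument form of `iter()` keeps each piece bounded, whatever the content.

```python
from io import BytesIO

BLOCK_SIZE = 64 * 1024

# Hypothetical 1 MiB "binary" payload with every newline byte replaced,
# so it contains no b"\n" at all.
payload = bytes(range(256)).replace(b"\n", b"\x00") * 4096
f = BytesIO(payload)

# Line-based access: the whole file comes back as a single "line".
lines = f.readlines()

# Block-based access: each chunk is at most BLOCK_SIZE bytes.
f.seek(0)
sizes = [len(block) for block in iter(lambda: f.read(BLOCK_SIZE), b"")]
```

Here `lines` holds one 1,048,576-byte element, while `sizes` holds sixteen 64 KiB chunks.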

> I don't know any current users of get_file_text() offhand. I'm fine with not
> messing with the function at this point.

I had no idea it existed.

> I believe that it can ultimately be better for us to work in file blocks
> rather than strictly in lines, except in places where we actually care
> about lines (like in annotations).
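One way that split could look, as a Python 3 sketch. It assumes a hypothetical block iterator rather than any existing bzrlib API, and `iter_blocks`, `copy_blocks` and `blocks_to_lines` are illustrative names only. A consumer that just writes content out (roughly what building a working tree needs) passes blocks straight through. Only a line-oriented consumer such as annotate pays for reassembling lines across block boundaries.

```python
from io import BytesIO


def iter_blocks(fileobj, block_size=4096):
    """Yield the file's content as chunks of at most block_size bytes."""
    return iter(lambda: fileobj.read(block_size), b"")


def copy_blocks(blocks, out):
    """Block consumer, e.g. writing a file to disk: never needs lines."""
    for block in blocks:
        out.write(block)


def blocks_to_lines(blocks):
    """Line consumer, e.g. annotate: rebuild lines across block boundaries."""
    pending = b""
    for block in blocks:
        pending += block
        parts = pending.split(b"\n")
        # Every part except the last was terminated by a newline.
        for part in parts[:-1]:
            yield part + b"\n"
        pending = parts[-1]
    if pending:
        # Final line without a trailing newline.
        yield pending


data = b"alpha\nbeta\ngamma"

out = BytesIO()
copy_blocks(iter_blocks(BytesIO(data), block_size=4), out)

lines = list(blocks_to_lines(iter_blocks(BytesIO(data), block_size=4)))
```

With a 4-byte block size, line boundaries fall mid-block, yet `lines` matches what `readlines()` would return: `[b"alpha\n", b"beta\n", b"gamma"]`.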

Full ACK.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGqNQh0F+nu1YWqI0RAmLeAJ9WBoixWCH9/J5w14e9goXSyhk03gCfQj6M
dpt5UcSGemiU5IbAph44//E=
=Uwo6
-----END PGP SIGNATURE-----