[MERGE] sha_file_by_name using raw os files; -Dhashcache

Martin Pool mbp at sourcefrog.net
Mon Oct 8 02:56:50 BST 2007


> Martin Pool wrote:
> > I thought about using mmap or a subprocess, but was dissuaded for the
> > moment by portability.  On some unixes mmap is much more restricted than
> > on Linux (eg you can crash if the file is truncated while you're reading
> > it).  On Windows running sha1sum in a subprocess is probably slower.
> > It's quite possible but this just seemed like an incremental
> > improvement.
> >
>
> Well, as both were slower (mmap ~10%, subprocess about 15x slower), there
> really is no question. mmap *might* be faster in an extension. But probably not
> enough to justify it.

I replied to your first mail before I saw the second with numbers
showing they weren't worthwhile.

I don't know everything that's going on, but I recall that changing
mappings causes a TLB and cache invalidation (both for mmap and
munmap).  For small files it's possible this will slow us down much
more than the memory copy.
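
For reference, a rough sketch of what an mmap-based hash could look
like in Python. This is only an illustration, not the code that was
benchmarked; the name sha_file_mmap_sketch is made up, and the
empty-file special case is there because a zero-length file can't be
mapped. Each call pays for one mmap and one munmap, which is the
per-file cost I'm speculating about above.

```python
import hashlib
import mmap
import os


def sha_file_mmap_sketch(path):
    """Return the hex SHA-1 of the file at path, hashing a read-only mapping.

    Hypothetical helper for illustration; not part of bzrlib.
    """
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # A zero-length file cannot be mapped, so hash nothing in that case.
        if os.fstat(f.fileno()).st_size == 0:
            return digest.hexdigest()
        # Length 0 means "map the whole file".
        mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            # The hash object reads straight from the mapping.
            digest.update(mapped)
        finally:
            # Unmapping is the second mapping change per file.
            mapped.close()
    return digest.hexdigest()
```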

> Also, how did you realize we were double buffering? strace checking? Or just a
> good hunch?

I just saw it as I was looking through the code in preparation for the
set_state_from_inventory fix.  In general "oh that looks slow, I'll
change it" is not really a good optimization technique, but here it was
quick enough to try.  It also makes the code a bit clearer.
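
To make the double-buffering point concrete, here is a minimal sketch
of hashing through raw os file descriptors, so each chunk goes from
the kernel straight to the hash without passing through a Python file
object's buffer first. It is an illustration only and not the actual
patch: the function name and the 64kB chunk size are assumptions.

```python
import hashlib
import os


def sha_file_by_name_sketch(path, chunk_size=64 * 1024):
    """Return the hex SHA-1 of the file at path, read with os.open/os.read.

    Hypothetical helper for illustration; chunk_size is an assumed default.
    """
    digest = hashlib.sha1()
    # O_BINARY only exists on Windows; fall back to 0 elsewhere.
    fd = os.open(path, os.O_RDONLY | getattr(os, "O_BINARY", 0))
    try:
        while True:
            # os.read returns at most chunk_size bytes, b"" at end of file.
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    finally:
        os.close(fd)
    return digest.hexdigest()
```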

-- 
Martin


