[MERGE] less-shaing during status
John Arbash Meinel
john at arbash-meinel.com
Thu Sep 25 18:16:28 BST 2008
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Robert Collins wrote:
> (Refreshed to avoid NEWS conflicts)
>
if (stat_value.st_mtime < self._cutoff_time
- - and stat_value.st_ctime < self._cutoff_time):
+ and stat_value.st_ctime < self._cutoff_time
+ and len(entry[1]) > 1
+ and entry[1][1][0] != 'a'):
+ # Could check for size changes for further optimised
+ # avoidance of sha1's. However the most prominent case of
+ # over-shaing is during initial add, which this catches.
+ link_or_sha1 = self._sha1_file(abspath)
entry[1][0] = ('f', link_or_sha1, stat_value.st_size,
executable, packed_stat)
^- I would probably change the "!= 'a'" into "== 'f'". I don't see why
we would cache if it was a rename, or kind change.
I also don't see a harm in
and entry[1][1][1] == stat_value.st_size
But if you've done the profiling, it may be a case we don't need to
check regularly. (It helps the.. I've changed 20 items in my tree, on an
ongoing basis and keep running 'bzr st', but sha-ing 20 items is usually
not an expensive case.)
=== modified file 'bzrlib/dirstate.py'
...
> @@ -2955,9 +2964,22 @@
> if source_minikind != 'f':
> content_change = True
> else:
> - # We could check the size, but we already have the
> - # sha1 hash.
> - content_change = (link_or_sha1 != source_details[1])
> + # If the size is the same, check the sha:
> + if target_details[2] == source_details[2]:
> + if link_or_sha1 is None:
> + # Stat cache miss:
> + file_obj = file(path_info[4], 'rb')
> + try:
> + statvalue = fstat(file_obj.fileno())
> + link_or_sha1 = sha_file(file_obj)
> + finally:
> + file_obj.close()
> + self.state._observed_sha1(entry, link_or_sha1,
> + statvalue)
> + content_change = (link_or_sha1 != source_details[1])
> + else:
> + # Size changed, so must be different
> + content_change = True
> # Target details is updated at update_entry time
> if self.use_filesystem_for_exec:
> # We don't need S_ISREG here, because we are sure
^- I'll note that we have "osutils.sha_file_by_name" which makes the
inner portion of this a bit cleaner. It also uses raw os.open(), which I
think has a bit less overhead (no buffering in python, etc.)
Unless this 'sha_file' is off of the Dirstate object for tesing purposes.
I wonder if we shouldn't add a check that the existing cached value
missed, and the file won't actually be sha'd. Does it actually change
the record in the dirstate (and set the IN_MEMORY_MODIFIED)?
It would mean that for:
bzr add foo
bzr st
We wouldn't have to write out the dirstate again.
Anyway, if you can, use osutils.sha_file_by_name, otherwise
BB:tweak
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAkjbx2sACgkQJdeBCYSNAANC1ACgiFrdDqWlVk+2lR/kU7Y6/qHr
LgIAnA+gXkscTJRNMA6rj74OnUC/sJFi
=VTIG
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list