[MERGE] less-shaing during status

John Arbash Meinel john at arbash-meinel.com
Thu Sep 25 18:16:28 BST 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> (Refreshed to avoid NEWS conflicts)
> 

         if (stat_value.st_mtime < self._cutoff_time
- -            and stat_value.st_ctime < self._cutoff_time):
+            and stat_value.st_ctime < self._cutoff_time
+            and len(entry[1]) > 1
+            and entry[1][1][0] != 'a'):
+                # Could check for size changes for further optimised
+                # avoidance of sha1's. However the most prominent case of
+                # over-shaing is during initial add, which this catches.
+            link_or_sha1 = self._sha1_file(abspath)
             entry[1][0] = ('f', link_or_sha1, stat_value.st_size,
                            executable, packed_stat)

^- I would probably change the "!= 'a'" into "== 'f'". I don't see why
we would cache if it was a rename, or kind change.

I also don't see a harm in

  and entry[1][1][1] == stat_value.st_size

But if you've done the profiling, it may be a case we don't need to
check regularly. (It helps the.. I've changed 20 items in my tree, on an
ongoing basis and keep running 'bzr st', but sha-ing 20 items is usually
not an expensive case.)


=== modified file 'bzrlib/dirstate.py'
...
> @@ -2955,9 +2964,22 @@
>                      if source_minikind != 'f':
>                          content_change = True
>                      else:
> -                        # We could check the size, but we already have the
> -                        # sha1 hash.
> -                        content_change = (link_or_sha1 != source_details[1])
> +                        # If the size is the same, check the sha:
> +                        if target_details[2] == source_details[2]:
> +                            if link_or_sha1 is None:
> +                                # Stat cache miss:
> +                                file_obj = file(path_info[4], 'rb')
> +                                try:
> +                                    statvalue = fstat(file_obj.fileno())
> +                                    link_or_sha1 = sha_file(file_obj)
> +                                finally:
> +                                    file_obj.close()
> +                                self.state._observed_sha1(entry, link_or_sha1,
> +                                    statvalue)
> +                            content_change = (link_or_sha1 != source_details[1])
> +                        else:
> +                            # Size changed, so must be different
> +                            content_change = True
>                      # Target details is updated at update_entry time
>                      if self.use_filesystem_for_exec:
>                          # We don't need S_ISREG here, because we are sure

^- I'll note that we have "osutils.sha_file_by_name" which makes the
inner portion of this a bit cleaner. It also uses raw os.open(), which I
think has a bit less overhead (no buffering in python, etc.)

Unless this 'sha_file' is off of the Dirstate object for tesing purposes.


I wonder if we shouldn't add a check that the existing cached value
missed, and the file won't actually be sha'd. Does it actually change
the record in the dirstate (and set the IN_MEMORY_MODIFIED)?

It would mean that for:

bzr add foo
bzr st

We wouldn't have to write out the dirstate again.


Anyway, if you can, use osutils.sha_file_by_name, otherwise

BB:tweak

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkjbx2sACgkQJdeBCYSNAANC1ACgiFrdDqWlVk+2lR/kU7Y6/qHr
LgIAnA+gXkscTJRNMA6rj74OnUC/sJFi
=VTIG
-----END PGP SIGNATURE-----



More information about the bazaar mailing list