Rev 4161: make sha1_provider a parameter to DirState() in file:///home/pqm/archives/thelove/bzr/%2Btrunk/
Canonical.com Patch Queue Manager
pqm at pqm.ubuntu.com
Wed Mar 18 03:45:40 GMT 2009
At file:///home/pqm/archives/thelove/bzr/%2Btrunk/
------------------------------------------------------------
revno: 4161
revision-id: pqm at pqm.ubuntu.com-20090318034536-m78yf86gruh3qvh3
parent: pqm at pqm.ubuntu.com-20090318021431-md1n8o3542wwsvai
parent: ian.clatworthy at canonical.com-20090318030204-rjrcdyzi51obr8ln
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Wed 2009-03-18 03:45:36 +0000
message:
make sha1_provider a parameter to DirState()
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
bzrlib/_dirstate_helpers_c.pyx dirstate_helpers.pyx-20070503201057-u425eni465q4idwn-3
bzrlib/dirstate.py dirstate.py-20060728012006-d6mvoihjb3je9peu-1
bzrlib/tests/test__dirstate_helpers.py test_dirstate_helper-20070504035751-jsbn00xodv0y1eve-2
bzrlib/tests/test_dirstate.py test_dirstate.py-20060728012006-d6mvoihjb3je9peu-2
------------------------------------------------------------
revno: 4159.1.2
revision-id: ian.clatworthy at canonical.com-20090318030204-rjrcdyzi51obr8ln
parent: ian.clatworthy at canonical.com-20090318020811-cpzi224lsuq7va6h
committer: Ian Clatworthy <ian.clatworthy at canonical.com>
branch nick: ianc-integration
timestamp: Wed 2009-03-18 13:02:04 +1000
message:
make sha1_provider a parameter to DirState() - tweak tests
modified:
bzrlib/tests/test__dirstate_helpers.py test_dirstate_helper-20070504035751-jsbn00xodv0y1eve-2
------------------------------------------------------------
revno: 4159.1.1
revision-id: ian.clatworthy at canonical.com-20090318020811-cpzi224lsuq7va6h
parent: pqm at pqm.ubuntu.com-20090318013435-d4g73c7rzqidhct1
parent: ian.clatworthy at canonical.com-20090318020322-75tdtdapmjbadbup
committer: Ian Clatworthy <ian.clatworthy at canonical.com>
branch nick: ianc-integration
timestamp: Wed 2009-03-18 12:08:11 +1000
message:
make sha1_provider a parameter to DirState()
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
bzrlib/_dirstate_helpers_c.pyx dirstate_helpers.pyx-20070503201057-u425eni465q4idwn-3
bzrlib/dirstate.py dirstate.py-20060728012006-d6mvoihjb3je9peu-1
bzrlib/tests/test__dirstate_helpers.py test_dirstate_helper-20070504035751-jsbn00xodv0y1eve-2
bzrlib/tests/test_dirstate.py test_dirstate.py-20060728012006-d6mvoihjb3je9peu-2
------------------------------------------------------------
revno: 4132.2.5
revision-id: ian.clatworthy at canonical.com-20090318020322-75tdtdapmjbadbup
parent: ian.clatworthy at canonical.com-20090313075745-lx9fki4aqrz08tv2
committer: Ian Clatworthy <ian.clatworthy at canonical.com>
branch nick: bzr.dirstate-sha1-provider
timestamp: Wed 2009-03-18 12:03:22 +1000
message:
feedback from poolie - use SHA, not Sha, in class names
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
bzrlib/dirstate.py dirstate.py-20060728012006-d6mvoihjb3je9peu-1
bzrlib/tests/test__dirstate_helpers.py test_dirstate_helper-20070504035751-jsbn00xodv0y1eve-2
bzrlib/tests/test_dirstate.py test_dirstate.py-20060728012006-d6mvoihjb3je9peu-2
------------------------------------------------------------
revno: 4132.2.4
revision-id: ian.clatworthy at canonical.com-20090313075745-lx9fki4aqrz08tv2
parent: ian.clatworthy at canonical.com-20090313062258-tmiy9u7oq2yhsvwg
committer: Ian Clatworthy <ian.clatworthy at canonical.com>
branch nick: bzr.dirstate-sha1-provider
timestamp: Fri 2009-03-13 17:57:45 +1000
message:
John's enhancements/fixes to the tests
modified:
bzrlib/_dirstate_helpers_c.pyx dirstate_helpers.pyx-20070503201057-u425eni465q4idwn-3
bzrlib/dirstate.py dirstate.py-20060728012006-d6mvoihjb3je9peu-1
bzrlib/tests/test__dirstate_helpers.py test_dirstate_helper-20070504035751-jsbn00xodv0y1eve-2
bzrlib/tests/test_dirstate.py test_dirstate.py-20060728012006-d6mvoihjb3je9peu-2
------------------------------------------------------------
revno: 4132.2.3
revision-id: ian.clatworthy at canonical.com-20090313062258-tmiy9u7oq2yhsvwg
parent: ian.clatworthy at canonical.com-20090313030309-brxt738eptrfqpc4
committer: Ian Clatworthy <ian.clatworthy at canonical.com>
branch nick: bzr.dirstate-sha1-provider
timestamp: Fri 2009-03-13 16:22:58 +1000
message:
add test as suggested by poolie's review
modified:
bzrlib/dirstate.py dirstate.py-20060728012006-d6mvoihjb3je9peu-1
bzrlib/tests/test_dirstate.py test_dirstate.py-20060728012006-d6mvoihjb3je9peu-2
------------------------------------------------------------
revno: 4132.2.2
revision-id: ian.clatworthy at canonical.com-20090313030309-brxt738eptrfqpc4
parent: ian.clatworthy at canonical.com-20090312132522-8wtyit4o5kx244yz
committer: Ian Clatworthy <ian.clatworthy at canonical.com>
branch nick: bzr.dirstate-sha1-provider
timestamp: Fri 2009-03-13 13:03:09 +1000
message:
make sha1_provider a mandatory param for DirState.__init__()
modified:
bzrlib/dirstate.py dirstate.py-20060728012006-d6mvoihjb3je9peu-1
bzrlib/tests/test_dirstate.py test_dirstate.py-20060728012006-d6mvoihjb3je9peu-2
------------------------------------------------------------
revno: 4132.2.1
revision-id: ian.clatworthy at canonical.com-20090312132522-8wtyit4o5kx244yz
parent: pqm at pqm.ubuntu.com-20090312112116-550sw6tk8syaaxku
committer: Ian Clatworthy <ian.clatworthy at canonical.com>
branch nick: bzr.dirstate-sha1-provider
timestamp: Thu 2009-03-12 23:25:22 +1000
message:
make sha1_provider a parameter to DirState()
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
bzrlib/dirstate.py dirstate.py-20060728012006-d6mvoihjb3je9peu-1
bzrlib/tests/test_dirstate.py test_dirstate.py-20060728012006-d6mvoihjb3je9peu-2
=== modified file 'NEWS'
--- a/NEWS 2009-03-18 02:14:31 +0000
+++ b/NEWS 2009-03-18 03:45:36 +0000
@@ -111,6 +111,10 @@
INTERNALS:
+ * ``DirState`` can now be passed a custom ``SHA1Provider`` object
+ enabling it to store the sha1 and stat of the canonical (post
+ content filtered) form. (Ian Clatworthy)
+
* New ``assertLength`` method based on one Martin has squirreled away
somewhere. (Robert Collins, Martin Pool)
=== modified file 'bzrlib/_dirstate_helpers_c.pyx'
--- a/bzrlib/_dirstate_helpers_c.pyx 2009-03-11 01:53:16 +0000
+++ b/bzrlib/_dirstate_helpers_c.pyx 2009-03-13 07:57:45 +0000
@@ -1144,14 +1144,9 @@
if target_details[2] == source_details[2]:
if link_or_sha1 is None:
# Stat cache miss:
- file_obj = file(path_info[4], 'rb')
- try:
- # XXX: TODO: Use lower level file IO rather
- # than python objects for sha-misses.
- statvalue = self.fstat(file_obj.fileno())
- link_or_sha1 = self.sha_file(file_obj)
- finally:
- file_obj.close()
+ statvalue, link_or_sha1 = \
+ self.state._sha1_provider.stat_and_sha1(
+ path_info[4])
self.state._observed_sha1(entry, link_or_sha1,
statvalue)
content_change = (link_or_sha1 != source_details[1])
=== modified file 'bzrlib/dirstate.py'
--- a/bzrlib/dirstate.py 2009-03-12 05:32:56 +0000
+++ b/bzrlib/dirstate.py 2009-03-18 02:08:11 +0000
@@ -82,8 +82,9 @@
'a' is an absent entry: In that tree the id is not present at this path.
'd' is a directory entry: This path in this tree is a directory with the
current file id. There is no fingerprint for directories.
-'f' is a file entry: As for directory, but its a file. The fingerprint is a
- sha1 value.
+'f' is a file entry: As for directory, but it's a file. The fingerprint is the
+ sha1 value of the file's canonical form, i.e. after any read filters have
+ been applied to the convenience form stored in the working tree.
'l' is a symlink entry: As for directory, but a symlink. The fingerprint is the
link target.
't' is a reference to a nested subtree; the fingerprint is the referenced
@@ -262,6 +263,40 @@
# return '%X.%X' % (int(st.st_mtime), st.st_mode)
+class SHA1Provider(object):
+    """An interface for getting sha1s of a file."""
+
+    def sha1(self, abspath):
+        """Return the sha1 of a file given its absolute path."""
+        raise NotImplementedError(self.sha1)
+
+    def stat_and_sha1(self, abspath):
+        """Return the stat and sha1 of a file given its absolute path.
+
+        Note: the stat should be the stat of the physical file
+        while the sha may be the sha of its canonical content.
+        """
+        raise NotImplementedError(self.stat_and_sha1)
+
+
+class DefaultSHA1Provider(SHA1Provider):
+    """A SHA1Provider that reads directly from the filesystem."""
+
+    def sha1(self, abspath):
+        """Return the sha1 of a file given its absolute path."""
+        return osutils.sha_file_by_name(abspath)
+
+    def stat_and_sha1(self, abspath):
+        """Return the stat and sha1 of a file given its absolute path."""
+        file_obj = file(abspath, 'rb')
+        try:
+            statvalue = os.fstat(file_obj.fileno())
+            sha1 = osutils.sha_file(file_obj)
+        finally:
+            file_obj.close()
+        return statvalue, sha1
+
+
class DirState(object):
"""Record directory and metadata state for fast access.
@@ -320,10 +355,11 @@
HEADER_FORMAT_2 = '#bazaar dirstate flat format 2\n'
HEADER_FORMAT_3 = '#bazaar dirstate flat format 3\n'
- def __init__(self, path):
+ def __init__(self, path, sha1_provider):
"""Create a DirState object.
:param path: The path at which the dirstate file on disk should live.
+ :param sha1_provider: an object meeting the SHA1Provider interface.
"""
# _header_state and _dirblock_state represent the current state
# of the dirstate metadata and the per-row data respectiely.
@@ -355,10 +391,11 @@
self._cutoff_time = None
self._split_path_cache = {}
self._bisect_page_size = DirState.BISECT_PAGE_SIZE
+ self._sha1_provider = sha1_provider
if 'hashcache' in debug.debug_flags:
self._sha1_file = self._sha1_file_and_mutter
else:
- self._sha1_file = osutils.sha_file_by_name
+ self._sha1_file = self._sha1_provider.sha1
# These two attributes provide a simple cache for lookups into the
# dirstate in-memory vectors. By probing respectively for the last
# block, and for the next entry, we save nearly 2 bisections per path
@@ -380,7 +417,8 @@
:param kind: The kind of the path, as a string like 'file',
'directory', etc.
:param stat: The output of os.lstat for the path.
- :param fingerprint: The sha value of the file,
+ :param fingerprint: The sha value of the file's canonical form (i.e.
+ after any read filters have been applied),
or the target of a symlink,
or the referenced revision id for tree-references,
or '' for directories.
@@ -1194,15 +1232,18 @@
return entry_index, present
@staticmethod
- def from_tree(tree, dir_state_filename):
+ def from_tree(tree, dir_state_filename, sha1_provider=None):
"""Create a dirstate from a bzr Tree.
:param tree: The tree which should provide parent information and
inventory ids.
+ :param sha1_provider: an object meeting the SHA1Provider interface.
+ If None, a DefaultSHA1Provider is used.
:return: a DirState object which is currently locked for writing.
(it was locked by DirState.initialize)
"""
- result = DirState.initialize(dir_state_filename)
+ result = DirState.initialize(dir_state_filename,
+ sha1_provider=sha1_provider)
try:
tree.lock_read()
try:
@@ -1569,7 +1610,7 @@
# when -Dhashcache is turned on, this is monkey-patched in to log
# file reads
trace.mutter("dirstate sha1 " + abspath)
- return osutils.sha_file_by_name(abspath)
+ return self._sha1_provider.sha1(abspath)
def _is_executable(self, mode, old_executable):
"""Is this file executable?"""
@@ -1829,13 +1870,15 @@
return None, None
@classmethod
- def initialize(cls, path):
+ def initialize(cls, path, sha1_provider=None):
"""Create a new dirstate on path.
The new dirstate will be an empty tree - that is it has no parents,
and only a root node - which has id ROOT_ID.
:param path: The name of the file for the dirstate.
+ :param sha1_provider: an object meeting the SHA1Provider interface.
+ If None, a DefaultSHA1Provider is used.
:return: A write-locked DirState object.
"""
# This constructs a new DirState object on a path, sets the _state_file
@@ -1843,7 +1886,9 @@
# stock empty dirstate information - a root with ROOT_ID, no children,
# and no parents. Finally it calls save() to ensure that this data will
# persist.
- result = cls(path)
+ if sha1_provider is None:
+ sha1_provider = DefaultSHA1Provider()
+ result = cls(path, sha1_provider)
# root dir and root dir contents with no children.
empty_tree_dirblocks = [('', []), ('', [])]
# a new root directory, with a NULLSTAT.
@@ -1982,12 +2027,17 @@
return len(self._parents) - len(self._ghosts)
@staticmethod
- def on_file(path):
+ def on_file(path, sha1_provider=None):
"""Construct a DirState on the file at path path.
+ :param path: The path at which the dirstate file on disk should live.
+ :param sha1_provider: an object meeting the SHA1Provider interface.
+ If None, a DefaultSHA1Provider is used.
:return: An unlocked DirState object, associated with the given path.
"""
- result = DirState(path)
+ if sha1_provider is None:
+ sha1_provider = DefaultSHA1Provider()
+ result = DirState(path, sha1_provider)
return result
def _read_dirblocks_if_needed(self):
@@ -2482,8 +2532,8 @@
:param minikind: The type for the entry ('f' == 'file', 'd' ==
'directory'), etc.
:param executable: Should the executable bit be set?
- :param fingerprint: Simple fingerprint for new entry: sha1 for files,
- referenced revision id for subtrees, etc.
+ :param fingerprint: Simple fingerprint for new entry: canonical-form
+ sha1 for files, referenced revision id for subtrees, etc.
:param packed_stat: Packed stat value for new entry.
:param size: Size information for new entry
:param path_utf8: key[0] + '/' + key[1], just passed in to avoid doing
@@ -2845,9 +2895,12 @@
and stat_value.st_ctime < state._cutoff_time
and len(entry[1]) > 1
and entry[1][1][0] != 'a'):
- # Could check for size changes for further optimised
- # avoidance of sha1's. However the most prominent case of
- # over-shaing is during initial add, which this catches.
+ # Could check for size changes for further optimised
+ # avoidance of sha1's. However the most prominent case of
+ # over-shaing is during initial add, which this catches.
+ # Besides, if content filtering happens, size and sha
+ # are calculated at the same time, so checking just the size
+ # gains nothing w.r.t. performance.
link_or_sha1 = state._sha1_file(abspath)
entry[1][0] = ('f', link_or_sha1, stat_value.st_size,
executable, packed_stat)
@@ -3006,12 +3059,9 @@
if target_details[2] == source_details[2]:
if link_or_sha1 is None:
# Stat cache miss:
- file_obj = file(path_info[4], 'rb')
- try:
- statvalue = os.fstat(file_obj.fileno())
- link_or_sha1 = osutils.sha_file(file_obj)
- finally:
- file_obj.close()
+ statvalue, link_or_sha1 = \
+ self.state._sha1_provider.stat_and_sha1(
+ path_info[4])
self.state._observed_sha1(entry, link_or_sha1,
statvalue)
content_change = (link_or_sha1 != source_details[1])
=== modified file 'bzrlib/tests/test__dirstate_helpers.py'
--- a/bzrlib/tests/test__dirstate_helpers.py 2009-03-11 03:07:31 +0000
+++ b/bzrlib/tests/test__dirstate_helpers.py 2009-03-18 03:02:04 +0000
@@ -23,6 +23,7 @@
from bzrlib import (
dirstate,
errors,
+ osutils,
tests,
)
from bzrlib.tests import (
@@ -1173,6 +1174,59 @@
self.assertEqual([('f', '', 14, True, dirstate.DirState.NULLSTAT)],
entry[1])
+ def _prepare_tree(self):
+ # Create a tree
+ text = 'Hello World\n'
+ tree = self.make_branch_and_tree('tree')
+ self.build_tree_contents([('tree/a file', text)])
+ tree.add('a file', 'a-file-id')
+ # Note: dirstate does not sha prior to the first commit
+ # so commit now in order for the test to work
+ tree.commit('first')
+ return tree, text
+
+ def test_sha1provider_sha1_used(self):
+ tree, text = self._prepare_tree()
+ state = dirstate.DirState.from_tree(tree, 'dirstate',
+ UppercaseSHA1Provider())
+ self.addCleanup(state.unlock)
+ expected_sha = osutils.sha_string(text.upper() + "foo")
+ entry = state._get_entry(0, path_utf8='a file')
+ state._sha_cutoff_time()
+ state._cutoff_time += 10
+ sha1 = dirstate.update_entry(state, entry, 'tree/a file',
+ os.lstat('tree/a file'))
+ self.assertEqual(expected_sha, sha1)
+
+ def test_sha1provider_stat_and_sha1_used(self):
+ tree, text = self._prepare_tree()
+ tree.lock_write()
+ self.addCleanup(tree.unlock)
+ state = tree._current_dirstate()
+ state._sha1_provider = UppercaseSHA1Provider()
+ # If we used the standard provider, it would look like nothing has
+ # changed
+ file_ids_changed = [change[0] for change
+ in tree.iter_changes(tree.basis_tree())]
+ self.assertEqual(['a-file-id'], file_ids_changed)
+
+
+class UppercaseSHA1Provider(dirstate.SHA1Provider):
+    """A custom SHA1Provider."""
+
+    def sha1(self, abspath):
+        return self.stat_and_sha1(abspath)[1]
+
+    def stat_and_sha1(self, abspath):
+        file_obj = file(abspath, 'rb')
+        try:
+            statvalue = os.fstat(file_obj.fileno())
+            text = ''.join(file_obj.readlines())
+            sha1 = osutils.sha_string(text.upper() + "foo")
+        finally:
+            file_obj.close()
+        return statvalue, sha1
+
class TestCompiledUpdateEntry(TestUpdateEntry):
"""Test the pyrex implementation of _read_dirblocks"""
@@ -1182,3 +1236,59 @@
def set_update_entry(self):
from bzrlib._dirstate_helpers_c import update_entry
self.update_entry = update_entry
+
+
+class TestProcessEntryPython(test_dirstate.TestCaseWithDirState):
+
+ def setUp(self):
+ super(TestProcessEntryPython, self).setUp()
+ self.setup_process_entry()
+
+ def setup_process_entry(self):
+ from bzrlib import dirstate
+ orig = dirstate._process_entry
+ def cleanup():
+ dirstate._process_entry = orig
+ self.addCleanup(cleanup)
+ dirstate._process_entry = dirstate.ProcessEntryPython
+
+ def assertChangedFileIds(self, expected, tree):
+ tree.lock_read()
+ try:
+ file_ids = [info[0] for info
+ in tree.iter_changes(tree.basis_tree())]
+ finally:
+ tree.unlock()
+ self.assertEqual(sorted(expected), sorted(file_ids))
+
+ def test_simple_changes(self):
+ tree = self.make_branch_and_tree('tree')
+ self.build_tree(['tree/file'])
+ tree.add(['file'], ['file-id'])
+ self.assertChangedFileIds([tree.get_root_id(), 'file-id'], tree)
+ tree.commit('one')
+ self.assertChangedFileIds([], tree)
+
+ def test_sha1provider_stat_and_sha1_used(self):
+ tree = self.make_branch_and_tree('tree')
+ self.build_tree(['tree/file'])
+ tree.add(['file'], ['file-id'])
+ tree.commit('one')
+ tree.lock_write()
+ self.addCleanup(tree.unlock)
+ state = tree._current_dirstate()
+ state._sha1_provider = UppercaseSHA1Provider()
+ self.assertChangedFileIds(['file-id'], tree)
+
+
+class TestProcessEntryC(TestProcessEntryPython):
+ _test_needs_features = [CompiledDirstateHelpersFeature]
+
+ def setup_process_entry(self):
+ from bzrlib import _dirstate_helpers_c
+ orig = dirstate._process_entry
+ def cleanup():
+ dirstate._process_entry = orig
+ self.addCleanup(cleanup)
+ dirstate._process_entry = _dirstate_helpers_c.ProcessEntryC
+
=== modified file 'bzrlib/tests/test_dirstate.py'
--- a/bzrlib/tests/test_dirstate.py 2009-01-17 01:30:58 +0000
+++ b/bzrlib/tests/test_dirstate.py 2009-03-18 02:03:22 +0000
@@ -30,6 +30,7 @@
from bzrlib.tests import (
SymlinkFeature,
TestCase,
+ TestCaseInTempDir,
TestCaseWithTransport,
)
@@ -1616,11 +1617,12 @@
class InstrumentedDirState(dirstate.DirState):
"""An DirState with instrumented sha1 functionality."""
- def __init__(self, path):
- super(InstrumentedDirState, self).__init__(path)
+ def __init__(self, path, sha1_provider):
+ super(InstrumentedDirState, self).__init__(path, sha1_provider)
self._time_offset = 0
self._log = []
# member is dynamically set in DirState.__init__ to turn on trace
+ self._sha1_provider = sha1_provider
self._sha1_file = self._sha1_file_and_log
def _sha_cutoff_time(self):
@@ -1629,7 +1631,7 @@
def _sha1_file_and_log(self, abspath):
self._log.append(('sha1', abspath))
- return osutils.sha_file_by_name(abspath)
+ return self._sha1_provider.sha1(abspath)
def _read_link(self, abspath, old_link):
self._log.append(('read_link', abspath, old_link))
@@ -1666,6 +1668,11 @@
self.st_ino = ino
self.st_mode = mode
+ @staticmethod
+ def from_stat(st):
+ return _FakeStat(st.st_size, st.st_mtime, st.st_ctime, st.st_dev,
+ st.st_ino, st.st_mode)
+
class TestPackStat(TestCaseWithTransport):
@@ -2221,3 +2228,28 @@
inv_entry.symlink_target = u'link-target'
details = self.assertDetails(('l', 'link-target', 0, False,
'link-revision-id'), inv_entry)
+
+
+class TestSHA1Provider(TestCaseInTempDir):
+
+    def test_sha1provider_is_an_interface(self):
+        p = dirstate.SHA1Provider()
+        self.assertRaises(NotImplementedError, p.sha1, "foo")
+        self.assertRaises(NotImplementedError, p.stat_and_sha1, "foo")
+
+    def test_defaultsha1provider_sha1(self):
+        text = 'test\r\nwith\nall\rpossible line endings\r\n'
+        self.build_tree_contents([('foo', text)])
+        expected_sha = osutils.sha_string(text)
+        p = dirstate.DefaultSHA1Provider()
+        self.assertEqual(expected_sha, p.sha1('foo'))
+
+    def test_defaultsha1provider_stat_and_sha1(self):
+        text = 'test\r\nwith\nall\rpossible line endings\r\n'
+        self.build_tree_contents([('foo', text)])
+        expected_sha = osutils.sha_string(text)
+        p = dirstate.DefaultSHA1Provider()
+        statvalue, sha1 = p.stat_and_sha1('foo')
+        self.assertTrue(len(statvalue) >= 10)
+        self.assertEqual(len(text), statvalue.st_size)
+        self.assertEqual(expected_sha, sha1)
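
The provider interface introduced by this change can be illustrated with a small standalone sketch. It is not the bzrlib code: it is written for Python 3, uses hashlib in place of osutils, and the NormalizedNewlineSHA1Provider class and the resolve_provider() helper are hypothetical names invented for illustration. The idea shown is the one described in the docstrings above: the stat describes the physical file, while the sha may describe a canonical form of its content.

```python
import hashlib
import os


class SHA1Provider:
    """Interface sketch: subclasses supply sha1() and stat_and_sha1()."""

    def sha1(self, abspath):
        raise NotImplementedError(self.sha1)

    def stat_and_sha1(self, abspath):
        raise NotImplementedError(self.stat_and_sha1)


class DefaultSHA1Provider(SHA1Provider):
    """Hashes the bytes exactly as they are stored on disk."""

    def sha1(self, abspath):
        return self.stat_and_sha1(abspath)[1]

    def stat_and_sha1(self, abspath):
        with open(abspath, "rb") as f:
            statvalue = os.fstat(f.fileno())
            digest = hashlib.sha1(f.read()).hexdigest()
        return statvalue, digest


class NormalizedNewlineSHA1Provider(SHA1Provider):
    """Hypothetical provider: hashes content with CRLF converted to LF."""

    def sha1(self, abspath):
        return self.stat_and_sha1(abspath)[1]

    def stat_and_sha1(self, abspath):
        with open(abspath, "rb") as f:
            # The stat is taken from the physical file ...
            statvalue = os.fstat(f.fileno())
            # ... while the sha is computed over the canonical content.
            canonical = f.read().replace(b"\r\n", b"\n")
        return statvalue, hashlib.sha1(canonical).hexdigest()


def resolve_provider(sha1_provider=None):
    """Hypothetical helper mirroring the None fallback in initialize()
    and on_file(): use the default provider when none is given."""
    if sha1_provider is None:
        sha1_provider = DefaultSHA1Provider()
    return sha1_provider
```

With a file containing CRLF line endings, the two providers in this sketch report the same st_size but different sha1 values, which is the same effect the UppercaseSHA1Provider tests rely on to make iter_changes() report a change.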