Rev 2735: Implement and use (Tree|Repo).iter_file_bytes in file:///home/pqm/archives/thelove/bzr/%2Btrunk/
Canonical.com Patch Queue Manager
pqm at pqm.ubuntu.com
Tue Aug 21 03:18:44 BST 2007
At file:///home/pqm/archives/thelove/bzr/%2Btrunk/
------------------------------------------------------------
revno: 2735
revision-id: pqm at pqm.ubuntu.com-20070821021841-atd5egvj6vx1gdxr
parent: pqm at pqm.ubuntu.com-20070821013501-op6nu6ae9u5gp56v
parent: aaron.bentley at utoronto.ca-20070821014039-pvms1pv6sq5m9as0
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Tue 2007-08-21 03:18:41 +0100
message:
Implement and use (Tree|Repo).iter_file_bytes
modified:
bzrlib/errors.py errors.py-20050309040759-20512168c4e14fbd
bzrlib/remote.py remote.py-20060720103555-yeeg2x51vn0rbtdp-1
bzrlib/repository.py rev_storage.py-20051111201905-119e9401e46257e3
bzrlib/revisiontree.py revisiontree.py-20060724012533-bg8xyryhxd0o0i0h-1
bzrlib/tests/repository_implementations/test_repository.py test_repository.py-20060131092128-ad07f494f5c9d26c
bzrlib/tests/tree_implementations/test_tree.py test_tree.py-20061215160206-usu7lwcj8aq2n3br-1
bzrlib/transform.py transform.py-20060105172343-dd99e54394d91687
bzrlib/tree.py tree.py-20050309040759-9d5f2496be663e77
bzrlib/workingtree_4.py workingtree_4.py-20070208044105-5fgpc5j3ljlh5q6c-1
------------------------------------------------------------
revno: 2708.1.12
merged: aaron.bentley at utoronto.ca-20070821014039-pvms1pv6sq5m9as0
parent: abentley at panoramicfeedback.com-20070820142740-hox5gfm6nifrkq01
parent: pqm at pqm.ubuntu.com-20070821013501-op6nu6ae9u5gp56v
committer: Aaron Bentley <aaron.bentley at utoronto.ca>
branch nick: extract-files
timestamp: Mon 2007-08-20 21:40:39 -0400
message:
Merge from bzr.dev
------------------------------------------------------------
revno: 2708.1.11
merged: abentley at panoramicfeedback.com-20070820142740-hox5gfm6nifrkq01
parent: abentley at panoramicfeedback.com-20070820134709-66tbctep1hwz08kj
committer: Aaron Bentley <abentley at panoramicfeedback.com>
branch nick: extract-files
timestamp: Mon 2007-08-20 10:27:40 -0400
message:
Test and tweak error handling
------------------------------------------------------------
revno: 2708.1.10
merged: abentley at panoramicfeedback.com-20070820134709-66tbctep1hwz08kj
parent: aaron.bentley at utoronto.ca-20070816115141-svn9tgp2olq29p3h
committer: Aaron Bentley <abentley at panoramicfeedback.com>
branch nick: extract-files
timestamp: Mon 2007-08-20 09:47:09 -0400
message:
Update docstrings
------------------------------------------------------------
revno: 2708.1.9
merged: aaron.bentley at utoronto.ca-20070816115141-svn9tgp2olq29p3h
parent: aaron.bentley at utoronto.ca-20070816053708-3zot9t5j8rvgpho3
committer: Aaron Bentley <aaron.bentley at utoronto.ca>
branch nick: extract-files
timestamp: Thu 2007-08-16 07:51:41 -0400
message:
Clean-up docs and imports
------------------------------------------------------------
revno: 2708.1.8
merged: aaron.bentley at utoronto.ca-20070816053708-3zot9t5j8rvgpho3
parent: aaron.bentley at utoronto.ca-20070816044100-sdff9czzjft6g919
committer: Aaron Bentley <aaron.bentley at utoronto.ca>
branch nick: extract-files
timestamp: Thu 2007-08-16 01:37:08 -0400
message:
rename extract_files_bytes to iter_files_bytes, fix build_tree / progress
------------------------------------------------------------
revno: 2708.1.7
merged: aaron.bentley at utoronto.ca-20070816044100-sdff9czzjft6g919
parent: aaron.bentley at utoronto.ca-20070816042852-j3s9b1jw94b15mo3
committer: Aaron Bentley <aaron.bentley at utoronto.ca>
branch nick: extract-files
timestamp: Thu 2007-08-16 00:41:00 -0400
message:
Rename extract_files_bytes to iter_files_bytes
------------------------------------------------------------
revno: 2708.1.6
merged: aaron.bentley at utoronto.ca-20070816042852-j3s9b1jw94b15mo3
parent: aaron.bentley at utoronto.ca-20070816021405-ecx25ccgkmq5st3y
committer: Aaron Bentley <aaron.bentley at utoronto.ca>
branch nick: extract-files
timestamp: Thu 2007-08-16 00:28:52 -0400
message:
Turn extract_files_bytes into an iterator
------------------------------------------------------------
revno: 2708.1.5
merged: aaron.bentley at utoronto.ca-20070816021405-ecx25ccgkmq5st3y
parent: aaron.bentley at utoronto.ca-20070816014818-6o3gtt3oegsmt1zw
committer: Aaron Bentley <aaron.bentley at utoronto.ca>
branch nick: extract-files
timestamp: Wed 2007-08-15 22:14:05 -0400
message:
Use Tree.extract_files_bytes in revert
------------------------------------------------------------
revno: 2708.1.4
merged: aaron.bentley at utoronto.ca-20070816014818-6o3gtt3oegsmt1zw
parent: aaron.bentley at utoronto.ca-20070816013943-hrsavkonrn3oqf92
committer: Aaron Bentley <aaron.bentley at utoronto.ca>
branch nick: extract-files
timestamp: Wed 2007-08-15 21:48:18 -0400
message:
RevisionTree and DirStateRevisionTree use Repository.extract_files_bytes
------------------------------------------------------------
revno: 2708.1.3
merged: aaron.bentley at utoronto.ca-20070816013943-hrsavkonrn3oqf92
parent: aaron.bentley at utoronto.ca-20070816003515-n0u2ajn01mgljhc1
committer: Aaron Bentley <aaron.bentley at utoronto.ca>
branch nick: extract-files
timestamp: Wed 2007-08-15 21:39:43 -0400
message:
Implement extract_files_bytes on Repository
------------------------------------------------------------
revno: 2708.1.2
merged: aaron.bentley at utoronto.ca-20070816003515-n0u2ajn01mgljhc1
parent: aaron.bentley at utoronto.ca-20070816001426-8aqbepjh4b3qu8o4
committer: Aaron Bentley <aaron.bentley at utoronto.ca>
branch nick: extract-files
timestamp: Wed 2007-08-15 20:35:15 -0400
message:
Use extract_files_bytes for build_tree
------------------------------------------------------------
revno: 2708.1.1
merged: aaron.bentley at utoronto.ca-20070816001426-8aqbepjh4b3qu8o4
parent: pqm at pqm.ubuntu.com-20070815225233-w4gpchswmwvqi12r
committer: Aaron Bentley <aaron.bentley at utoronto.ca>
branch nick: extract-files
timestamp: Wed 2007-08-15 20:14:26 -0400
message:
Implement Tree.extract_files
=== modified file 'bzrlib/errors.py'
--- a/bzrlib/errors.py 2007-08-14 11:17:54 +0000
+++ b/bzrlib/errors.py 2007-08-21 01:40:39 +0000
@@ -264,6 +264,15 @@
self.tree = tree
+class NoSuchIdInRepository(NoSuchId):
+
+ _fmt = ("The file id %(file_id)r is not present in the repository"
+ " %(repository)r")
+
+ def __init__(self, repository, file_id):
+ BzrError.__init__(self, repository=repository, file_id=file_id)
+
+
class InventoryModified(BzrError):
_fmt = ("The current inventory for the tree %(tree)r has been modified,"
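The new error subclasses `NoSuchId`, so code that catches `NoSuchId` (including the repository test further down, which asserts `errors.NoSuchId`) also catches the repository-specific variant. Its `__init__` calls `BzrError.__init__` directly because `NoSuchId`'s constructor expects a tree. The sketch below uses simplified stand-ins: the `BzrError` here only models `_fmt` interpolation, and the `NoSuchId` message text is hypothetical.

```python
# Simplified stand-ins for bzrlib's error classes. Assumption: the real
# BzrError does more; only the _fmt template mechanism and the class
# hierarchy are modelled. NoSuchId's message text is hypothetical.


class BzrError(Exception):
    _fmt = "Unknown error"

    def __init__(self, **kwds):
        Exception.__init__(self)
        # Store keyword arguments as attributes for _fmt interpolation.
        self.__dict__.update(kwds)

    def __str__(self):
        return self._fmt % self.__dict__


class NoSuchId(BzrError):
    _fmt = "The file id %(file_id)r is not present in the tree %(tree)r."

    def __init__(self, tree, file_id):
        BzrError.__init__(self, tree=tree, file_id=file_id)


class NoSuchIdInRepository(NoSuchId):
    _fmt = ("The file id %(file_id)r is not present in the repository"
            " %(repository)r")

    def __init__(self, repository, file_id):
        # Skip NoSuchId.__init__, which expects a tree, and record the
        # repository instead.
        BzrError.__init__(self, repository=repository, file_id=file_id)


try:
    raise NoSuchIdInRepository('my-repo', 'file3-id')
except NoSuchId as e:  # catching the parent class also catches the subclass
    message = str(e)

print(message)
```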
=== modified file 'bzrlib/remote.py'
--- a/bzrlib/remote.py 2007-08-17 05:16:14 +0000
+++ b/bzrlib/remote.py 2007-08-21 01:40:39 +0000
@@ -611,6 +611,12 @@
self._ensure_real()
return self._real_repository.fileids_altered_by_revision_ids(revision_ids)
+ def iter_files_bytes(self, desired_files):
+ """See Repository.iter_files_bytes.
+ """
+ self._ensure_real()
+ return self._real_repository.iter_files_bytes(desired_files)
+
@needs_read_lock
def get_signature_text(self, revision_id):
self._ensure_real()
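`RemoteRepository.iter_files_bytes` has no smart-server verb of its own; it opens the underlying repository via `_ensure_real()` and delegates. Because the method uses `return` rather than `yield`, `_ensure_real()` runs as soon as it is called, and the caller iterates the real repository's generator directly. A minimal sketch of that lazy-delegation pattern, where `FakeRealRepository` and the `open_real` factory are hypothetical stand-ins:

```python
class FakeRealRepository(object):
    """Hypothetical local repository holding {(file_id, revision_id): bytes}."""

    def __init__(self, texts):
        self._texts = texts

    def iter_files_bytes(self, desired_files):
        for file_id, revision_id, identifier in desired_files:
            yield identifier, [self._texts[(file_id, revision_id)]]


class LazyRemoteRepository(object):
    """Sketch of the _ensure_real() delegation used by RemoteRepository.

    open_real is a hypothetical factory standing in for however the real
    class opens its fallback repository.
    """

    def __init__(self, open_real):
        self._open_real = open_real
        self._real_repository = None

    def _ensure_real(self):
        # Open the fallback repository once, on first use.
        if self._real_repository is None:
            self._real_repository = self._open_real()

    def iter_files_bytes(self, desired_files):
        # Not a generator: _ensure_real() runs as soon as this is called.
        self._ensure_real()
        return self._real_repository.iter_files_bytes(desired_files)


opened = []


def open_real():
    opened.append(True)
    return FakeRealRepository({('file1-id', 'rev1'): b'foo'})


remote = LazyRemoteRepository(open_real)
first = dict(remote.iter_files_bytes([('file1-id', 'rev1', 'a')]))
second = dict(remote.iter_files_bytes([('file1-id', 'rev1', 'b')]))
print(first, second, len(opened))
```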
=== modified file 'bzrlib/repository.py'
--- a/bzrlib/repository.py 2007-08-16 05:50:11 +0000
+++ b/bzrlib/repository.py 2007-08-21 01:40:39 +0000
@@ -731,6 +731,33 @@
pb.finished()
return result
+ def iter_files_bytes(self, desired_files):
+ """Iterate through file versions.
+
+ Files will not necessarily be returned in the order they occur in
+ desired_files. No specific order is guaranteed.
+
+ Yields pairs of identifier, bytes_iterator. identifier is an opaque
+ value supplied by the caller as part of desired_files. It should
+ uniquely identify the file version in the caller's context. (Examples:
+ an index number or a TreeTransform trans_id.)
+
+ bytes_iterator is an iterable of bytestrings for the file. The
+ kind of iterable and length of the bytestrings are unspecified, but for
+ this implementation, it is a list of lines produced by
+ VersionedFile.get_lines().
+
+ :param desired_files: a list of (file_id, revision_id, identifier)
+ triples
+ """
+ transaction = self.get_transaction()
+ for file_id, revision_id, identifier in desired_files:
+ try:
+ weave = self.weave_store.get_weave(file_id, transaction)
+ except errors.NoSuchFile:
+ raise errors.NoSuchIdInRepository(self, file_id)
+ yield identifier, weave.get_lines(revision_id)
+
def item_keys_introduced_by(self, revision_ids, _files_pb=None):
"""Get an iterable listing the keys of all the data introduced by a set
of revision IDs.
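The repository contract (triples in, `(identifier, chunks)` pairs out, a missing file id reported as `NoSuchIdInRepository`) can be exercised without a real repository. In the sketch below the weave store and weaves are hypothetical dict-backed stand-ins, and the method body is restated as a free function. Because it is a generator, errors surface only when it is advanced, which is why the repository tests wrap their calls in `list()`.

```python
class NoSuchFile(Exception):
    pass


class NoSuchIdInRepository(Exception):
    pass


class RevisionNotPresent(Exception):
    pass


class FakeWeave(object):
    """Hypothetical stand-in: maps revision_id -> list of lines."""

    def __init__(self, versions):
        self._versions = versions

    def get_lines(self, revision_id):
        try:
            return self._versions[revision_id]
        except KeyError:
            raise RevisionNotPresent(revision_id)


class FakeWeaveStore(object):
    """Hypothetical stand-in: maps file_id -> FakeWeave."""

    def __init__(self, weaves):
        self._weaves = weaves

    def get_weave(self, file_id, transaction):
        try:
            return self._weaves[file_id]
        except KeyError:
            raise NoSuchFile(file_id)


def iter_files_bytes(weave_store, desired_files, transaction=None):
    """Mirror of the Repository method body, as a free function."""
    for file_id, revision_id, identifier in desired_files:
        try:
            weave = weave_store.get_weave(file_id, transaction)
        except NoSuchFile:
            # Report the missing file id rather than a storage detail.
            raise NoSuchIdInRepository(file_id)
        yield identifier, weave.get_lines(revision_id)


store = FakeWeaveStore({
    'file1-id': FakeWeave({'rev1': [b'foo'], 'rev2': [b'baz']}),
    'file2-id': FakeWeave({'rev1': [b'bar']}),
})
extracted = dict((i, b''.join(b)) for i, b in iter_files_bytes(
    store, [('file1-id', 'rev1', 'file1-old'),
            ('file1-id', 'rev2', 'file1-new'),
            ('file2-id', 'rev1', 'file2')]))

# Creating the generator does not raise; advancing it does.
pending = iter_files_bytes(store, [('file3-id', 'rev3', 'missing')])
try:
    list(pending)
    missing_raised = False
except NoSuchIdInRepository:
    missing_raised = True
print(extracted, missing_raised)
```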
=== modified file 'bzrlib/revisiontree.py'
--- a/bzrlib/revisiontree.py 2007-08-09 03:23:04 +0000
+++ b/bzrlib/revisiontree.py 2007-08-16 04:41:00 +0000
@@ -85,6 +85,14 @@
file_id = osutils.safe_file_id(file_id)
return StringIO(self.get_file_text(file_id))
+ def iter_files_bytes(self, desired_files):
+ """See Tree.iter_files_bytes.
+
+ This version is implemented on top of Repository.iter_files_bytes"""
+ repo_desired_files = [(f, self.inventory[f].revision, i)
+ for f, i in desired_files]
+ return self._repository.iter_files_bytes(repo_desired_files)
+
def annotate_iter(self, file_id,
default_revision=revision.CURRENT_REVISION):
"""See Tree.annotate_iter"""
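`RevisionTree` turns the tree-level `(file_id, identifier)` pairs into repository triples by looking up each entry's `revision`, which in bzr's model is the revision in which that file last changed, not the tree's own revision. A minimal sketch with hypothetical `Entry`, `RecordingRepository` and `SketchRevisionTree` classes:

```python
class Entry(object):
    """Hypothetical inventory entry; revision is the last-changed revision."""

    def __init__(self, revision):
        self.revision = revision


class RecordingRepository(object):
    """Hypothetical repository that records the triples it is asked for."""

    def __init__(self, texts):
        self._texts = texts  # {(file_id, revision_id): bytes}
        self.requested = []

    def iter_files_bytes(self, desired_files):
        for file_id, revision_id, identifier in desired_files:
            self.requested.append((file_id, revision_id, identifier))
            yield identifier, [self._texts[(file_id, revision_id)]]


class SketchRevisionTree(object):
    def __init__(self, repository, inventory):
        self._repository = repository
        self.inventory = inventory  # {file_id: Entry}

    def iter_files_bytes(self, desired_files):
        # Look up each file's last-changed revision, then delegate.
        repo_desired_files = [(f, self.inventory[f].revision, i)
                              for f, i in desired_files]
        return self._repository.iter_files_bytes(repo_desired_files)


# A tree for rev2 in which file2 was last changed in rev1.
repo = RecordingRepository({('file1-id', 'rev2'): b'baz',
                            ('file2-id', 'rev1'): b'bar'})
tree = SketchRevisionTree(repo, {'file1-id': Entry('rev2'),
                                 'file2-id': Entry('rev1')})
extracted = dict((i, b''.join(b)) for i, b in
                 tree.iter_files_bytes([('file1-id', 'a'),
                                        ('file2-id', 'b')]))
print(extracted, repo.requested)
```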
=== modified file 'bzrlib/tests/repository_implementations/test_repository.py'
--- a/bzrlib/tests/repository_implementations/test_repository.py 2007-08-17 05:16:14 +0000
+++ b/bzrlib/tests/repository_implementations/test_repository.py 2007-08-21 01:40:39 +0000
@@ -375,6 +375,31 @@
format = repo.get_serializer_format()
self.assertEqual(repo._serializer.format_num, format)
+ def test_iter_files_bytes(self):
+ tree = self.make_branch_and_tree('tree')
+ self.build_tree_contents([('tree/file1', 'foo'),
+ ('tree/file2', 'bar')])
+ tree.add(['file1', 'file2'], ['file1-id', 'file2-id'])
+ tree.commit('rev1', rev_id='rev1')
+ self.build_tree_contents([('tree/file1', 'baz')])
+ tree.commit('rev2', rev_id='rev2')
+ repository = tree.branch.repository
+ extracted = dict((i, ''.join(b)) for i, b in
+ repository.iter_files_bytes(
+ [('file1-id', 'rev1', 'file1-old'),
+ ('file1-id', 'rev2', 'file1-new'),
+ ('file2-id', 'rev1', 'file2'),
+ ]))
+ self.assertEqual('foo', extracted['file1-old'])
+ self.assertEqual('bar', extracted['file2'])
+ self.assertEqual('baz', extracted['file1-new'])
+ self.assertRaises(errors.RevisionNotPresent, list,
+ repository.iter_files_bytes(
+ [('file1-id', 'rev3', 'file1-notpresent')]))
+ self.assertRaises(errors.NoSuchId, list,
+ repository.iter_files_bytes(
+ [('file3-id', 'rev3', 'file1-notpresent')]))
+
class TestRepositoryLocking(TestCaseWithRepository):
=== modified file 'bzrlib/tests/tree_implementations/test_tree.py'
--- a/bzrlib/tests/tree_implementations/test_tree.py 2007-07-19 15:44:17 +0000
+++ b/bzrlib/tests/tree_implementations/test_tree.py 2007-08-20 14:27:40 +0000
@@ -119,3 +119,26 @@
self.assertRaises(errors.NoSuchId, tree.id2path, 'a')
finally:
tree.unlock()
+
+
+class TestExtractFilesBytes(TestCaseWithTree):
+
+ def test_iter_files_bytes(self):
+ work_tree = self.make_branch_and_tree('wt')
+ self.build_tree_contents([('wt/foo', 'foo'),
+ ('wt/bar', 'bar'),
+ ('wt/baz', 'baz')])
+ work_tree.add(['foo', 'bar', 'baz'], ['foo-id', 'bar-id', 'baz-id'])
+ tree = self._convert_tree(work_tree)
+ tree.lock_read()
+ self.addCleanup(tree.unlock)
+ extracted = dict((i, ''.join(b)) for i, b in
+ tree.iter_files_bytes([('foo-id', 'id1'),
+ ('bar-id', 'id2'),
+ ('baz-id', 'id3')]))
+ self.assertEqual('foo', extracted['id1'])
+ self.assertEqual('bar', extracted['id2'])
+ self.assertEqual('baz', extracted['id3'])
+ self.assertRaises(errors.NoSuchId, lambda: list(
+ tree.iter_files_bytes(
+ [('qux-id', 'file1-notpresent')])))
=== modified file 'bzrlib/transform.py'
--- a/bzrlib/transform.py 2007-08-09 03:23:04 +0000
+++ b/bzrlib/transform.py 2007-08-16 05:37:08 +0000
@@ -1269,9 +1269,11 @@
tt.trans_id_tree_file_id(wt.get_root_id())
pb = bzrlib.ui.ui_factory.nested_progress_bar()
try:
+ deferred_contents = []
for num, (tree_path, entry) in \
enumerate(tree.inventory.iter_entries_by_dir()):
- pb.update("Building tree", num, len(tree.inventory))
+ pb.update("Building tree", num - len(deferred_contents),
+ len(tree.inventory))
if entry.parent_id is None:
continue
reparent = False
@@ -1300,12 +1302,29 @@
'entry %s parent id %r is not in file_trans_id %r'
% (entry, entry.parent_id, file_trans_id))
parent_id = file_trans_id[entry.parent_id]
- file_trans_id[file_id] = new_by_entry(tt, entry, parent_id,
- tree)
+ if entry.kind == 'file':
+ # We *almost* replicate new_by_entry, so that we can defer
+ # getting the file text, and get them all at once.
+ trans_id = tt.create_path(entry.name, parent_id)
+ file_trans_id[file_id] = trans_id
+ tt.version_file(entry.file_id, trans_id)
+ executable = tree.is_executable(entry.file_id, tree_path)
+ if executable is not None:
+ tt.set_executability(executable, trans_id)
+ deferred_contents.append((entry.file_id, trans_id))
+ else:
+ file_trans_id[file_id] = new_by_entry(tt, entry, parent_id,
+ tree)
if reparent:
new_trans_id = file_trans_id[file_id]
old_parent = tt.trans_id_tree_path(tree_path)
_reparent_children(tt, old_parent, new_trans_id)
+ for num, (trans_id, bytes) in enumerate(
+ tree.iter_files_bytes(deferred_contents)):
+ tt.create_file(bytes, trans_id)
+ pb.update('Adding file contents',
+ (num + len(tree.inventory) - len(deferred_contents)),
+ len(tree.inventory))
finally:
pb.finished()
pp.next_phase()
@@ -1563,6 +1582,7 @@
skip_root = False
basis_tree = None
try:
+ deferred_files = []
for id_num, (file_id, path, changed_content, versioned, parent, name,
kind, executable) in enumerate(change_list):
if skip_root and file_id[0] is not None and parent[0] is None:
@@ -1608,8 +1628,7 @@
tt.create_symlink(target_tree.get_symlink_target(file_id),
trans_id)
elif kind[1] == 'file':
- tt.create_file(target_tree.get_file_lines(file_id),
- trans_id, mode_id)
+ deferred_files.append((file_id, (trans_id, mode_id)))
if basis_tree is None:
basis_tree = working_tree.basis_tree()
basis_tree.lock_read()
@@ -1636,6 +1655,9 @@
name[1], tt.trans_id_file_id(parent[1]), trans_id)
if executable[0] != executable[1] and kind[1] == "file":
tt.set_executability(executable[1], trans_id)
+ for (trans_id, mode_id), bytes in target_tree.iter_files_bytes(
+ deferred_files):
+ tt.create_file(bytes, trans_id, mode_id)
finally:
if basis_tree is not None:
basis_tree.unlock()
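Both `build_tree` and `revert` now work in two passes: create paths immediately, queue each file's `(file_id, identifier)` pair, then fetch every text in one `iter_files_bytes` call. In `build_tree` the identifier is the trans_id; in `revert` it is a `(trans_id, mode_id)` tuple, which works because identifiers are opaque to the tree. The sketch below models the `build_tree` loop and its progress arithmetic; the `entries` list, the trans_id naming and the injected `create_file`/`update` hooks are hypothetical simplifications.

```python
def build_tree_sketch(entries, iter_files_bytes, create_file, update):
    """Two-pass build modelled on the build_tree hunk.

    entries: list of (file_id, kind) pairs, a hypothetical stand-in for
    tree.inventory.iter_entries_by_dir(). iter_files_bytes, create_file
    and update are injected callables (hypothetical hooks).
    """
    total = len(entries)
    deferred_contents = []
    for num, (file_id, kind) in enumerate(entries):
        # Deferred files are not finished yet, so they are left out of
        # the first-pass progress count.
        update('Building tree', num - len(deferred_contents), total)
        trans_id = 'trans-' + file_id  # hypothetical trans_id naming
        if kind == 'file':
            # Create the path now; fetch the text later in one batch.
            deferred_contents.append((file_id, trans_id))
        # Other kinds would be created immediately here.
    for num, (trans_id, chunks) in enumerate(
            iter_files_bytes(deferred_contents)):
        create_file(chunks, trans_id)
        update('Adding file contents',
               num + total - len(deferred_contents), total)


texts = {'a-id': b'alpha', 'b-id': b'beta'}
created = {}
progress = []


def fake_iter_files_bytes(desired_files):
    for file_id, identifier in desired_files:
        yield identifier, (texts[file_id],)


def create_file(chunks, trans_id):
    created[trans_id] = b''.join(chunks)


def update(message, current, total):
    progress.append(current)


build_tree_sketch([('d-id', 'directory'), ('a-id', 'file'),
                   ('b-id', 'file')],
                  fake_iter_files_bytes, create_file, update)
print(created, progress)
```

The subtraction in the first pass and the offset in the second keep the reported progress from going backwards when the text fetching is moved to the end.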
=== modified file 'bzrlib/tree.py'
--- a/bzrlib/tree.py 2007-07-28 22:45:28 +0000
+++ b/bzrlib/tree.py 2007-08-20 13:47:09 +0000
@@ -225,6 +225,32 @@
def get_file_by_path(self, path):
return self.get_file(self._inventory.path2id(path))
+ def iter_files_bytes(self, desired_files):
+ """Iterate through file contents.
+
+ Files will not necessarily be returned in the order they occur in
+ desired_files. No specific order is guaranteed.
+
+ Yields pairs of identifier, bytes_iterator. identifier is an opaque
+ value supplied by the caller as part of desired_files. It should
+ uniquely identify the file version in the caller's context. (Examples:
+ an index number or a TreeTransform trans_id.)
+
+ bytes_iterator is an iterable of bytestrings for the file. The
+ kind of iterable and length of the bytestrings are unspecified, but for
+ this implementation, it is a tuple containing a single bytestring with
+ the complete text of the file.
+
+ :param desired_files: a list of (file_id, identifier) pairs
+ """
+ for file_id, identifier in desired_files:
+ # We wrap the string in a tuple so that we can return an iterable
+ # of bytestrings. (Technically, a bytestring is also an iterable
+ # of bytestrings, but iterating through it one character at a time
+ # would be slow.)
+ cur_file = (self.get_file_text(file_id),)
+ yield identifier, cur_file
+
def get_symlink_target(self, file_id):
"""Get the target for a given file_id.
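The default `Tree.iter_files_bytes` yields each full text wrapped in a 1-tuple, so a consumer that only assumes "iterable of bytestrings" performs a single write per file. (bzrlib targeted Python 2, where iterating a `str` yields 1-character strings; in Python 3, iterating a `bytes` object yields integers, so the wrapper matters for correctness there as well.) A minimal sketch with a hypothetical dict-backed tree and a write-counting buffer:

```python
import io


class SketchTree(object):
    """Hypothetical tree whose get_file_text() reads from a dict."""

    def __init__(self, texts):
        self._texts = texts

    def get_file_text(self, file_id):
        return self._texts[file_id]

    def iter_files_bytes(self, desired_files):
        # Same shape as the Tree default: each file's full text wrapped
        # in a 1-tuple, i.e. an iterable containing one bytestring.
        for file_id, identifier in desired_files:
            yield identifier, (self.get_file_text(file_id),)


class CountingWriter(io.BytesIO):
    """BytesIO that records how many write() calls it receives."""

    def __init__(self):
        io.BytesIO.__init__(self)
        self.calls = 0

    def write(self, data):
        self.calls += 1
        return io.BytesIO.write(self, data)


tree = SketchTree({'foo-id': b'foo contents', 'bar-id': b'bar'})
outputs = {}
for identifier, chunks in tree.iter_files_bytes([('foo-id', 'id1'),
                                                  ('bar-id', 'id2')]):
    out = CountingWriter()
    for chunk in chunks:  # consumer assumes only "iterable of bytes"
        out.write(chunk)
    outputs[identifier] = (out.getvalue(), out.calls)
print(outputs)
```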
=== modified file 'bzrlib/workingtree_4.py'
--- a/bzrlib/workingtree_4.py 2007-08-15 00:27:34 +0000
+++ b/bzrlib/workingtree_4.py 2007-08-21 01:40:39 +0000
@@ -1521,6 +1521,20 @@
def get_reference_revision(self, file_id, path=None):
return self.inventory[file_id].reference_revision
+ def iter_files_bytes(self, desired_files):
+ """See Tree.iter_files_bytes.
+
+ This version is implemented on top of Repository.iter_files_bytes"""
+ parent_index = self._get_parent_index()
+ repo_desired_files = []
+ for file_id, identifier in desired_files:
+ entry = self._get_entry(file_id)
+ if entry == (None, None):
+ raise errors.NoSuchId(self, file_id)
+ repo_desired_files.append((file_id, entry[1][parent_index][4],
+ identifier))
+ return self._repository.iter_files_bytes(repo_desired_files)
+
def get_symlink_target(self, file_id):
entry = self._get_entry(file_id=file_id)
parent_index = self._get_parent_index()
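`DirStateRevisionTree` resolves revision ids from the dirstate rather than from an inventory: it reads field 4 of the details tuple at `parent_index` and treats an entry of `(None, None)` as an unknown id. The sketch restates that loop as a function; the entry layout follows only the indexing used in the hunk, and the sample details tuples are made up. Since the lookup runs before any repository access, `NoSuchId` is raised at call time rather than during iteration. The tree test wraps the call in `lambda: list(...)`, so it passes whether an implementation fails eagerly or lazily.

```python
class NoSuchId(Exception):
    pass


def resolve_desired_files(get_entry, parent_index, desired_files):
    """Map (file_id, identifier) pairs to repository triples.

    get_entry is a hypothetical callable returning a dirstate-style entry,
    or (None, None) when the id is unknown.
    """
    repo_desired_files = []
    for file_id, identifier in desired_files:
        entry = get_entry(file_id)
        if entry == (None, None):
            raise NoSuchId(file_id)
        # entry[1] holds one details tuple per tree; field 4 of the
        # selected tree's tuple is passed on as the revision id.
        revision_id = entry[1][parent_index][4]
        repo_desired_files.append((file_id, revision_id, identifier))
    return repo_desired_files


# Made-up entries: (key, [working-tree details, parent details]).
entries = {
    'foo-id': (('', 'foo', 'foo-id'),
               [('f', 'sha-new', 3, False, 'packed-stat'),
                ('f', 'sha-old', 3, False, 'rev1')]),
}


def get_entry(file_id):
    return entries.get(file_id, (None, None))


triples = resolve_desired_files(get_entry, 1, [('foo-id', 'id1')])

# An unknown id fails immediately, before any iteration.
try:
    resolve_desired_files(get_entry, 1, [('qux-id', 'x')])
    unknown_raised = False
except NoSuchId:
    unknown_raised = True
print(triples, unknown_raised)
```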
More information about the bazaar-commits mailing list