Rev 2735: Implement and use (Tree|Repo).iter_file_bytes in file:///home/pqm/archives/thelove/bzr/%2Btrunk/

Canonical.com Patch Queue Manager pqm at pqm.ubuntu.com
Tue Aug 21 03:18:44 BST 2007


At file:///home/pqm/archives/thelove/bzr/%2Btrunk/

------------------------------------------------------------
revno: 2735
revision-id: pqm at pqm.ubuntu.com-20070821021841-atd5egvj6vx1gdxr
parent: pqm at pqm.ubuntu.com-20070821013501-op6nu6ae9u5gp56v
parent: aaron.bentley at utoronto.ca-20070821014039-pvms1pv6sq5m9as0
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Tue 2007-08-21 03:18:41 +0100
message:
  Implement and use (Tree|Repo).iter_file_bytes
modified:
  bzrlib/errors.py               errors.py-20050309040759-20512168c4e14fbd
  bzrlib/remote.py               remote.py-20060720103555-yeeg2x51vn0rbtdp-1
  bzrlib/repository.py           rev_storage.py-20051111201905-119e9401e46257e3
  bzrlib/revisiontree.py         revisiontree.py-20060724012533-bg8xyryhxd0o0i0h-1
  bzrlib/tests/repository_implementations/test_repository.py test_repository.py-20060131092128-ad07f494f5c9d26c
  bzrlib/tests/tree_implementations/test_tree.py test_tree.py-20061215160206-usu7lwcj8aq2n3br-1
  bzrlib/transform.py            transform.py-20060105172343-dd99e54394d91687
  bzrlib/tree.py                 tree.py-20050309040759-9d5f2496be663e77
  bzrlib/workingtree_4.py        workingtree_4.py-20070208044105-5fgpc5j3ljlh5q6c-1
    ------------------------------------------------------------
    revno: 2708.1.12
    merged: aaron.bentley at utoronto.ca-20070821014039-pvms1pv6sq5m9as0
    parent: abentley at panoramicfeedback.com-20070820142740-hox5gfm6nifrkq01
    parent: pqm at pqm.ubuntu.com-20070821013501-op6nu6ae9u5gp56v
    committer: Aaron Bentley <aaron.bentley at utoronto.ca>
    branch nick: extract-files
    timestamp: Mon 2007-08-20 21:40:39 -0400
    message:
      Merge from bzr.dev
    ------------------------------------------------------------
    revno: 2708.1.11
    merged: abentley at panoramicfeedback.com-20070820142740-hox5gfm6nifrkq01
    parent: abentley at panoramicfeedback.com-20070820134709-66tbctep1hwz08kj
    committer: Aaron Bentley <abentley at panoramicfeedback.com>
    branch nick: extract-files
    timestamp: Mon 2007-08-20 10:27:40 -0400
    message:
      Test and tweak error handling
    ------------------------------------------------------------
    revno: 2708.1.10
    merged: abentley at panoramicfeedback.com-20070820134709-66tbctep1hwz08kj
    parent: aaron.bentley at utoronto.ca-20070816115141-svn9tgp2olq29p3h
    committer: Aaron Bentley <abentley at panoramicfeedback.com>
    branch nick: extract-files
    timestamp: Mon 2007-08-20 09:47:09 -0400
    message:
      Update docstrings
    ------------------------------------------------------------
    revno: 2708.1.9
    merged: aaron.bentley at utoronto.ca-20070816115141-svn9tgp2olq29p3h
    parent: aaron.bentley at utoronto.ca-20070816053708-3zot9t5j8rvgpho3
    committer: Aaron Bentley <aaron.bentley at utoronto.ca>
    branch nick: extract-files
    timestamp: Thu 2007-08-16 07:51:41 -0400
    message:
      Clean-up docs and imports
    ------------------------------------------------------------
    revno: 2708.1.8
    merged: aaron.bentley at utoronto.ca-20070816053708-3zot9t5j8rvgpho3
    parent: aaron.bentley at utoronto.ca-20070816044100-sdff9czzjft6g919
    committer: Aaron Bentley <aaron.bentley at utoronto.ca>
    branch nick: extract-files
    timestamp: Thu 2007-08-16 01:37:08 -0400
    message:
      rename extract_files_bytest to iter_files_bytes, fix build_tree / progress
    ------------------------------------------------------------
    revno: 2708.1.7
    merged: aaron.bentley at utoronto.ca-20070816044100-sdff9czzjft6g919
    parent: aaron.bentley at utoronto.ca-20070816042852-j3s9b1jw94b15mo3
    committer: Aaron Bentley <aaron.bentley at utoronto.ca>
    branch nick: extract-files
    timestamp: Thu 2007-08-16 00:41:00 -0400
    message:
      Rename extract_files_bytes to iter_files_bytes
    ------------------------------------------------------------
    revno: 2708.1.6
    merged: aaron.bentley at utoronto.ca-20070816042852-j3s9b1jw94b15mo3
    parent: aaron.bentley at utoronto.ca-20070816021405-ecx25ccgkmq5st3y
    committer: Aaron Bentley <aaron.bentley at utoronto.ca>
    branch nick: extract-files
    timestamp: Thu 2007-08-16 00:28:52 -0400
    message:
      Turn extract_files_bytes into an iterator
    ------------------------------------------------------------
    revno: 2708.1.5
    merged: aaron.bentley at utoronto.ca-20070816021405-ecx25ccgkmq5st3y
    parent: aaron.bentley at utoronto.ca-20070816014818-6o3gtt3oegsmt1zw
    committer: Aaron Bentley <aaron.bentley at utoronto.ca>
    branch nick: extract-files
    timestamp: Wed 2007-08-15 22:14:05 -0400
    message:
      Use Tree.extract_files_bytes in revert
    ------------------------------------------------------------
    revno: 2708.1.4
    merged: aaron.bentley at utoronto.ca-20070816014818-6o3gtt3oegsmt1zw
    parent: aaron.bentley at utoronto.ca-20070816013943-hrsavkonrn3oqf92
    committer: Aaron Bentley <aaron.bentley at utoronto.ca>
    branch nick: extract-files
    timestamp: Wed 2007-08-15 21:48:18 -0400
    message:
      RevisionTree and DirStateRevisionTree use Repository.extract_files_bytes
    ------------------------------------------------------------
    revno: 2708.1.3
    merged: aaron.bentley at utoronto.ca-20070816013943-hrsavkonrn3oqf92
    parent: aaron.bentley at utoronto.ca-20070816003515-n0u2ajn01mgljhc1
    committer: Aaron Bentley <aaron.bentley at utoronto.ca>
    branch nick: extract-files
    timestamp: Wed 2007-08-15 21:39:43 -0400
    message:
      Implement extract_files_bytes on Repository
    ------------------------------------------------------------
    revno: 2708.1.2
    merged: aaron.bentley at utoronto.ca-20070816003515-n0u2ajn01mgljhc1
    parent: aaron.bentley at utoronto.ca-20070816001426-8aqbepjh4b3qu8o4
    committer: Aaron Bentley <aaron.bentley at utoronto.ca>
    branch nick: extract-files
    timestamp: Wed 2007-08-15 20:35:15 -0400
    message:
      Use extract_files_bytes for build_tree
    ------------------------------------------------------------
    revno: 2708.1.1
    merged: aaron.bentley at utoronto.ca-20070816001426-8aqbepjh4b3qu8o4
    parent: pqm at pqm.ubuntu.com-20070815225233-w4gpchswmwvqi12r
    committer: Aaron Bentley <aaron.bentley at utoronto.ca>
    branch nick: extract-files
    timestamp: Wed 2007-08-15 20:14:26 -0400
    message:
      Implement Tree.extract_files
=== modified file 'bzrlib/errors.py'
--- a/bzrlib/errors.py	2007-08-14 11:17:54 +0000
+++ b/bzrlib/errors.py	2007-08-21 01:40:39 +0000
@@ -264,6 +264,15 @@
         self.tree = tree
 
 
+class NoSuchIdInRepository(NoSuchId):
+
+    _fmt = ("The file id %(file_id)r is not present in the repository"
+            " %(repository)r")
+
+    def __init__(self, repository, file_id):
+        BzrError.__init__(self, repository=repository, file_id=file_id)
+
+
 class InventoryModified(BzrError):
 
     _fmt = ("The current inventory for the tree %(tree)r has been modified,"

=== modified file 'bzrlib/remote.py'
--- a/bzrlib/remote.py	2007-08-17 05:16:14 +0000
+++ b/bzrlib/remote.py	2007-08-21 01:40:39 +0000
@@ -611,6 +611,12 @@
         self._ensure_real()
         return self._real_repository.fileids_altered_by_revision_ids(revision_ids)
 
+    def iter_files_bytes(self, desired_files):
+        """See Repository.iter_files_bytes.
+        """
+        self._ensure_real()
+        return self._real_repository.iter_files_bytes(desired_files)
+
     @needs_read_lock
     def get_signature_text(self, revision_id):
         self._ensure_real()

=== modified file 'bzrlib/repository.py'
--- a/bzrlib/repository.py	2007-08-16 05:50:11 +0000
+++ b/bzrlib/repository.py	2007-08-21 01:40:39 +0000
@@ -731,6 +731,33 @@
             pb.finished()
         return result
 
+    def iter_files_bytes(self, desired_files):
+        """Iterate through file versions.
+
+        Files will not necessarily be returned in the order they occur in
+        desired_files.  No specific order is guaranteed.
+
+        Yields pairs of identifier, bytes_iterator.  identifier is an opaque
+        value supplied by the caller as part of desired_files.  It should
+        uniquely identify the file version in the caller's context.  (Examples:
+        an index number or a TreeTransform trans_id.)
+
+        bytes_iterator is an iterable of bytestrings for the file.  The
+        kind of iterable and length of the bytestrings are unspecified, but for
+        this implementation, it is a list of lines produced by
+        VersionedFile.get_lines().
+
+        :param desired_files: a list of (file_id, revision_id, identifier)
+            triples
+        """
+        transaction = self.get_transaction()
+        for file_id, revision_id, callable_data in desired_files:
+            try:
+                weave = self.weave_store.get_weave(file_id, transaction)
+            except errors.NoSuchFile:
+                raise errors.NoSuchIdInRepository(self, file_id)
+            yield callable_data, weave.get_lines(revision_id)
+
     def item_keys_introduced_by(self, revision_ids, _files_pb=None):
         """Get an iterable listing the keys of all the data introduced by a set
         of revision IDs.

=== modified file 'bzrlib/revisiontree.py'
--- a/bzrlib/revisiontree.py	2007-08-09 03:23:04 +0000
+++ b/bzrlib/revisiontree.py	2007-08-16 04:41:00 +0000
@@ -85,6 +85,14 @@
         file_id = osutils.safe_file_id(file_id)
         return StringIO(self.get_file_text(file_id))
 
+    def iter_files_bytes(self, desired_files):
+        """See Tree.iter_files_bytes.
+
+        This version is implemented on top of Repository.iter_files_bytes"""
+        repo_desired_files = [(f, self.inventory[f].revision, i)
+                              for f, i in desired_files]
+        return self._repository.iter_files_bytes(repo_desired_files)
+
     def annotate_iter(self, file_id,
                       default_revision=revision.CURRENT_REVISION):
         """See Tree.annotate_iter"""

=== modified file 'bzrlib/tests/repository_implementations/test_repository.py'
--- a/bzrlib/tests/repository_implementations/test_repository.py	2007-08-17 05:16:14 +0000
+++ b/bzrlib/tests/repository_implementations/test_repository.py	2007-08-21 01:40:39 +0000
@@ -375,6 +375,31 @@
         format = repo.get_serializer_format()
         self.assertEqual(repo._serializer.format_num, format)
 
+    def test_iter_files_bytes(self):
+        tree = self.make_branch_and_tree('tree')
+        self.build_tree_contents([('tree/file1', 'foo'),
+                                  ('tree/file2', 'bar')])
+        tree.add(['file1', 'file2'], ['file1-id', 'file2-id'])
+        tree.commit('rev1', rev_id='rev1')
+        self.build_tree_contents([('tree/file1', 'baz')])
+        tree.commit('rev2', rev_id='rev2')
+        repository = tree.branch.repository
+        extracted = dict((i, ''.join(b)) for i, b in
+                         repository.iter_files_bytes(
+                         [('file1-id', 'rev1', 'file1-old'),
+                          ('file1-id', 'rev2', 'file1-new'),
+                          ('file2-id', 'rev1', 'file2'),
+                         ]))
+        self.assertEqual('foo', extracted['file1-old'])
+        self.assertEqual('bar', extracted['file2'])
+        self.assertEqual('baz', extracted['file1-new'])
+        self.assertRaises(errors.RevisionNotPresent, list,
+                          repository.iter_files_bytes(
+                          [('file1-id', 'rev3', 'file1-notpresent')]))
+        self.assertRaises(errors.NoSuchId, list,
+                          repository.iter_files_bytes(
+                          [('file3-id', 'rev3', 'file1-notpresent')]))
+
 
 class TestRepositoryLocking(TestCaseWithRepository):
 

=== modified file 'bzrlib/tests/tree_implementations/test_tree.py'
--- a/bzrlib/tests/tree_implementations/test_tree.py	2007-07-19 15:44:17 +0000
+++ b/bzrlib/tests/tree_implementations/test_tree.py	2007-08-20 14:27:40 +0000
@@ -119,3 +119,26 @@
             self.assertRaises(errors.NoSuchId, tree.id2path, 'a')
         finally:
             tree.unlock()
+
+
+class TestExtractFilesBytes(TestCaseWithTree):
+
+    def test_iter_files_bytes(self):
+        work_tree = self.make_branch_and_tree('wt')
+        self.build_tree_contents([('wt/foo', 'foo'),
+                                  ('wt/bar', 'bar'),
+                                  ('wt/baz', 'baz')])
+        work_tree.add(['foo', 'bar', 'baz'], ['foo-id', 'bar-id', 'baz-id'])
+        tree = self._convert_tree(work_tree)
+        tree.lock_read()
+        self.addCleanup(tree.unlock)
+        extracted = dict((i, ''.join(b)) for i, b in
+                         tree.iter_files_bytes([('foo-id', 'id1'),
+                                                ('bar-id', 'id2'),
+                                                ('baz-id', 'id3')]))
+        self.assertEqual('foo', extracted['id1'])
+        self.assertEqual('bar', extracted['id2'])
+        self.assertEqual('baz', extracted['id3'])
+        self.assertRaises(errors.NoSuchId, lambda: list(
+                          tree.iter_files_bytes(
+                          [('qux-id', 'file1-notpresent')])))

=== modified file 'bzrlib/transform.py'
--- a/bzrlib/transform.py	2007-08-09 03:23:04 +0000
+++ b/bzrlib/transform.py	2007-08-16 05:37:08 +0000
@@ -1269,9 +1269,11 @@
             tt.trans_id_tree_file_id(wt.get_root_id())
         pb = bzrlib.ui.ui_factory.nested_progress_bar()
         try:
+            deferred_contents = []
             for num, (tree_path, entry) in \
                 enumerate(tree.inventory.iter_entries_by_dir()):
-                pb.update("Building tree", num, len(tree.inventory))
+                pb.update("Building tree", num - len(deferred_contents),
+                          len(tree.inventory))
                 if entry.parent_id is None:
                     continue
                 reparent = False
@@ -1300,12 +1302,29 @@
                         'entry %s parent id %r is not in file_trans_id %r'
                         % (entry, entry.parent_id, file_trans_id))
                 parent_id = file_trans_id[entry.parent_id]
-                file_trans_id[file_id] = new_by_entry(tt, entry, parent_id,
-                                                      tree)
+                if entry.kind == 'file':
+                    # We *almost* replicate new_by_entry, so that we can defer
+                    # getting the file text, and get them all at once.
+                    trans_id = tt.create_path(entry.name, parent_id)
+                    file_trans_id[file_id] = trans_id
+                    tt.version_file(entry.file_id, trans_id)
+                    executable = tree.is_executable(entry.file_id, tree_path)
+                    if executable is not None:
+                        tt.set_executability(executable, trans_id)
+                    deferred_contents.append((entry.file_id, trans_id))
+                else:
+                    file_trans_id[file_id] = new_by_entry(tt, entry, parent_id,
+                                                          tree)
                 if reparent:
                     new_trans_id = file_trans_id[file_id]
                     old_parent = tt.trans_id_tree_path(tree_path)
                     _reparent_children(tt, old_parent, new_trans_id)
+            for num, (trans_id, bytes) in enumerate(
+                tree.iter_files_bytes(deferred_contents)):
+                tt.create_file(bytes, trans_id)
+                pb.update('Adding file contents',
+                          (num + len(tree.inventory) - len(deferred_contents)),
+                          len(tree.inventory))
         finally:
             pb.finished()
         pp.next_phase()
@@ -1563,6 +1582,7 @@
         skip_root = False
     basis_tree = None
     try:
+        deferred_files = []
         for id_num, (file_id, path, changed_content, versioned, parent, name,
                 kind, executable) in enumerate(change_list):
             if skip_root and file_id[0] is not None and parent[0] is None:
@@ -1608,8 +1628,7 @@
                     tt.create_symlink(target_tree.get_symlink_target(file_id),
                                       trans_id)
                 elif kind[1] == 'file':
-                    tt.create_file(target_tree.get_file_lines(file_id),
-                                   trans_id, mode_id)
+                    deferred_files.append((file_id, (trans_id, mode_id)))
                     if basis_tree is None:
                         basis_tree = working_tree.basis_tree()
                         basis_tree.lock_read()
@@ -1636,6 +1655,9 @@
                     name[1], tt.trans_id_file_id(parent[1]), trans_id)
             if executable[0] != executable[1] and kind[1] == "file":
                 tt.set_executability(executable[1], trans_id)
+        for (trans_id, mode_id), bytes in target_tree.iter_files_bytes(
+            deferred_files):
+            tt.create_file(bytes, trans_id, mode_id)
     finally:
         if basis_tree is not None:
             basis_tree.unlock()

=== modified file 'bzrlib/tree.py'
--- a/bzrlib/tree.py	2007-07-28 22:45:28 +0000
+++ b/bzrlib/tree.py	2007-08-20 13:47:09 +0000
@@ -225,6 +225,32 @@
     def get_file_by_path(self, path):
         return self.get_file(self._inventory.path2id(path))
 
+    def iter_files_bytes(self, desired_files):
+        """Iterate through file contents.
+
+        Files will not necessarily be returned in the order they occur in
+        desired_files.  No specific order is guaranteed.
+
+        Yields pairs of identifier, bytes_iterator.  identifier is an opaque
+        value supplied by the caller as part of desired_files.  It should
+        uniquely identify the file version in the caller's context.  (Examples:
+        an index number or a TreeTransform trans_id.)
+
+        bytes_iterator is an iterable of bytestrings for the file.  The
+        kind of iterable and length of the bytestrings are unspecified, but for
+        this implementation, it is a tuple containing a single bytestring with
+        the complete text of the file.
+
+        :param desired_files: a list of (file_id, identifier) pairs
+        """
+        for file_id, identifier in desired_files:
+            # We wrap the string in a tuple so that we can return an iterable
+            # of bytestrings.  (Technically, a bytestring is also an iterable
+            # of bytestrings, but iterating through each character is not
+            # performant.)
+            cur_file = (self.get_file_text(file_id),)
+            yield identifier, cur_file
+
     def get_symlink_target(self, file_id):
         """Get the target for a given file_id.
 

=== modified file 'bzrlib/workingtree_4.py'
--- a/bzrlib/workingtree_4.py	2007-08-15 00:27:34 +0000
+++ b/bzrlib/workingtree_4.py	2007-08-21 01:40:39 +0000
@@ -1521,6 +1521,20 @@
     def get_reference_revision(self, file_id, path=None):
         return self.inventory[file_id].reference_revision
 
+    def iter_files_bytes(self, desired_files):
+        """See Tree.iter_files_bytes.
+
+        This version is implemented on top of Repository.iter_files_bytes"""
+        parent_index = self._get_parent_index()
+        repo_desired_files = []
+        for file_id, identifier in desired_files:
+            entry = self._get_entry(file_id)
+            if entry == (None, None):
+                raise errors.NoSuchId(self, file_id)
+            repo_desired_files.append((file_id, entry[1][parent_index][4],
+                                       identifier))
+        return self._repository.iter_files_bytes(repo_desired_files)
+
     def get_symlink_target(self, file_id):
         entry = self._get_entry(file_id=file_id)
         parent_index = self._get_parent_index()
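
The iter_files_bytes contract described in the docstrings above (caller-supplied opaque identifiers, results in no guaranteed order, each result an iterable of bytestrings) can be illustrated with a small standalone sketch. This is not bzrlib code: ToyTree, the local NoSuchId class and extract_all are hypothetical stand-ins invented for illustration, and the sketch is written for Python 3 rather than the Python 2 of the patch.

```python
class NoSuchId(Exception):
    """Hypothetical stand-in for bzrlib.errors.NoSuchId."""


class ToyTree:
    """Hypothetical in-memory tree mapping file_id -> complete file text."""

    def __init__(self, texts):
        self._texts = dict(texts)

    def get_file_text(self, file_id):
        try:
            return self._texts[file_id]
        except KeyError:
            raise NoSuchId(file_id)

    def iter_files_bytes(self, desired_files):
        """Yield (identifier, bytes_iterable) for each (file_id, identifier)."""
        for file_id, identifier in desired_files:
            # Wrap the text in a 1-tuple so the caller always receives an
            # iterable of bytestrings, as in the Tree implementation above.
            yield identifier, (self.get_file_text(file_id),)


def extract_all(tree, desired_files):
    """Collect results by identifier, without relying on yield order."""
    return {identifier: b"".join(chunks)
            for identifier, chunks in tree.iter_files_bytes(desired_files)}


tree = ToyTree({"foo-id": b"foo", "bar-id": b"bar"})
texts = extract_all(tree, [("foo-id", "id1"), ("bar-id", "id2")])
```

Because iter_files_bytes is a generator here, a missing file id raises only once the iterator is consumed, which is why the tests in this patch wrap the call in list() before asserting on the error.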



