[MERGE] filtered-deltas - basis for log DIR
Aaron Bentley
aaron at aaronbentley.com
Fri Mar 13 05:53:24 GMT 2009
Ian Clatworthy wrote:
> I put up a patch some weeks ago that make log DIR as fast
> as I could for pack repositories....This patch delivers the same
> functionality in a far less intrusive way.
> === modified file 'NEWS'
> --- NEWS 2009-03-12 14:02:53 +0000
> +++ NEWS 2009-03-13 01:59:44 +0000
> @@ -43,10 +43,21 @@
>
> API CHANGES:
>
> + * New API ``Inventory.filter()`` added that filters an inventory by
> + a set of file-ids so that only those fileids, their parents and
> + their children are included.
> + (Ian Clatworthy)
^^ looks like it would fit on the previous line.
> === modified file 'bzrlib/inventory.py'
> --- bzrlib/inventory.py 2009-03-12 10:04:53 +0000
> +++ bzrlib/inventory.py 2009-03-13 00:29:41 +0000
> @@ -1339,6 +1339,36 @@
> def is_root(self, file_id):
> return self.root is not None and file_id == self.root.file_id
>
> + def filter(self, specific_fileids):
> + """Copy an inventory filtering against a set of file-ids.
> +
> + Children of directories and parents are included.
> + """
> + interesting_parents = set()
> + for fileid in specific_fileids:
> + try:
> + interesting_parents.update(self.get_idpath(fileid))
> + except errors.NoSuchId:
> + # This fileid is not in the inventory - that's ok
> + pass
> + entries = self.iter_entries()
> + if self.root is None:
> + return Inventory(root_id=None)
> + other = Inventory(entries.next()[1].file_id)
This could be spelled as:
Inventory(root_id=self.root)
for path, entry in entries:
if entry.file_id == self.root:
continue
which I think is a bit cleaner. But either is fine.
if self.root is None:
> + return Inventory(root_id=None)
> + other = Inventory(entries.next()[1].file_id)
> + other.root.revision = self.root.revision
> + other.revision_id = self.revision_id
> + directories_to_expand = set()
> + for path, entry in entries:
> + file_id = entry.file_id
> + if (file_id in specific_fileids
> + or entry.parent_id in directories_to_expand):
> + if entry.kind == 'directory':
> + directories_to_expand.add(file_id)
I don't think this is correct. How are you ensuring that grandparents
and great-grandparents, etc. are being retained?
bb:comment until this issue is resolved.
> + elif file_id not in interesting_parents:
> + continue
> + other.add(entry.copy())
It seems strange for an optimization to be doing a copy.
> === modified file 'bzrlib/repository.py'
> --- bzrlib/repository.py 2009-03-12 02:45:17 +0000
> +++ bzrlib/repository.py 2009-03-13 01:59:44 +0000
> @@ -1313,19 +1313,38 @@
> rev_tmp.seek(0)
> return rev_tmp.getvalue()
>
> - def get_deltas_for_revisions(self, revisions):
> + def get_deltas_for_revisions(self, revisions, specific_fileids=None):
> """Produce a generator of revision deltas.
>
> Note that the input is a sequence of REVISIONS, not revision_ids.
> Trees will be held in memory until the generator exits.
> Each delta is relative to the revision's lefthand predecessor.
> +
> + :param specific_fileids: if not None, the result is filtered
> + so that only those file-ids, their parents and their
> + children are included.
> """
> + # Get the revision-ids of interest
> required_trees = set()
> for revision in revisions:
> required_trees.add(revision.revision_id)
> required_trees.update(revision.parent_ids[:1])
> - trees = dict((t.get_revision_id(), t) for
> - t in self.revision_trees(required_trees))
> +
> + # Get the matching filtered trees. Note that it's more
> + # efficient to pass filtered trees to changes_from() rather
> + # than doing the filtering afterwards. changes_from() could
> + # arguably do the filtering itself but it's path-based, not
> + # file-id based, so filtering before or afterwards is
> + # currently easier.
Filtering in changes_from will be more efficient, and I think that's
ultimately the right approach. But a direct implementation of
get_deltas_for_revisions directly on CHK repositories would make a lot
of sense.
Aaron
More information about the bazaar
mailing list