[MERGE 0.92] move responsibility for storing entries to CommitBuilder
Ian Clatworthy
ian.clatworthy at internode.on.net
Thu Sep 6 12:46:09 BST 2007
Robert Collins wrote:
bb: resubmit
> This patch:
> - moves the responsibility for storing entries to CommitBuilder. So
> different repositories can store symlinks/files etc differently without
> double-indirection though the inventory entry.
Good.
> - deletes a bunch of stub methods that existed because of the
> double-indirection via inventory.
Which in turn has broken the public CommitBuilder API and probably stopped
the snapshot method on InventoryEntry from working. I really do think this
API break is necessary. But you need to document it in NEWS.
> Caveats:
> - I'm not sure if path_content_summary should exist, or if its entirely
> correct for windows at the moment, and it doesn't currently offer the
> hash-cache optimisation ever, so this is not suitable for inclusion in a
> release.
I've only reviewed the code using path_content_summary, not the
implementations for the various trees. From the profiling I've done
on commit over the months, I'm sure a single call into the tree is the
right idea though.
> That said, this is a reasonable chunk of work, and I'd like to keep the
> delta on the branch I'm working on down, so... I'm submitting it for a
> merge as soon as 0.92 opens with the plan of getting it reviewed and
> tweaking it as needed to be ready on tuesday.
I'd like to hear from Aaron and/or John on what they think of
path_content_summary. Pulling it's implementations and tests out into a
separate patch would be a good way to split the size of this change into
two. Like _iter_changes, we could make it _path_content_summary
initially if we're not 100% sure it's right.
As the versionedfile change included below is already merged into
bzr.dev now, a new bundle would be good regardless. :-)
More comments below.
> === modified file 'NEWS'
> --- NEWS 2007-09-05 08:18:57 +0000
> +++ NEWS 2007-09-05 23:08:07 +0000
> @@ -238,6 +238,10 @@
> incremental addition of data to a file without requiring that all the
> data be buffered in memory. (Robert Collins)
>
> + * New method on ``bzrlib.tree.Tree`` ``path_content_summary`` provides a
> + tuple containing the key information about a path for commit processing
> + to complete. (Robert Collins)
> +
More details needed on API breaks and performance impact of change.
> === modified file 'bzrlib/commit.py'
> self.builder = self.branch.get_commit_builder(self.parents,
> self.config, timestamp, timezone, committer, revprops, rev_id)
> + # tell the builder about the chosen recursive behaviour
> + self.builder.recursive = recursive
Why are you setting this via an attribute rather than adding a parameter
to the get_commit_builder() method?
> @@ -685,6 +686,7 @@
> work_inv = self.work_tree.inventory
> assert work_inv.root is not None
> entries = work_inv.iter_entries()
> + # XXX: Note that entries may have the wrong kind.
> if not self.builder.record_root_entry:
> entries.next()
> for path, existing_ie in entries:
This comment needs more explanation.
> === modified file 'bzrlib/inventory.py'
> --- bzrlib/inventory.py 2007-09-04 03:53:07 +0000
> +++ bzrlib/inventory.py 2007-09-05 05:51:34 +0000
> @@ -433,7 +433,7 @@
> self.revision))
>
> def snapshot(self, revision, path, previous_entries,
> - work_tree, commit_builder):
> + work_tree, commit_builder, store_if_unchanged):
You're added a parameter here and it's never used. Crud I assume?
> - def snapshot(self, revision, path, previous_entries,
> - work_tree, commit_builder):
> - def _snapshot_text(self, file_parents, work_tree, commit_builder):
> - """See InventoryEntry._snapshot_text."""
> - commit_builder.modified_link(
> - self.file_id, file_parents, self.symlink_target)
> - return True
> -
> - def _snapshot_text(self, file_parents, work_tree, commit_builder):
> - commit_builder.modified_reference(self.file_id, file_parents)
> - return True
> -
These changes are necessary but they mean ie.snapshot is broken. You've
removed the InventoryFile implementation of snapshot() but not the one
in InventoryEntry.
> + try:
> + ie.executable = content_summary[2]
> + lines = tree.get_file(ie.file_id, path).readlines()
> + ie.text_sha1, ie.text_size = self._add_text_to_weave(
> + ie.file_id, lines, heads, nostore_sha)
> + ie.revision = self._new_revision_id
It doesn't hurt but setting ie.revision here is redundant as it's done
after the if block anyhow.
> + elif kind == 'directory':
> + if not store:
> + # all data is meta here, so carry over:
> + ie.revision = parent_entry.revision
> + return
> + lines = []
> + self._add_text_to_weave(ie.file_id, lines, heads, None)
The code for directory and tree-reference are identical so I'd be
tempted to make that elif test an or. In terms of code speed, getting to
the symlink code after an extra check won't hurt. (Medium term, I'd
expect as many nested trees as symlinks in many repositories.)
> + elif kind == 'symlink':
> + current_link_target = content_summary[3]
> + if not store:
> + # symmlink target is not generic metadata, check if it has
Typo in commmment. :-)
> - def _add_text_to_weave(self, file_id, new_lines, parents):
> + raise NotImplementedError('unknown kind')
> + ie.revision = self._new_revision_id
> +
> + def _add_text_to_weave(self, file_id, new_lines, parents, nostore_sha):
> versionedfile = self.repository.weave_store.get_weave_or_empty(
> file_id, self.repository.get_transaction())
> - result = versionedfile.add_lines(
> - self._new_revision_id, parents, new_lines)[0:2]
> - versionedfile.clear_cache()
> - return result
> + try:
> + return versionedfile.add_lines(
> + self._new_revision_id, parents, new_lines,
> + nostore_sha=nostore_sha)[0:2]
> + finally:
> + versionedfile.clear_cache()
>
This is a good change. In fact, it needs to be pulled out and merged
into bzr.dev now doesn't it? (As add_lines can throw an exception since
the merge of the related change into bzr.dev earlier today, the call to
clearcache might be missed in the trunk. We should fix that without
waiting for this larger patch to land.)
> + def test_add_lines_nostoresha(self):
> + """When nostore_sha is supplied using old content raises."""
version + "2")
> +
> + def test_add_lines_nostoresha(self):
> + """When nostore_sha is supplied using old content raises."""
As mentioned on IRC, same method name repeated. Python sucks in not
detecting this IMO. More curiously though, merging this bundle into
bzr.dev gave me 4 copies of the routine. Is that a bug in our merge code?
In summary, I'm very pleased to see this. This restructuring between
Inventory and CommitBuilder is, IMO, a necessary thing and very much the
direction I wanted to take. API evolution notwithstanding, huge junks of
inventory.py can now be removed.
The resubmit is largely clean-up related, e.g. asserts in the code which
you've commented as being temporary. I'd like to see the patch split
into two as well with path_content_summary being one bit and the commit
refactoring leveraging it separated. As we discussed on the phone, there
are good reasons for moving _record_entry out of commit.py into
CommitBuilder as well, so the refactoring of commit sub-patch might well
take a few extra days to bed down if we add that bit in.
Ian C.
More information about the bazaar
mailing list