[MERGE 0.92] move responsibility for storing entries to CommitBuilder
Robert Collins
robertc at robertcollins.net
Sun Sep 9 21:10:13 BST 2007
On Thu, 2007-09-06 at 21:46 +1000, Ian Clatworthy wrote:
> Robert Collins wrote:
>
> bb: resubmit
>
> > This patch:
> > - moves the responsibility for storing entries to CommitBuilder. So
> > different repositories can store symlinks/files etc differently without
> > double-indirection though the inventory entry.
>
> Good.
>
> > - deletes a bunch of stub methods that existed because of the
> > double-indirection via inventory.
>
> Which in turn has broken the public CommitBuilder API and probably stopped
> the snapshot method on InventoryEntry from working. I really do think this
> API break is necessary. But you need to document it in NEWS.
Yah, I meant to delete the snapshot method entirely; thanks for this
review, I'm looking back at it and it was clearly a bit rushed.
> > Caveats:
> > - I'm not sure if path_content_summary should exist, or if its
> entirely
> > correct for windows at the moment, and it doesn't currently offer
> the
> > hash-cache optimisation ever, so this is not suitable for inclusion
> in a
> > release.
>
> I've only reviewed the code using path_content_summary, not the
> implementations for the various trees. From the profiling I've done
> on commit over the months, I'm sure a single call into the tree is the
> right idea though.
I'd like to avoid the function altogether though; I think if I
restructure so its passed in then you can supply the tuple yourself, or
have the iterator you use to decide what is changed give it to you. For
now it seems useful though.
> > That said, this is a reasonable chunk of work, and I'd like to keep the
> > delta on the branch I'm working on down, so... I'm submitting it for a
> > merge as soon as 0.92 opens with the plan of getting it reviewed and
> > tweaking it as needed to be ready on tuesday.
>
> I'd like to hear from Aaron and/or John on what they think of
> path_content_summary. Pulling it's implementations and tests out into a
> separate patch would be a good way to split the size of this change into
> two. Like _iter_changes, we could make it _path_content_summary
> initially if we're not 100% sure it's right.
There is a separate branch for it if it needs alteration.
> As the versionedfile change included below is already merged into
> bzr.dev now, a new bundle would be good regardless. :-)
Not sure what you mean here, or why a new bundle would be useful.
> More comments below.
...
> > === modified file 'bzrlib/commit.py'
> > self.builder = self.branch.get_commit_builder(self.parents,
> > self.config, timestamp, timezone, committer, revprops, rev_id)
> > + # tell the builder about the chosen recursive behaviour
> > + self.builder.recursive = recursive
>
> Why are you setting this via an attribute rather than adding a parameter
> to the get_commit_builder() method?
it didn't seem like something we'd choose to use to change the builder
that was created at the time. Also I think I'll revert this change until
I'm actually ready to move subtree-commits into the builder - it may be
a YAGNI.
> > @@ -685,6 +686,7 @@
> > work_inv = self.work_tree.inventory
> > assert work_inv.root is not None
> > entries = work_inv.iter_entries()
> > + # XXX: Note that entries may have the wrong kind.
> > if not self.builder.record_root_entry:
> > entries.next()
> > for path, existing_ie in entries:
>
> This comment needs more explanation.
I'll remove it; the newer code we chatted about will make this more
clear.
> > === modified file 'bzrlib/inventory.py'
> > --- bzrlib/inventory.py 2007-09-04 03:53:07 +0000
> > +++ bzrlib/inventory.py 2007-09-05 05:51:34 +0000
> > @@ -433,7 +433,7 @@
> > self.revision))
> >
> > def snapshot(self, revision, path, previous_entries,
> > - work_tree, commit_builder):
> > + work_tree, commit_builder, store_if_unchanged):
>
> You're added a parameter here and it's never used. Crud I assume?
I deleted the use of snapshots entirely.
> These changes are necessary but they mean ie.snapshot is broken. You've
> removed the InventoryFile implementation of snapshot() but not the one
> in InventoryEntry.
Yes, oops.
> > + try:
> > + ie.executable = content_summary[2]
> > + lines = tree.get_file(ie.file_id, path).readlines()
> > + ie.text_sha1, ie.text_size = self._add_text_to_weave(
> > + ie.file_id, lines, heads, nostore_sha)
> > + ie.revision = self._new_revision_id
>
> It doesn't hurt but setting ie.revision here is redundant as it's done
> after the if block anyhow.
Ah missed it, I had been factoring that out.
> > + elif kind == 'directory':
> > + if not store:
> > + # all data is meta here, so carry over:
> > + ie.revision = parent_entry.revision
> > + return
> > + lines = []
> > + self._add_text_to_weave(ie.file_id, lines, heads, None)
>
> The code for directory and tree-reference are identical so I'd be
> tempted to make that elif test an or. In terms of code speed, getting to
> the symlink code after an extra check won't hurt. (Medium term, I'd
> expect as many nested trees as symlinks in many repositories.)
Hmm, I think its clearer to show that we encode the data conceptually
separately, for all that the encoding logic is identical today.
> > - def _add_text_to_weave(self, file_id, new_lines, parents):
> > + raise NotImplementedError('unknown kind')
> > + ie.revision = self._new_revision_id
> > +
> > + def _add_text_to_weave(self, file_id, new_lines, parents, nostore_sha):
> > versionedfile = self.repository.weave_store.get_weave_or_empty(
> > file_id, self.repository.get_transaction())
> > - result = versionedfile.add_lines(
> > - self._new_revision_id, parents, new_lines)[0:2]
> > - versionedfile.clear_cache()
> > - return result
> > + try:
> > + return versionedfile.add_lines(
> > + self._new_revision_id, parents, new_lines,
> > + nostore_sha=nostore_sha)[0:2]
> > + finally:
> > + versionedfile.clear_cache()
> >
>
> This is a good change. In fact, it needs to be pulled out and merged
> into bzr.dev now doesn't it? (As add_lines can throw an exception since
> the merge of the related change into bzr.dev earlier today, the call to
> clearcache might be missed in the trunk. We should fix that without
> waiting for this larger patch to land.)
No, bzr.dev does not use the optional API that can cause that exception.
> > + def test_add_lines_nostoresha(self):
> > + """When nostore_sha is supplied using old content raises."""
> version + "2")
>
> > +
> > + def test_add_lines_nostoresha(self):
> > + """When nostore_sha is supplied using old content raises."""
>
> As mentioned on IRC, same method name repeated. Python sucks in not
> detecting this IMO. More curiously though, merging this bundle into
> bzr.dev gave me 4 copies of the routine. Is that a bug in our merge code?
I fixed this in bzr.dev.
> In summary, I'm very pleased to see this. This restructuring between
> Inventory and CommitBuilder is, IMO, a necessary thing and very much the
> direction I wanted to take. API evolution notwithstanding, huge junks of
> inventory.py can now be removed.
I'll update this with the fixes. Re path_content_summary I don't see a
compelling reason to merge or discuss it separately, its not a large
branch, and the presence of path_content_summary in the same patch gives
reviewers context.
-Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20070910/98960269/attachment.pgp
More information about the bazaar
mailing list