[BUG] (trivial) duplicate text in "bzr pull --help"

Robert Collins robertc at robertcollins.net
Wed Jun 28 20:45:27 BST 2006


On Wed, 2006-06-28 at 14:18 -0400, Aaron Bentley wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Robert Collins wrote:
> > The only knits for which caching is relevant during fetch
> > is the inventory knit - the revision knit does not have anything other
> > than the index retrieved until the revisions are fetched.
> 
> According to my log, inventory and revision knits both use the API
> inefficiently:
...

from fetch.py, you should be seeing:
    # we fetch only the referenced inventories because we do not
    # know for unselected inventories whether all their required
    # texts are present in the other repository - it could be
    # corrupt.
    to_weave.join(from_weave, pb=child_pb, msg='merge inventory',
                  version_ids=revs)

for revisions, KnitRepoFetcher does:
def _fetch_revision_texts(self, revs):
    # may need to be a InterRevisionStore call here.
    from_transaction = self.from_repository.get_transaction()
    to_transaction = self.to_repository.get_transaction()
    to_sf = self.to_repository._revision_store.get_signature_file(
        to_transaction)
    from_sf = self.from_repository._revision_store.get_signature_file(
        from_transaction)
    to_sf.join(from_sf, version_ids=revs, ignore_missing=True)
    to_rf = self.to_repository._revision_store.get_revision_file(
        to_transaction)
    from_rf = self.from_repository._revision_store.get_revision_file(
        from_transaction)
    to_rf.join(from_rf, version_ids=revs)


that is, it does two join() calls.

For each of these three joins, the InterKnit code path should kick in
(knit.py registers this at import).
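
For anyone not familiar with the dispatch, the shape of the optimiser
registry is roughly the following - a simplified sketch, not the literal
bzrlib.versionedfile source, and is_compatible is named here just for
illustration:

    # Simplified sketch of the InterVersionedFile optimiser registry; the
    # real classes live in bzrlib.versionedfile and bzrlib.knit.
    class InterVersionedFile(object):
        _optimisers = []

        def __init__(self, source, target):
            self.source = source
            self.target = target

        @classmethod
        def register_optimiser(cls, optimiser):
            cls._optimisers.append(optimiser)

        @classmethod
        def get(cls, source, target):
            # first registered optimiser that claims the pair wins,
            # otherwise fall back to the generic (slow) join
            for optimiser in cls._optimisers:
                if optimiser.is_compatible(source, target):
                    return optimiser(source, target)
            return cls(source, target)

    # knit.py does the equivalent of
    #     InterVersionedFile.register_optimiser(InterKnit)
    # at import time, which is why a knit-to-knit join should always take
    # the InterKnit path quoted below.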

The InterKnit code path does:
    for (version_id, raw_data), \
        (version_id2, options, parents) in \
        izip(self.source._data.read_records_iter_raw(copy_queue_records),
             copy_queue):
        assert version_id == version_id2, 'logic error, inconsistent results'
        count = count + 1
        pb.update("Joining knit", count, total)
        raw_records.append((version_id, options, parents, len(raw_data)))
        raw_datum.append(raw_data)
    self.target._add_raw_records(raw_records, ''.join(raw_datum))

This is designed to allow a single readv and a single writev.
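
Purely to illustrate that intent (hypothetical helper names, not the real
_add_raw_records implementation): everything is accumulated in the loop,
then the target gets one bulk write plus one batched index update.

    # Illustration only: append every raw hunk to the target's data file
    # with one write, then add all the index rows in a single batch, so the
    # whole join costs O(1) round trips on the target as well.
    def add_raw_records(index, data_file, raw_records, raw_data):
        base = data_file.tell()
        data_file.write(raw_data)          # single write of the joined hunks
        entries = []
        offset = 0
        for version_id, options, parents, length in raw_records:
            entries.append((version_id, options, parents, base + offset, length))
            offset += length
        index.add_versions(entries)        # single batched index update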

I am guessing that your earlier optimisation work has pessimised the
readv queries that are generated, which would explain the behaviour you
are seeing.
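
To be concrete about what "pessimised" means here (illustrative numbers
only, nothing measured): a readv over the .knit file should coalesce
adjacent record offsets into a few large ranges; if the grouping or
ordering goes wrong you end up issuing one small request per record
instead. A toy coalescer, just to show the idea:

    # Illustrative only: merge adjacent (offset, length) pairs into larger
    # ranges, the way a transport-level readv can when the requested
    # offsets arrive sorted and contiguous.
    def coalesce(offsets, max_gap=0):
        """Merge sorted (offset, length) pairs whose gaps are <= max_gap."""
        merged = []
        for offset, length in sorted(offsets):
            if merged and offset <= merged[-1][0] + merged[-1][1] + max_gap:
                last_offset, last_length = merged[-1]
                merged[-1] = (last_offset,
                              max(last_length, offset + length - last_offset))
            else:
                merged.append((offset, length))
        return merged

    # 100 contiguous 4k records -> one 400k range instead of 100 requests.
    print(coalesce([(i * 4096, 4096) for i in range(100)]))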


> >>But I don't think we have a cheap way of finding out
> >>whether the shortcut of copying knit records will create different
> >>annotations from those that would be produced by rediffing.
> > 
> > 
> > I think annotations should be considered a cache, and possibly invalid -
> > simply because different diff() routines will generate different
> > annotations, so if a specific annotation style is needed one should
> > reannotate.
> 
> I think that would have some nasty side effects.  We would always have
> to reannotate before performing a knit merge, because we can't expect
> good results if the annotations are invalid.

By invalid I don't mean 'wrong', I mean 'not as good as it could be'.
Consider, for instance, the change in diff algorithm if/when we switch the
knit sequence matching over to patience diff.
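
A toy illustration of why the annotations are algorithm-dependent (this is
a made-up annotator, nothing to do with the real knit annotation code):

    from difflib import SequenceMatcher

    # Lines matched against the parent keep the parent's annotation;
    # unmatched lines are credited to the new revision.  Swap the matcher
    # (SequenceMatcher vs patience diff) and the matched blocks, and
    # therefore the annotations, can change while the stored text stays
    # identical.
    def annotate(parent_lines, parent_annotations, new_lines, new_revision):
        annotations = [new_revision] * len(new_lines)
        matcher = SequenceMatcher(None, parent_lines, new_lines)
        for i, j, n in matcher.get_matching_blocks():
            for k in range(n):
                annotations[j + k] = parent_annotations[i + k]
        return annotations

    print(annotate(['a\n', 'b\n'], ['rev1', 'rev1'],
                   ['a\n', 'c\n', 'b\n'], 'rev2'))
    # -> ['rev1', 'rev2', 'rev1']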

Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.