[MERGE] Reconcile can fix bad parent references
Aaron Bentley
aaron.bentley at utoronto.ca
Mon Aug 27 01:58:12 BST 2007
Robert Collins wrote:
> The NEWS entry could be a little more informative. Perhaps add in that
> these were caused back in bzr 0.8 timeframes and any branches that were
> created back then should run 'bzr check' asap.
OK
> This idiom:
> Won't work with packs, because there is no write group. If you insert
> repo.lock_write()
> repo.start_write_group()
> ...
> repo.commit_write_group()
> repo.unlock()
>
> around your manual data insertion it will work fine.
I can do that, but I thought we wouldn't need this for packs.
> A note here that the revisions are checked first so that XXXX would be
> good. I'm guessing that the reason is so that the data in the revision
> index is trustable.
It is so that the revision_versions dict is pre-populated.
>>      def plan_revisions(self):
>>          repository = self.repository
>> -        self.planned_revisions = set(repository.all_revision_ids())
>> +        self.planned_revisions = repository.all_revision_ids()
>>          self.progress.clear()
>>          inventoried = set(self.inventory_weave.versions())
>> -        awol = self.planned_revisions - inventoried
>> +        awol = set(self.planned_revisions) - inventoried
>>          if len(awol) > 0:
>>              raise BzrCheckError('Stored revisions missing from inventory'
>>                                  '{%s}' % ','.join([f for f in awol]))
>> -        self.planned_revisions = list(self.planned_revisions)
>
> What do the changes to plan_revisions accomplish?
They reduce silly conversions. In the old code, we start with a list,
convert it to a set, and then convert it back to a list.
This just leaves it as a list.
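The difference can be sketched in isolation (the revision ids here are hypothetical, and these helpers only model the two code paths, not the real plan_revisions method):

```python
def plan_old(all_revision_ids, inventoried):
    # Old code: list -> set -> list round trip.
    planned = set(all_revision_ids)
    awol = planned - inventoried
    planned = list(planned)
    return planned, awol

def plan_new(all_revision_ids, inventoried):
    # New code: keep the list; build a set only for the membership check.
    planned = all_revision_ids
    awol = set(planned) - inventoried
    return planned, awol
```

A side effect of the new version is that the original ordering of all_revision_ids survives, which the set round trip did not guarantee.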
> This uses the nodes-of-file-graphs-are-nodes-of-revision-graph property
> that John and I are debating whether we can use - because the edges are
> different.
> Until we decide that that will give identical results we
> should stay with the per file graph.
If I didn't know the knit data was untrustworthy, I could use that. But
I don't see how I can use corrupt parent data to repair itself. I
can't even know whether I'm iterating in topological order...
>> === modified file 'bzrlib/repository.py'
>> --- bzrlib/repository.py	2007-08-22 05:28:32 +0000
>> +++ bzrlib/repository.py	2007-08-24 18:15:50 +0000
>> @@ -1112,6 +1112,66 @@
>>              [parents_provider,
>>               other_repository._make_parents_provider()])
>>          return graph.Graph(parents_provider)
>>
>> +    def _add_revision_text_version(self, tree, revision_versions):
>> +        inv_revisions = {}
>> +        revision_versions[tree.get_revision_id()] = inv_revisions
>> +        for path, entry in tree.iter_entries_by_dir():
>> +            inv_revisions[entry.file_id] = entry.revision
>> +        return inv_revisions
>
> This doesn't access self at all: What made repository seem like the best
> place to put it ?
It was a helper for Repository.find_bad_ancestors.
> This is what, a cache of revisions -> text version
> mappings ? I'd be inclined to either have a standalone function
I never thought I'd see the day.
>, or an
> object to represent the cache if you have more than 1 function related
> to this data structure. Looking at find_bad_ancestors there the cache is
> used and added to as well, so I would really prefer to see a class for
> the cache, probably constructed with the repository as a data source:
>
> FooCache:
> init(repository)
> cache(tree)
> get_text_version(file_id, revision_id)
>
> This gives a home for the inner method in find_bad_ancestors
Sigh. Alright.
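A minimal sketch of the cache object Robert describes, with the repository as data source (the class and method names here are illustrative, not the final bzrlib API; the body of cache() is lifted from _add_revision_text_version above):

```python
class TextVersionCache(object):
    """Cache of revision_id -> {file_id: text version}, fed from a repository."""

    def __init__(self, repository):
        self.repository = repository
        self._cache = {}

    def cache(self, tree):
        # Record the text version of every inventory entry in this revision tree.
        inv_revisions = {}
        for path, entry in tree.iter_entries_by_dir():
            inv_revisions[entry.file_id] = entry.revision
        self._cache[tree.get_revision_id()] = inv_revisions
        return inv_revisions

    def get_text_version(self, file_id, revision_id):
        # Lookup assumes cache() was already called for revision_id.
        return self._cache[revision_id][file_id]
```

This gives find_bad_ancestors a single object to read from and add to, instead of threading a bare dict through every call.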
>> +    def find_bad_ancestors(self, revision_ids, file_id, versionedfile,
>> +                           revision_versions, parents_provider=None):
>> +        """Search the versionedfile for ancestors that are not
>
> Just speculating here:
> likewise find_bad_ancestors doesn't seem to fit the general Repository
> API to me. Specifically, for packs we won't want to do it by versioned
> file anyway, rather by revision - check all the texts introduced by
> revision id X for their parent details.
find_bad_ancestors was designed to fit into Check._check_weaves. So it
needs to work on a single versionedfile at a time.
I thought for packs this would be an irrelevant issue, because we'd use
a fetch that won't tolerate this kind of brokenness.
> Is :
>
>> +class _RevisionParentsProvider(object):
>
> Really needed? If we check the revision is cached correctly at the start
> of check
We would also need to do the same in reconcile, and make sure that they
never get out of sync. I thought it was just simpler to use the data
that we actually trust, instead of a proxy.
> I think its better to check for an attribute like I did in my recent
> reconcile patch - _reconcile_does_inventory_gc is what I used. This is
> better than a subclass check because it allows foreign formats to be
> tested without requiring them to spuriously subclass Knits.
As we discussed, only knits would support this, so I check for knits.
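The two styles of capability test can be contrasted in a sketch (the attribute name is the one from Robert's patch; the repository classes are made up for illustration):

```python
class KnitRepository(object):
    # Capability flag declared on the format, as in Robert's reconcile patch.
    _reconcile_does_inventory_gc = True

class ForeignRepository(object):
    # A foreign format can opt in without subclassing anything knit-specific.
    _reconcile_does_inventory_gc = True

class PlainRepository(object):
    pass

def supports_inventory_gc(repo):
    # Attribute check: any format can declare the capability.
    return getattr(repo, '_reconcile_does_inventory_gc', False)

def supports_inventory_gc_by_subclass(repo):
    # Subclass check: foreign formats are excluded even when they support it.
    return isinstance(repo, KnitRepository)
```

The attribute check is what lets foreign formats be tested without spuriously subclassing the knit classes.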
Aaron