Incorrect tracking of missing parents?
John Arbash Meinel
john at arbash-meinel.com
Fri May 8 17:26:20 BST 2009
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
I'm looking at implementing stacking for -dev6 repositories, and I found
out that I need to implement _GCIndex.get_missing_parents, which is
implemented in Knits by setting a 'track parents' flag.
When I looked at it, I saw that it does this:
if parent_refs is not None:
parent_refs.update(parents)
parent_refs.discard(key)
However, this is fairly likely to generate a lot of false positives.
Consider the case of:
insert A => []
insert B => [A]
insert C => [B]
insert D => [C]
In that case, you will discard A, but then add it back as part of adding
B. B is discarded right away (no-op), but then added when adding C.
Now, I know that the final result then does difference update:
def get_missing_parents(self):
"""Return the keys of missing parents."""
# We may have false positives, so filter those out.
self._external_parent_refs.difference_update(
self.get_parent_map(self._external_parent_refs))
return frozenset(self._external_parent_refs)
Now, we have to do the final set difference, because we aren't checking
un-accessed indexes yet. (The stacked location could already have those
parents from a previous insert.)
I just get the feeling that the parent_refs.discard() from above isn't
really helping, and actually keeping track via 2 sets might be a lot
better. This is also true for
def scan_unvalidated_index(self, graph_index):
...
if self._external_parent_refs is not None:
# Add parent refs from graph_index (and discard parent refs that
# the graph_index has).
for node in graph_index.iter_all_entries():
self._external_parent_refs.update(node[3][0])
self._external_parent_refs.discard(node[1])
^- Again, depending on the ordering here, the 'discard()' will be
discarding everything *before* it gets added. I think it would be better
to do:
if self._external_parent_refs is not None:
new_refs = set()
existing_refs = set()
for node in graph_index.iter_all_entries():
new_refs.update(node[3][0])
existing_refs.add(node[1])
self._external_parent_refs.update(new_refs)
self._external_parent_refs = \
self._external_parent_refs.difference(existing_refs)
# diff_update is probably fine, since we know new_refs is generally >
# than existing_refs.
To 'get it done', I'm starting with the bad form of just walking
'iter_all_entries()' over everything, but I figure I should make sure I
understand the existing algorithm, and how to bring it across.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAkoEXSsACgkQJdeBCYSNAAPjEwCfSSaE4AvLrtLFkeCV0/7begyi
jjUAoIUBZ48vpfVXEjX2otBWOa5n2kSe
=4MIJ
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list