Stream fetch fails "insert_from_broken_repo" test

John Arbash Meinel john at arbash-meinel.com
Thu Jun 18 18:53:41 BST 2009


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

So in trying to land my PackStreamSource patch, I ran into an issue with
streaming fetch, in general. Namely the test:

 bzr selftest -s bt.test_repository test_insert_from_broken_repo

Now, #1 this is only a test of a single format, when it really seems
like it should be a per_repo or per_inter_repo test.

Changing from InterPackRepository to PackStreamSource causes the
exception to not be raised, but as near as I can tell, that is true for
*any* StreamSource (at the moment).

The part that the test is triggering is:

1) Insert a text into the repository (fid, rev1c)
2) Build a new revision rev3 which does *not* reference rev1c, but the
*text* (fid, rev3) claims (fid, rev1c) as a parent.
3) During the insert, we call _check_references, which calls
"_get_external_refs()". This ends up seeing (fid, rev1c) being
referenced, but not present in the repository.

With StreamSink.insert_stream() things change. There are 2 differences:

Namely for missing entries it does:

for prefix, versioned_file in (
                ('texts', self.target_repo.texts),
...                ):
                if versioned_file is None:
                    continue
                missing_keys.update((prefix,) + key for key in
                  versioned_file.get_missing_compression_parent_keys())

It then does a single "fetch me whatever is missing in the first pass",
and then fails if there is still anything missing.

The issue breaks down in 2 ways:

1) It is somewhat random whether (fid, rev1c) is going to be a
compression parent. It certainly wouldn't for --2a repos, as it stands
it randomly is because the text sizes are 0, so you always get a delta
chain of length 1. (1 fulltext, 1 delta)
Because (fid, rev1c) ends up being a compression parent, it goes ahead
and fetches the text, and then nothing is missing. (sort of good, in
that it fills in what would otherwise be missing, though that missing
entry is a bad reference.)

2) So to make it fail, I tried increasing the length of the bogus
references, adding (fid, rev3) => (fid, rev1c) => (fid, rev1d)

However, the get_stream_for_missing_keys() will write a fulltext for
(fid, rev1c) because that is how we designed it.

Which means that again there are no missing *compression* parents. What
*is* missing is the actual parent text.


Now the check that Packer was doing, would be to actually check
"get_missing_parent_keys()". Though this would break with stacking (but
note that Packer doesn't get used for stacked repos...)

One option would be something that would:
 get missing_compression_parents
     and missing_parents_that_arent_in_fallback

Except with the way that stacking works with Smart requests, you don't
*have* fallbacks that you can query...

So at this point I feel like I'm out of options. I can't write a runtime
missing parent check that will work with Smart + Stacking, and the
'get_stream_for_missing_keys()' will fill in the compression parent.


I've decided to just disable that test for now, and submit this as a bug
report to Launchpad so we don't forget about it. But I would certainly
be interested in any feedback people have about possible solutions.

John
=:->


  affects bzr
  status triaged
  importance medium
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAko6fyUACgkQJdeBCYSNAAPOwgCgxNl3XyqnF8W5hkT4xrhwGiNT
8x8AniXPIjgi4/21cbsmxLpFXzJpMLWw
=voNX
-----END PGP SIGNATURE-----



More information about the bazaar mailing list