[MERGE] Stacked knits

Tue Jul 1 01:48:06 BST 2008

On Fri, 2008-06-27 at 20:51 +1000, Ian Clatworthy wrote:

Feedback on the queries - I have elided simple to-take actions.

> >          for key in keys:
> > +            if key in nonlocal_keys:
> > +                # already handled
> > +                continue
> 
> I'm not convinced yet that this if test is the best one. Isn't it possible
> (though unlikely) that an entry in nonlocal_keys won't be found in the
> fallbacks? It would be safer to do 'if key in final_content'. Or I guess
> you could check that missing_keys is indeed empty on exiting the for loop.

if its nonlocal, and not in the fallbacks, its absent altogether and
caught prior to this.

> > +                    # no need to plan further back
> > +                    components.append((cursor, None, None, None))
> 
> I'm yet to understand why these 2 lines have been added. Can you explain
> it please?

The logic in this loop builds a stack (called components) of things to
delta against to build an output.
Intermediate outputs are cached in a dict.
These two lines allow an externally sourced component to truncate the
build path at a cached component rather than going further.

> > +    def _get_parent_map(self, keys):
> > +        """Get a map of the parents of keys.
> > +
> > +        :param keys: The keys to look up parents for.
> > +        :return: A tuple. The first element is a mapping from keys to parents.
> > +            Absent keys are absent from the mapping. The second element is a
> > +            list with the locations each key was found in. The first element
> > +            is the in-this-knit parents, the second the first fallback source,
> > +            and so on.
> > +        """
> 
> I'd prefer this method to be called _get_parent_map_with_sources, say.

Uhm, sure.

> >                              # missing basis component
> > -                            result = False
> > +                            needed_from_fallback.add(chain[-1])
> > +                            result = True
> >                              break
> >                  for chain_key in chain[:-1]:
> >                      checked_keys[chain_key] = result
> >                  if not result:
> > -                    absent_keys.add(key)
> > +                    needed_from_fallback.add(key)
> 
> The changing from 'result = False' to 'result = True' looks like a logic change
> separate to the handling of fallbacks? In particular, checked_keys will contain
> True for some items that previously had False. I need to spend some more time
> understanding this block of code - I'm honestly not sure whether your change
> here is right or wrong. Any hints? :-)

Its not as clearly expressed as perhaps we can make it. We're
essentially delegating the component lookup to the fallback vfs in this
case.

> > +            for key in present_keys:
> > +                for parent_map in parent_maps:
> > +                    if key in parent_map:
> > +                        key_source = parent_map
> > +                        break
> > +                if current_source is not key_source:
> > +                    source_keys.append((key_source, []))
> > +                    current_source = key_source
> > +                source_keys[-1][1].append(key)
> 
> This looks right. The nested for loops scare me wrt performance though I can't
> see how else to write that code.

For 10K keys with 1 fallback repository its:
 10K outerloops
  2 inner loops

or 20K total steps.

Shouldn't be an issue unless/until someone has hundreds of different
fallback repos - and then it simply reflects reality.

> > +        total = len(keys)
> >          # we don't care about inclusions, the caller cares.
> >          # but we need to setup a list of records to visit.
> >          # we need key, position, length
> >          key_records = []
> >          build_details = self._index.get_build_details(keys)
> > -        for key in keys:
> > -            key_records.append((key, build_details[key][0]))
> > -        total = len(key_records)
> > +        for key, details in build_details.iteritems():
> > +            if key in keys:
> > +                key_records.append((key, details[0]))
> > +                keys.remove(key)
> 
> I'm ok with changing how the total is calculated as you've done
> but it highlights the fact that progress-bar handling in this
> routine is now only half done. You might want to tweak it a bit, e.g.
> pass it down into the nested call below ...

Pasing it down into the child bars will result in confusing output.

> test_knit.py feedback now ...
> 
> > +        # lines added to the test that reference across the stack do a
> > +        # fulltext.
> > +        basis.add_lines(key_basis, (), ['foo\n'])
> > +        basis.calls = []
> > +        test.add_lines(key_cross_border, (key_basis,), ['foo\n'])
> > +        self.assertEqual('fulltext', test._index.get_method(key_cross_border))
> > +        self.assertEqual([("get_parent_map", set([key_basis]))], basis.calls)
> > +        basis.calls = []
> > +        # Subsequent adds do delta.
> 
> I think those last two lines should be swapped order wise.

?

> > +        raise KnownFailure("Annotation on stacked knits currently fails.")
> > +        details = test.annotate(key_basis)
> > +        self.assertEqual([(key_basis, 'foo\n'), (key_basis, 'bar\n')], details)
> > +        self.assertEqual([("annotate", key_basis)], basis.calls)
> 
> I'd like to see those last 3 lines (after the raise) commented out.

Why? This is what we document as 'The right way' to write a known
failure - do a test you want to pass, find where it fails, and raise a
KnownFailure at that point, leaving the rest of the test intact.

-Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20080701/2dc8b131/attachment.pgp