[MERGE] Stacked knits

Ian Clatworthy ian.clatworthy at internode.on.net
Fri Jun 27 11:51:00 BST 2008


Robert Collins wrote:
> This patch teaches KnitVersionedFiles how to stack on any other
> VersionedFiles object(s). It doesn't [yet] support annotations across
> such boundaries, but as it should be sufficient to get stacked branches
> up and running I thought I would send it in for review.

This is pretty core stuff so I'm not yet ready to approve this.
Only tweaks found so far. My main concern is not the code you've
put up for review - it's whether you've missed anything. I need to
think about that a little more. Otherwise, well done on this so far.

bb:comment

> +        missing_keys = set(nonlocal_keys)
> +        for source in self._fallback_vfs:
> +            if not missing_keys:
> +                break
> +            for record in source.get_record_stream(missing_keys,
> +                'unordered', True):
> +                if record.storage_kind == 'absent':
> +                    continue
> +                missing_keys.remove(record.key)
> +                bytes = record.get_bytes_as('fulltext')
> +                lines = split_lines(record.get_bytes_as('fulltext'))
> +                text_map[record.key] = lines
> +                content_map[record.key] = PlainKnitContent(lines, record.key)
> +                if record.key in keys:
> +                    final_content[record.key] = content_map[record.key]

bytes isn't used. Either delete that line or use it in calculating lines.

>          for key in keys:
> +            if key in nonlocal_keys:
> +                # already handled
> +                continue

I'm not convinced yet that this if test is the best one. Isn't it possible
(though unlikely) that an entry in nonlocal_keys won't be found in the
fallbacks? It would be safer to do 'if key in final_content'. Or I guess
you could check that missing_keys is indeed empty on exiting the for loop.

> +                    # no need to plan further back
> +                    components.append((cursor, None, None, None))

I'm yet to understand why these 2 lines have been added. Can you explain
it please?

> +    def _get_parent_map(self, keys):
> +        """Get a map of the parents of keys.
> +
> +        :param keys: The keys to look up parents for.
> +        :return: A tuple. The first element is a mapping from keys to parents.
> +            Absent keys are absent from the mapping. The second element is a
> +            list with the locations each key was found in. The first element
> +            is the in-this-knit parents, the second the first fallback source,
> +            and so on.
> +        """

I'd prefer this method to be called _get_parent_map_with_sources, say.

>                              # missing basis component
> -                            result = False
> +                            needed_from_fallback.add(chain[-1])
> +                            result = True
>                              break
>                  for chain_key in chain[:-1]:
>                      checked_keys[chain_key] = result
>                  if not result:
> -                    absent_keys.add(key)
> +                    needed_from_fallback.add(key)

The changing from 'result = False' to 'result = True' looks like a logic change
separate to the handling of fallbacks? In particular, checked_keys will contain
True for some items that previously had False. I need to spend some more time
understanding this block of code - I'm honestly not sure whether your change
here is right or wrong. Any hints? :-)

> +            for key in present_keys:
> +                for parent_map in parent_maps:
> +                    if key in parent_map:
> +                        key_source = parent_map
> +                        break
> +                if current_source is not key_source:
> +                    source_keys.append((key_source, []))
> +                    current_source = key_source
> +                source_keys[-1][1].append(key)

This looks right. The nested for loops scare me wrt performance though I can't
see how else to write that code.

> +        our_keys = parent_maps[0]
> +        # keys - needed_from_fallback
> +        # keys = keys - absent_keys

I don't think our_keys is used. Those two comment lines can go as well.

> +        total = len(keys)
>          # we don't care about inclusions, the caller cares.
>          # but we need to setup a list of records to visit.
>          # we need key, position, length
>          key_records = []
>          build_details = self._index.get_build_details(keys)
> -        for key in keys:
> -            key_records.append((key, build_details[key][0]))
> -        total = len(key_records)
> +        for key, details in build_details.iteritems():
> +            if key in keys:
> +                key_records.append((key, details[0]))
> +                keys.remove(key)

I'm ok with changing how the total is calculated as you've done
but it highlights the fact that progress-bar handling in this
routine is now only half done. You might want to tweak it a bit, e.g.
pass it down into the nested call below ...

> +        for source in self._fallback_vfs:
> +            if not keys:
> +                break
> +            source_keys = set()
> +            for line, key in source.iter_lines_added_or_present_in_keys(keys):

test_knit.py feedback now ...

> +        # lines added to the test that reference across the stack do a
> +        # fulltext.
> +        basis.add_lines(key_basis, (), ['foo\n'])
> +        basis.calls = []
> +        test.add_lines(key_cross_border, (key_basis,), ['foo\n'])
> +        self.assertEqual('fulltext', test._index.get_method(key_cross_border))
> +        self.assertEqual([("get_parent_map", set([key_basis]))], basis.calls)
> +        basis.calls = []
> +        # Subsequent adds do delta.

I think those last two lines should be swapped order wise.

> +        raise KnownFailure("Annotation on stacked knits currently fails.")
> +        details = test.annotate(key_basis)
> +        self.assertEqual([(key_basis, 'foo\n'), (key_basis, 'bar\n')], details)
> +        self.assertEqual([("annotate", key_basis)], basis.calls)

I'd like to see those last 3 lines (after the raise) commented out.

> +        # check() must not check the fallback files, its none of its business.

s/its none/it's none/

> +        basis.check = None
> +        test.check()

It feels pretty crude to ensure basis.check isn't called by doing it this way.
I was expecting to see the list of basis.calls checked instead. No big deal.

> +        # Its not strictly minimal, but it seems reasonable for now for it to

s/Its/It's/
That same comment is repeated several times is TestStacking so please fix it
multiple times.

> +    def test_insert_record_stream(self):
> +        # records are inserted as normal; insert_record_stream builds on
> +        # add_lines, so a smoke test should be all thats needed:

s/thats/that's/
Same tweak needed for the test_add_mpdiffs docstring.

> +        self.assertEqual(set([key_left, key_right]), set(basis.calls[3][1]))
> +        self.assertEqual('get_record_stream', basis.calls[3][0])
> +        self.assertEqual('unordered', basis.calls[3][2])
> +        self.assertEqual(True, basis.calls[3][3])

I'd prefer to see this refactored so that (say)

  last_call = basis.calls[3]

and that the fields were checked in 1st to last order, i.e. switch the
top 2 lines.

> +    def iter_lines_added_or_present_in_keys(self, keys, pb=None):
> +        self.calls.append(("iter_lines_added_or_present_in_keys", copy(keys)))
> +        return self._backing_vf.iter_lines_added_or_present_in_keys(keys)

You should pass the pb through here.

> -        :return: a list of sha1s matching keys.
> +        :return: a dict from key to sha1 digest. Keys of texts which are not
> +            present in the store are not not present in the returned

s/not not/not/

The change to the API for get_sha1s() ought to be mentioned in NEWS as well.

Ian C.



More information about the bazaar mailing list