[MERGE] knit._PackAccess

Wed Aug 8 03:25:28 BST 2007

On Fri, 2007-08-03 at 10:47 -0400, John Arbash Meinel wrote:
> John Arbash Meinel has voted tweak.
> Status is now: Conditionally approved
> Comment:
> I was looking through the code earlier, and I never saw a good 
> definition of
> what a 'node' contains. It seemed a little random. And took me a while 
> to track
> down what was actually being done.

ok. I'll read the code through once and see about tweaks to improve
this.

> Also, this should have been caught in an earlier review but in the test 
> suite
> you did:
> 
> class TestGraphIndexKnit(KnitTests):
>      """Tests for knits using a GraphIndex rather than a KnitIndex."""
> 
>      def make_g_index(self, name, ref_lists=0, nodes=[]):
>          builder = GraphIndexBuilder(ref_lists)
> 
> v- Here you do "(node, references, value)"
> 
>          for node, references, value in nodes:
>              builder.add_node(node, references, value)
>          stream = builder.finish()
>          trans = self.get_transport()
>          trans.put_file(name, stream)
>          return GraphIndex(trans, name)

> ^- But this looks a lot more like "node, value, references".
> Oh, and it might be clearer if you called it "node_id", since it isn't 
> the
> whole node.

key, value, references is the official api, I'll correct this.

> Oh, and in 'bzrlib.tests.test_knit" you do:
> from bzrlib.index import *
> 
> Which is pretty bad form for our code. Mostly because I saw: 
> GraphIndexBuilder
> and was trying to figure out where it came from.
> But I couldn't find it by searching through the code. And it took me 
> quite a
> while before I started to think about searching for "from X import *" 
> (and if
> there were more than 1, it would be a *real* mess)

Oh. I really like the import * functionality for getting a whole module
- and tests do this all the time. I can change this.

> Overall, I understand why you have 'node' being a tuple of tuples. But 
> it isn't
> a very pretty interface. It is pretty hard to understand "node[0]" 
> versus
> "node.node_id".

> I'm okay with a tuple interface, as long as it is encapsulated well (so 
> we
> don't have to do tuple gymnastics in the bulk of our code). I've already 
> stated
> that I think _iter_changes should really return objects, since by then 
> we've
> culled the bulk. If we know that we'll having page-based indicies, we 
> may also
> have already culled the bulk of the data.

Initial pull and push will always be accessing every single key; we're
dealing with very large key counts. I'd be loath to add an object
interface to this code. Indexing is so low level, and so performance
sensitive :(.

> Typo here:

fixed

> v- This line is >79 chars

fixed

> ^- You call the member 'prefix' but it is actually a full key with the 
> missing
> fields set to None, and then you set the actual prefix to 'prefix_key'.
> If anything a "prefix_key" sounds like a complete key, while "prefix" 
> sounds
> like just the prefix members.

I've juggled this around, prefix is now prefix, prefix_key is prefix +
(None,)*missing_key_length.

> Also, I would think that since these indicies are meant to be 
> write-once, you
> wouldn't want an adapter when adding nodes. Maybe I'm just missing
> what 
> you are
> trying to do.

Map all of the text knit indices to a single GraphIndex with keys of
(fileid, revisionid). During 'pull' knit.join() is called, and the
KnitGraphIndex needs to add nodes to the current open InMemoryGraphIndex
which is an index builder that can answer queries on the nodes that have
been added.

> +        try:
> +            for (key, value, node_refs) in nodes:
> +                adjusted_references = (
> +                    tuple(tuple(self.prefix_key + ref_node for
> ref_node 
> in ref_list)
> +                        for ref_list in node_refs))
> +                translated_nodes.append((self.prefix_key + key,
> value,
> +                    adjusted_references))
> 
> ^- This is pretty opaque as to what you are trying to do.

I've added your comment.

> Also at this point, you are using a try/except ValueError, I assume to 
> catch
> when you don't have actual references. Isn't it better to use:
> 
> if self.adapted._node_refs:

There is no public interface for determining whether there are
references or not. Its a TODO.

> _KnitAccess
...
> I don't believe that we have any live code that uses 'open_file', and I 
> would
> rather not use 'create'. I believe _KnitData._open_file() is just relic 
> code.

We do have live code that does that. Sorry :). Note that I've made the
methods optional and no-ops for the pack access method.

> I don't know if you have code that needs it, or if you were just 
> maintaining
> compatibility. I would rather avoid them if possible.

> +        :param writer: A tuple (pack.ContainerWriter, write_index) 
> which
> +            is contains the pack to write, and the index that reads 
> from
> +            it will be associated with.
> +        """
> 
> ^ "which is contains" ?
> I think you can just remove 'is'

typo, fixed.

> +        The data is spooled to the container writer in one bytes record 
> per
> +        raw data item.
> 
> "writer in one byte-record per raw data item"
> or bytes-record, just to make clear it isn't a 'one byte' record.

done

>       def add_record(self, version_id, digest, lines):
> -        """Write new text record to disk.  Returns the position in the
> -        file where it was written."""
> +        """Write new text record to disk.
> +
> +        Returns index data for retrieving it later, as per 
> add_raw_records.
> +        """
> ^- Eventually I'm guessing these apis will need to be updated to include 
> stuff
> like "parents" so that we can store that as part of the data, rather 
> than only
> having it in the index. But you don't need to change that yet.

Right, one thing at a time. Softly softly catchee rabbit.

> ^- these should have spaces, as they aren't parameters.
>              add_nodes_callback = result.add_nodes

Done.

Thanks, I'm sending it in now.

-Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20070808/c445e25e/attachment.pgp