versioned file api audit

John Arbash Meinel john at arbash-meinel.com
Mon Mar 17 15:16:26 GMT 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> so I just audited the public interface of versioned file. Heres what I
> propose we keep/discuss/remove from it.
> 
> 
> vf apis to keep:
> check_not_reserved_id
> versions
> add_lines
>   (with ghosts version/perhaps flag)

I'm not sure if we did this yet, but 'add_lines' should also return the
sha1 sum if it had to compute it. (I think we just did it and said
'screw backwards compatibility', but I'm not positive.)

> add_mpdiff
>   (nb, add_lines adds one version, add_mpdiffs adds many, we should make
> these
>    consistent)
> check (but rename validate?)
> get_format_signature
> make_mpdiffs (but rename to get_something, or iter_mpdiffs) ?
> get_sha1s
> get_parent_dict (because this is what bzrlib.graph wants; iter_parents
> is
>     better on incremental work)

I think you mean 'get_parent_map'

I'm not sure what you mean by 'iter_parents' being better, but I'm
willing to discuss it.

IIRC, iter_parents was calling into get_parents() which has the problem
that it is actually doing "get_parents_no_ghosts()" which is a
*significant* performance penalty. At that level the pack index is doing
0 caching, so it is a bisect 2x.

The common way of going through history is to ask the index, come back
with the next parents, go back and ask the index again based on which
parents you want, etc.

It would be nice if we could stream in the nodes we want and stream out
the nodes to get. Maybe an "iter_ancestry()" style call with flags for
lefthand_only, include_ghosts, etc?

> annotate_iter (perhaps rename to iter_annotations)

For the moment, I really think it makes more sense to have a plain
"annotate()" which just returns the lines.

All implementations basically do this and then add "iter(lines)" to
return a real iterator.

Conceptually, I like the idea that we wouldn't have to load a full-text
into memory including annotations. Pragmatically, we are a *long* way
away from it.

> iter_lines_added_or_present_in_versions
> weave_merge
> 
> apis to discuss:
> plan_merge (perhaps keep this)
> 
> apis to remove:
> has_version

I think we do need a way to probe sometimes. I know it is potentially
expensive as a LBYL. I think the problems I ran into were when you have
APIs which take a list of revisions. So you may end up doing

get(revision_list)

And 1 of them in the middle fails because it isn't present. It is pretty
hard to recover cleanly from that situation. I would certainly like to
structure code (as much as possible) to avoid it, though.

> copy_to

gone

> access_mode in constructor
> finished

At one point I think we had an "open()" call on VF, which isn't really
used, so certainly it can be gone.

> iter_parents

I thought you wanted to keep this up above.

> join
> annotate
> get_sha1
> has_ghost
> add_lines_non_ghost
> enable_cache
> clear_cache
> clone_text
> create_empty
> get_suffixes
> get_text
> get_texts
> get_lines
> get_ancestry
> get_ancestry_with_ghosts
> get_graph
> get_graph_with_ghosts
> transaction_finished
> 

I think a lot of these can go as we switch to Repository.get_graph().
There are probably areas where you will pessimize Knit repos slightly,
but I don't think anything terribly much.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFH3otKJdeBCYSNAAMRAsZwAJ42oCi2Mwx6itFKkoR6spK80j4PXwCfVD6H
FolfbAbGv+jaTx+lm4W215E=
=6k2Z
-----END PGP SIGNATURE-----



More information about the bazaar mailing list