[RFC] walkdirs skip api

John Arbash Meinel john at arbash-meinel.com
Wed Aug 16 03:41:58 BST 2006


Robert Collins wrote:
> Hi,
> 	I've figured out what I think is a reasonable API for skipping dirs
> with walkdirs.
> 
> Currently we do a del on the dirblock to remove things we don't want to
> iterate over. This is somewhat expensive, and very expensive when we try
> to stack walkdirs - as we do in WorkingTree.walkdirs (we stack walking a
> RevisionTree and the osutils.walkdirs together to show
> versioned/unversioned file state etc).
> 
> I propose the following:
> walkdirs() will return a Walker object which has two methods:
> walk()
> skip(basename)
> 
> walk() will return the current generator
> skip(basename) will schedule basename to be skipped when next() is
> invoked on the generator.
> 
> How does that sound?
> 
> -Rob

I'm a little confused about next() versus walk(), but I think you mean
it works like this:

walker = osutils.walkdirs()
for dirinfo, entries in walker.walk():
  for relpath, name, kind, st, ..., abspath in entries:
    if name == '.bzr':
      walker.skip('.bzr')
      continue
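
A minimal sketch of how such a Walker could behave, assuming skip()
applies only to the directory block that was just yielded and takes
effect when the generator is resumed. The class body, the 4-tuple entry
layout (relpath, name, kind, abspath) and the constructor argument are
illustrative assumptions, not the actual osutils.walkdirs() code:

```python
import os


class Walker:
    """Hypothetical walker object with the proposed walk()/skip() methods."""

    def __init__(self, top):
        self._top = top
        self._skipped = set()  # basenames to skip in the current dirblock

    def skip(self, basename):
        """Schedule basename to be skipped when the generator is resumed."""
        self._skipped.add(basename)

    def walk(self):
        """Yield ((dir_relpath, dir_abspath), entries) one directory at a time."""
        pending = ['']  # stack of directory relpaths still to visit
        while pending:
            dir_relpath = pending.pop()
            if dir_relpath:
                dir_abspath = os.path.join(self._top, dir_relpath)
            else:
                dir_abspath = self._top
            entries = []
            for name in sorted(os.listdir(dir_abspath)):
                relpath = dir_relpath + '/' + name if dir_relpath else name
                abspath = os.path.join(dir_abspath, name)
                is_dir = os.path.isdir(abspath) and not os.path.islink(abspath)
                kind = 'directory' if is_dir else 'file'
                entries.append((relpath, name, kind, abspath))
            yield (dir_relpath, dir_abspath), entries
            # Execution resumes here on next(); any skip() calls the caller
            # made while looking at this dirblock are now visible.
            subdirs = [relpath for relpath, name, kind, _ in entries
                       if kind == 'directory' and name not in self._skipped]
            self._skipped.clear()
            # Reverse so that pop() visits subdirectories in sorted order.
            pending.extend(reversed(subdirs))
```

With this shape the caller never mutates the dirblock; the skip set is
consulted once per directory, after the caller has finished with it.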


And then the use for WorkingTree.walkdirs() would be more like:

walker = wt.walkdirs()
for dirinfo, entries in walker.walk():
  for relpath, name, version_type, ie in entries:
    if version_type in ('I', '?'):
      walker.skip(name)
    else:
      ... do whatever it is you do ...
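
To illustrate the stacking case, here is a rough sketch in which a tree
walker wraps a lower-level walker and simply forwards skip() to it. The
names DictWalker and TreeWalker, the in-memory dict standing in for the
disk, the 'V' code for versioned entries, and the use of kind in place
of the inventory entry (ie) are all assumptions made for the example:

```python
import fnmatch


class DictWalker:
    """Stand-in for the on-disk walker: walks a nested dict (dirs are dicts)."""

    def __init__(self, tree):
        self._tree = tree
        self._skipped = set()

    def skip(self, basename):
        self._skipped.add(basename)

    def walk(self):
        pending = [('', self._tree)]
        while pending:
            dir_relpath, node = pending.pop()
            entries = []
            for name in sorted(node):
                relpath = dir_relpath + '/' + name if dir_relpath else name
                kind = 'directory' if isinstance(node[name], dict) else 'file'
                entries.append((relpath, name, kind))
            yield dir_relpath, entries
            # Resumed by next(): honour skips requested for this dirblock.
            subdirs = [(relpath, node[name]) for relpath, name, kind in entries
                       if kind == 'directory' and name not in self._skipped]
            self._skipped.clear()
            pending.extend(reversed(subdirs))


class TreeWalker:
    """Adds a version_type to each entry and forwards skip() downwards."""

    def __init__(self, disk_walker, versioned, ignore_patterns):
        self._disk = disk_walker
        self._versioned = versioned      # set of versioned relpaths
        self._ignore = ignore_patterns   # glob patterns matched on basename

    def skip(self, basename):
        # No dirblock is edited here; the request is passed to the
        # underlying walker, which applies it when it is next resumed.
        self._disk.skip(basename)

    def walk(self):
        for dir_relpath, entries in self._disk.walk():
            block = []
            for relpath, name, kind in entries:
                if relpath in self._versioned:
                    version_type = 'V'
                elif any(fnmatch.fnmatch(name, p) for p in self._ignore):
                    version_type = 'I'
                else:
                    version_type = '?'
                block.append((relpath, name, version_type, kind))
            yield dir_relpath, block
```

Because the outer generator only pulls the next block from the inner one
when it is itself resumed, a skip() issued between two iterations
reaches the inner walker before it decides which subdirectories to
descend into.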

I think that works quite well as an api. I feel like Tree.walkdirs()
needs to return different objects than a plain osutils.walkdirs(), so I
wonder if it shouldn't be a different function name.

I also wonder if we don't want a working-tree specific walk function
that lets us iterate over the current tree and all parent trees
simultaneously. 'dirstate' is being designed around this, explicitly to
make commit fast, since it already has all parent info sorted out per
file.

It might be nice to have:

full_walker = tree.walk_with_parents(file_list)

for (dir_relpath, dir_abspath), entries in full_walker.walk():
  for (basename, ..., cur_ie, parent_entries) in entries:
    for (parent_path, parent_ie) in parent_entries:
      ...
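
One possible shape for this, as a hedged sketch only: trees are modelled
as plain dicts mapping relpath to an entry, entries are matched across
trees by path rather than by file id (so renames are not tracked), and
walk_with_parents is written as a simple generator function instead of
returning a walker object. None of this reflects how dirstate actually
stores its data:

```python
import posixpath
from collections import defaultdict


def walk_with_parents(current, parents):
    """Yield (dir_relpath, entries) across the current tree and its parents.

    current: dict mapping relpath -> entry for the working tree.
    parents: list of dicts with the same layout, one per parent tree.
    Each item in entries is (basename, cur_ie, parent_entries), where
    cur_ie is None if the path exists only in a parent, and
    parent_entries is a list of (parent_path, parent_ie) pairs.
    """
    # Collect every basename seen in any tree, grouped by directory.
    by_dir = defaultdict(set)
    for tree in [current] + list(parents):
        for relpath in tree:
            by_dir[posixpath.dirname(relpath)].add(posixpath.basename(relpath))

    for dir_relpath in sorted(by_dir):
        entries = []
        for basename in sorted(by_dir[dir_relpath]):
            relpath = posixpath.join(dir_relpath, basename)
            cur_ie = current.get(relpath)
            parent_entries = [(relpath, parent[relpath])
                              for parent in parents if relpath in parent]
            entries.append((basename, cur_ie, parent_entries))
        yield dir_relpath, entries
```

A caller such as commit could then compare cur_ie against each
parent_ie in a single pass, without looking anything up per file.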

Anyway, I'm not sure exactly how it should work. But I think something
like that could perform well, take good advantage of dirstate, and
still do pretty well even with our current format.

Now that I've thought about it, I'm tempted to call it
WorkingTree.walk(), but that is probably confusing, since you would take
the returned object and call walk() on it.
walktree() seems redundant, since you already have a tree.
walk_inventory() seems to expose the inventory, which we may or may not
keep.
get_walker() says what we are doing...
walk_entries() might be a good name (akin to iter_entries(), only
obviously using the Walker paradigm).

John
=:->
