non-recursive status of a directory?

raindog at macrohmasheen.com raindog at macrohmasheen.com
Sat Jun 7 04:43:19 BST 2008


Rob, you're missing the fact that in a directory of 10k files, the user wants the status of those files. In a dir of 10k dirs, each containing 10k files, the user only cares about the status of the dirs, not the status of the 10k files inside them, because he cannot see those files.

-----Original Message-----
From: Robert Collins <robertc at robertcollins.net>

Date: Sat, 07 Jun 2008 12:55:12 
To:Mark Hammond <mhammond at skippinet.com.au>
Cc:bazaar at lists.canonical.com
Subject: RE: non-recursive status of a directory?


On Sat, 2008-06-07 at 10:30 +1000, Mark Hammond wrote:
> > Given what I'm hearing, I don't really perceive non-recursive as a
> > need for the tbzr code:
> 
> Sure - it's not a *need* - but please take my word for it that tbzr's
> implementation would be "faster", in terms of user responsiveness,
> with it, and quite a bit simpler.  Similarly, the people running 'bzr
> status' also don't *need* non-recursive status, but the person who
> added the "todo" note, and the people who implemented svn thought it
> might be a helpful option to provide :)
>
> Note that svn allows recursive or non-recursive.  tsvn explicitly
> chooses the non-recursive option for good reason.  tsvn has lots of
> real-world based tuning tweaks, which is why I'm trying to follow
> their model as closely as possible.  I think the reality is that on
> Windows, tbzr will be compared performance wise against tsvn and
> people will draw conclusions about the performance of bzr versus svn
> from that.

I certainly don't want to impede performance. Note however that
references to svn for performance are, well, problematic. Last I recall
checking, our status leaves svn st for dead; svn *requires* non-recursive
mode because of a fundamentally problematic approach to representing
branches. Neither of these makes the fact that svn has a non-recursive
facility a compelling reason for bzr (or tbzr) to have one.

The use case for 'I need an emblem for the contents of <dir>' is
certainly something to support. But that isn't the same as a
non-recursive iter_changes. Specifically you don't care about all the
changes in a non-visible directory, you only care *if* there are changes
anywhere down-tree from <path>. It's kind of like 'diff' vs 'cmp -s'; in
the former case you want the details, in the latter case you just want a
boolean.
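
To make the 'details' vs 'boolean' distinction concrete, here is a minimal
sketch. It does not use bzrlib at all: the list of changed paths and the
helper names (changed_under, details, is_dirty) are made up for
illustration, and paths are assumed to be '/'-separated.

```python
def changed_under(changes, directory):
    """Lazily yield the changed paths located at or below `directory`."""
    base = directory.rstrip("/")
    prefix = base + "/"
    for path in changes:
        if path == base or path.startswith(prefix):
            yield path


def details(changes, directory):
    # 'diff'-style answer: collect every change below the directory.
    return list(changed_under(changes, directory))


def is_dirty(changes, directory):
    # 'cmp -s'-style answer: stop at the first change, return a boolean.
    return any(True for _ in changed_under(changes, directory))


# Hypothetical set of changed paths in a working tree.
changes = ["src/a.py", "src/lib/b.py", "docs/index.txt"]

print(details(changes, "src"))
print(is_dirty(changes, "src"))
print(is_dirty(changes, "tests"))
```

An emblem for a directory only needs is_dirty(); details() is what a status
listing of visible files would want.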

> So no, I don't *need* it, but I believe I've excellent reasons for
> wanting it.
> 
> > You describe an iterative process whereby details on a directory
> > accumulate, starting with 'not modified' and ending up with 'a
> > reasonable UI flag'.
> 
> The thing is, in many cases, it is *not* necessary to recurse to the
> bottom of a tree to find the full status of a directory, so in some
> cases, the bottom children will *never* be looked at.  As soon as you
> find a modified child, at any depth, you could present the status of
> that directory.  Thus, asking bzr to recurse fully means far more
> operations than necessary would have occurred before the state can be
> shown to the user (or alternatively, more operations are wasted after
> the status is shown).  On a large tree, this could be a significant
> win.

If I have a single directory with 10K files in it, the same argument
applies to stopping at the first modified file (when reporting on the
directory above).

And in this case I think:

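# Restrict the scan to the directory whose emblem is needed.
generator = tree.iter_changes(specific_files=['path_to_dir_to_scan'])
try:
    # Pull at most one change; next() works on Python 2.6+ and 3.
    next(generator)
except StopIteration:
    # The iterator was exhausted without reporting anything.
    modified = False
else:
    modified = True
# Drop the generator so no further scanning happens.
del generator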

will do the least possible work to answer the question. In a totally
unmodified tree it will stat everything. In a modified tree with a file
changed in the directory it will stop at the first file.
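
The amount of work done in each case can be illustrated with a stand-in
generator. This is only a sketch: scan() is a hypothetical substitute for a
lazy change iterator, not bzrlib code, and the 'examined' list simply
records which entries were looked at.

```python
def scan(entries, examined):
    """Stand-in for a lazy change iterator: looks at one entry at a time."""
    for path, is_changed in entries:
        examined.append(path)  # record the work done so far
        if is_changed:
            yield path


def has_changes(change_iter):
    """Return True as soon as the iterator produces one change."""
    for _ in change_iter:
        return True
    return False


# A clean directory: every entry has to be examined.
clean = [("f%d" % i, False) for i in range(5)]
# A directory where the second entry is modified.
dirty = [("f0", False), ("f1", True), ("f2", False), ("f3", True)]

examined_clean = []
clean_result = has_changes(scan(clean, examined_clean))
print(clean_result, len(examined_clean))  # False 5

examined_dirty = []
dirty_result = has_changes(scan(dirty, examined_dirty))
print(dirty_result, len(examined_dirty))  # True 2
```

The unmodified case is the expensive one; any modification lets the scan
stop early, wherever in the directory it happens to be found.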

> It's not yet clear to me that iterating file by file will give me
> "missing" items etc, and I'd really be surprised if that didn't come
> at a significant performance cost - but regardless, I'm a little
> confused by your position.  Is it:
> 
> * tbzr doesn't need, or even *want* non-recursive status, and while
> you think it does you don't understand the problem.
>
> or
> 
> * bzr doesn't provide non-recursive status, and it's such an obscure
> requirement it is unlikely to do so in the short term.  Please make
> alternative arrangements.
> 
> or something else?

It's: 'directories are a poor proxy for dividing work up within the
system'. 

-Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.


