nested trees design-approach : composite trees vs iter_changes
Stephen J. Turnbull
stephen at xemacs.org
Tue May 5 07:42:09 BST 2009
Robert Collins writes:
> There is another concern, which is more weakly coupled to CT itself. I
> think that CT makes this harder to change in the future, which is why I
> mention it now:
>
> * The CT design enforces 'file ids are unique' across all the nested
> trees; this is perhaps not meant to be a long term constraint, but it
> is one that we can't enforce - unlike our normal behaviour users
> will be able to make bzr break, and it won't be clear why, or how
> they should fix it (and arguably they won't have done anything
> wrong that would need fixing).
Ouch! How can you hope to get that from existing branches? Don't you
want to be able to use "legacy" branches as nested trees? It seems to
me that programs (or users) that are assigning file ids are quite
possibly doing that in some systematic way that may be unique *within*
a tree but "restarts" on the next tree. That is likely to result in
collisions.
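To make the collision concrete, here is a minimal sketch. It assumes a purely hypothetical id scheme (a per-tree counter that restarts at 1); `assign_ids` and `find_collisions` are illustrative names, not bzrlib functions, and real file ids may be generated quite differently.

```python
from collections import defaultdict
from itertools import count


def assign_ids(paths, prefix="file"):
    """Assign ids from a counter that restarts for every tree (hypothetical scheme)."""
    counter = count(1)
    return {path: f"{prefix}-{next(counter)}" for path in paths}


def find_collisions(trees):
    """Return each file id used by more than one tree, with the (tree, path) pairs using it."""
    owners = defaultdict(list)
    for tree_name, ids in trees.items():
        for path, file_id in ids.items():
            owners[file_id].append((tree_name, path))
    return {fid: where for fid, where in owners.items() if len(where) > 1}


# Two independently created trees: each is unique internally, but not jointly.
trees = {
    "parent": assign_ids(["README", "main.py"]),
    "lib": assign_ids(["README", "lib.py", "util.py"]),
}
collisions = find_collisions(trees)
print(sorted(collisions))
```

Each tree is consistent on its own; the clash only appears once the two are combined, which is why the user would have done nothing wrong that needs fixing.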
The only way I can see to handle that is to "upgrade" the external
branch *including giving the files new ids in the CT world*. That
sounds fragile to me.
> Aaron has expressed concerns about supporting nested trees inside our
> tree code. As most recently expressed to me:
>
> * it makes everyone pay the cost of the feature
I don't see what cost is being paid. A flag in the metadata that says
"this subdirectory is a nested project" and a check in traversal
operations to skip over those in non-recursive mode. Seems pretty
small, no?
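A rough sketch of the kind of check meant here, under stated assumptions: `Dir`, its `nested_root` flag, and `iter_files` are hypothetical stand-ins for illustration, not bzr's tree classes or metadata format.

```python
from dataclasses import dataclass, field


@dataclass
class Dir:
    name: str
    files: list = field(default_factory=list)
    subdirs: list = field(default_factory=list)
    nested_root: bool = False  # hypothetical metadata flag: "this is a nested project"


def iter_files(directory, recursive=False, prefix=""):
    """Yield file paths; cross into nested projects only when recursive=True.

    Ordinary subdirectories are always walked. `recursive` only controls
    whether nesting boundaries are crossed.
    """
    base = f"{prefix}{directory.name}/"
    for name in directory.files:
        yield base + name
    for sub in directory.subdirs:
        if sub.nested_root and not recursive:
            continue  # the one extra check a non-recursive traversal pays
        yield from iter_files(sub, recursive, base)


project = Dir("proj", ["setup.py"], [
    Dir("src", ["app.py"]),
    Dir("vendor", ["lib.py"], nested_root=True),
])
flat = list(iter_files(project))
deep = list(iter_files(project, recursive=True))
print(flat)
print(deep)
```

In this toy model the cost to a project with no nested trees is one boolean test per subdirectory.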
> * it makes recursion mandatory
It had better not! Some use-cases will want recursion for a given
operation, others will not.
> * it hides what's really going on/it makes the code trickier to debug.
I don't understand Aaron's code, so take this with a grain of salt.
At the conceptual level, it seems to me that the biggest problem the
code faces is that the file systems we're dealing with don't know the
difference between an "ordinary" subdirectory and a "nested root"
subdirectory. So in all directory traversal operations, including
looking for particular files to operate on, the code needs to check
whether a particular directory is part of the project you're "in" at
the moment (whatever that means, but the kind of thing I have in mind
is whether and how semantics should change if bzr is invoked in/on a
nested project subdirectory or in the parent directory).
Some operations probably should be discouraged, or cause warnings, or
something. E.g., it's not obvious what the semantics of a commit across
nesting boundaries should be. I can imagine projects where it's
important to *not* touch the code of the upstream, but to make
progress in developing, you want to experiment with changes in the
library subproject. So you want bzr to "freeze" commits on the
subproject---the experimental changes will remain in "pending" state
until you negotiate with upstream, and in general you try to find ways
to move them into the parent. In other cases, you may "own" both
projects, so such refactoring is admissible.
How about commits in subprojects? Should they propagate "up" to
parent projects, invoking commit on the parent too?
Should bzr allow/disallow/warn on moves across project boundaries?
Shouldn't that be a user option?
So it seems to me that "what's really going on" is a fundamental change
in the way bzr looks at the tree. Pulling that fundamental change out
into a separate class seems likely to cause code duplication and more,
not less, complex execution paths.
> * It's not hard to add 'recursive=True' default parameters to things
> like changes_from and iter_changes, which would make recursion
> optional.
Won't this be desirable/necessary anyway? Users who are using
"frozen" subprojects won't want to pay the cost of traversing them all
the time. If it's not useful in itself, though, "it's not hard to add
parameters" is not a good reason to do it.
> * We can address the all-or-nothing issue by making recursion
> controlling parameters default to False initially, with a plan to
> switch them to True as soon as that doesn't break the test suite.
If the "frozen subproject" use case is common, you may not want
recursion on by default in nested-tree projects. Maybe this should be
controlled by a configuration parameter in the subproject.
This kind of thing sort of works in favor of the CT approach, in that
in such a project it seems sensible for bzr to wake up, scan down the
nested trees to get their configs, and build that data structure.
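As an illustration of that "scan and build" step, the following sketch walks the nested trees once and records which ones should be recursed into. Everything in it is assumed for the example: the `Tree` class, the `"recurse"` config key, and `build_recursion_map` are hypothetical and do not correspond to any existing bzr configuration option or API.

```python
from dataclasses import dataclass, field


@dataclass
class Tree:
    path: str
    config: dict = field(default_factory=dict)  # hypothetical per-subproject settings
    nested: list = field(default_factory=list)  # nested trees directly below this one


def build_recursion_map(tree, default=True):
    """Scan nested trees once and record whether each should be recursed into.

    A subproject that sets "recurse" to False (e.g. a frozen upstream library)
    is recorded but not descended into, so anything nested below it is skipped.
    """
    plan = {}
    for sub in tree.nested:
        recurse = sub.config.get("recurse", default)
        plan[sub.path] = recurse
        if recurse:
            plan.update(build_recursion_map(sub, default))
    return plan


root = Tree("app", nested=[
    Tree("app/frozen-lib", {"recurse": False}, [Tree("app/frozen-lib/deep")]),
    Tree("app/plugins", nested=[Tree("app/plugins/x")]),
])
plan = build_recursion_map(root)
print(plan)
```

Operations could then consult `plan` instead of re-reading each subproject's config on every traversal; whether the default should be True or False is exactly the open question above, so it is left as a parameter.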