Article: An Idea for a Revision Control System

Tue Mar 14 09:11:52 GMT 2006

I came across an interesting article. Perhaps the developers will find
some ideas in it that could be implemented in bzr?

Jari

    Zed's blog: An Idea for a Revision Control System
    http://www.zedshaw.com/blog/programming/an_rcs_idea.html

    [...]

    I reviewed several others, but I was never really satisfied with
    how they work. I don't have a list of specific problems just yet,
    but I do have an idea of how I would like to do my SCM work.
    Basically, I would like to have a distributed kind of agent based
    system that lets everyone exchange revisions freely. The revisions
    would be in discrete chunks and published or sent to participants.
    A published revision would be "ready for others". Sending a
    revision to someone (P2P style) would be for short collaborations
    between people before publishing. As people work, exchanging
    revisions and publishing them, the SCM (or others) would grab the
    changes and create the releases from them, potentially updating
    the main branch.

    The distributed agent based exchange part would be implemented by
    a publish/subscribe style P2P network. There would be two general
    types of usage: people sending revisions between each other and
    people posting revisions to a central mediator service for others
    to pick up later. In the first case, people would need to be
    online at the same time and coordinate sending and receiving the
    revisions. In the second case, people would publish the revision
    for others to pick up when they're offline (or as a more
    permanent form of publishing). This lets people collaborate in
    whatever way is most natural for creating changes to the source
    (through P2P), and then use the publish mechanism to get "ready
    for others" revisions to everyone else (through publishing to the
    mediator).

    The SCM would then use these two mechanisms to grab revisions from
    people and apply them to the target deployment. An SCM would
    subscribe to the revisions (based on a naming system) they are
    interested in from the mediator. When revisions are published by
    others, the system notifies the SCM that they are available. The
    SCM is then able to grab them and apply them to the source in a
    controlled manner. Other members can also subscribe to changes
    they are interested in so they can coordinate their work. The SCM
    could also use the P2P transport system for verifying revisions
    quickly before the person who made them publishes them for others.
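
The subscribe-and-notify flow described above can be sketched in a few lines of Python. This is only an illustration under assumptions, not the article's design: the `Mediator` class, its method names, the revision names, and the use of shell-style patterns as the "naming system" are all hypothetical, and everything lives in memory instead of on a network.

```python
from fnmatch import fnmatchcase


class Mediator:
    """Hypothetical in-memory mediator: stores published revisions
    and notifies subscribers whose pattern matches the revision name."""

    def __init__(self):
        self.published = {}      # revision name -> payload
        self.subscriptions = []  # (pattern, callback) pairs

    def subscribe(self, pattern, callback):
        # pattern is a shell-style glob, e.g. "parser/*" (an assumption)
        self.subscriptions.append((pattern, callback))

    def publish(self, name, payload):
        self.published[name] = payload
        for pattern, callback in self.subscriptions:
            if fnmatchcase(name, pattern):
                callback(name)  # notify only; the subscriber fetches later

    def fetch(self, name):
        return self.published[name]


mediator = Mediator()
scm_inbox = []  # names the SCM has been notified about
mediator.subscribe("parser/*", scm_inbox.append)

mediator.publish("parser/fix-unicode", b"...revision data...")
mediator.publish("docs/typo", b"...revision data...")
```

After these calls `scm_inbox` holds only the `parser/` revision, which the SCM can then `fetch` and apply when it chooses. The direct P2P path would skip the mediator entirely and is not modelled here.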

    I think that using this publish/subscribe mechanism allows for
    better coordination between all participants because it gives them
    the following three control mechanisms for revisions:

        * Filtering through subscriptions. You just don't subscribe to
          the stuff you don't care about. You don't need to take down
          anything you don't want.

        * Ordering revision applications to the source. In CVS, you
          just have to take everything all at once or go through great
          pains to break out the parts you need. With this system, you
          would select the revisions you want to apply, and can
          specify that one is more important than another. There would
          need to be an "auto-pilot" mode for the times when you
          don't care though.

        * Flexible sources of revisions. You can either get revisions
          from various subscription points, or you can just get them
          directly from another member. Because of this, it may be
          possible to have entirely anonymous distribution networks
          (although I think that would be a bad idea).
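
As a rough sketch of the second point, selecting and ordering revisions with an "auto-pilot" fallback could look like the following. The function name, its parameters, and the idea of expressing importance as a numeric rank are assumptions made for illustration only.

```python
def plan_application(available, selected=None, priority=None):
    """Return the order in which revision names would be applied.

    available: revision names in the order they were received.
    selected:  names to apply; None means auto-pilot (take everything).
    priority:  optional mapping of name -> rank, lower rank applies first.
    """
    if selected is None:
        # Auto-pilot: apply everything in arrival order.
        return list(available)
    priority = priority or {}
    chosen = [name for name in available if name in selected]
    # sorted() is stable, so unranked revisions keep their arrival order.
    return sorted(chosen, key=lambda name: priority.get(name, float("inf")))


received = ["a", "b", "c", "d"]
everything = plan_application(received)
picked = plan_application(received, selected={"b", "c", "d"}, priority={"d": 0})
```

Here `picked` puts `d` first because it was ranked as more important, and leaves `a` out because it was never selected.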

    In order for this to work, the revision packaging system has to be
    capable of handling changes to any type of file, changes to
    directory structure, and logs of activities. This last part is
    something I thought would be nice: Separate the logging of
    developer activity from the revision publishing system. My idea
    here is that nobody really uses commit logs like they were
    intended in CVS, and really what you want is meta-data on the
    particular work completed with the given revision. It would be
    better to have a developer do their work, using a small tool
    to record it as they go in a separate "development log
    file". When the revision is created, this development log file
    is packaged with the revision. This allows the source tree to
    remain free of any unnecessary files, and lets the developer edit
    the log to make it sane before publishing or sending.
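
One way to picture packaging the development log with a revision is a single archive that carries both. The sketch below is an assumption-laden illustration, not the author's program: the function name, the `META/devlog.txt` path, and the choice of a gzip-compressed tar are all hypothetical, and it does not cover deletions or directory-structure changes.

```python
import io
import tarfile


def package_revision(changed_files, devlog_text):
    """Bundle changed files plus the development log into one tar.gz blob.

    changed_files: mapping of relative path -> bytes content.
    devlog_text:   the developer's (possibly edited) log, as a string.
    """
    entries = dict(changed_files)
    # The log travels inside the package, not inside the source tree.
    entries["META/devlog.txt"] = devlog_text.encode("utf-8")

    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for path, data in sorted(entries.items()):
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


blob = package_revision(
    {"src/main.py": b"print('hello')\n"},
    "Fixed the greeting; see notes on unicode handling.\n",
)
```

Because the log is added only at packaging time, the developer can tidy it up beforehand and the working tree never has to contain it.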

    I have a small Python program that creates very primitive
    revisions right now. I got it to work, but it's totally not
    optimal. I may try it out in several scenarios doing revisions by
    hand to see how the work-flow operates. I may also try to get the
    revisions to work over e-mail with some other people to see if the
    publish/subscribe stuff works. I have found that, with this method
    of doing revisions, creating branches is really easy, involving
    nothing more than a directory copy. It doesn't handle conflicts,
    but that might not be such a bad thing since I haven't found a
    tool yet that handles conflicts appropriately. I think it might be
    better for an SCM to sort out whose revisions are better based on
    the merits of each revision and the logs, possibly telling the
    publisher to change them as needed.
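
The "branch is just a directory copy" idea is easy to sketch. The helper below is hypothetical (the article does not show its code) and assumes the branch should live next to the original tree.

```python
import shutil
from pathlib import Path


def create_branch(tree, branch_name):
    """Create a branch by copying the working tree to a sibling directory."""
    source = Path(tree)
    target = source.parent / branch_name
    # copytree raises FileExistsError if a branch of that name already exists.
    shutil.copytree(source, target)
    return target
```

For example, `create_branch("project", "project-experiment")` would produce an independent copy whose revisions can be created and exchanged separately.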

    Right now the ideas are flowing and I'm having fun, but we'll see
    if I don't run into the same problems that everyone else does.
    There are several issues which I haven't confronted yet, but I
    hope to re-use as much as possible and keep the system as simple
    as possible. One major goal would be to keep the system language
    and platform neutral with very minimal requirements for installing
    a client or mediator (or, even allow operating without a
    mediator).