for discussion None vs null: vs current:

Tue Jul 18 14:29:05 BST 2006

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> More grist for the HACKING file.
>
> What do we mean by null: vs None with revision ID's?

Both have been used to refer to the Null Revision.  We defined the
Null Revision as part of our original discussion at the baz code sprint
in London.  The Null Revision is the implicit ultimate ancestor of all
branches, and its tree shape is the Empty Tree.  Currently, it has no
revision XML defined, but there has been some discussion of changing that.

Originally, None was used for the revision ID of the Null Revision, but
this caused various problems, because this is an unconventional use of None.

In standard Python use, None means "no return value" (when returned from
a function/method), "no value supplied" (when supplied as an argument)
or "value unknown/uninitialized" in objects.

These meanings are useful and are supported by the language.  In
particular, it can be useful to specify "no revision id supplied" or "no
revision id found".  The Inventory object currently uses None for its
revision_id value, in keeping with standard Python usage.

There have been problems related to the use of None as the Null
Revision's ID.  Functions that fail to return a value, and thus produce
None, are interpreted as having returned a valid revision ID, and this
is hardly desirable behaviour.  We have also, in some places, had to
twist our code out of shape because we could not use None as a default
argument meaning "not supplied".
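
To make the first problem concrete, here is a minimal sketch.  The
names (find_parent_old, describe_old) are hypothetical and are not
bzrlib functions; the point is only that a lookup which falls off the
end of a function cannot be told apart from one that found the Null
Revision.

```python
# Hypothetical illustration; none of these names are bzrlib APIs.

def find_parent_old(parents, revision_id):
    """Look up a parent, with None doubling as the Null Revision's ID."""
    if revision_id in parents:
        return parents[revision_id]
    # Bug: no explicit return for the "not found" case, so Python
    # returns None, which callers read as "the Null Revision".


def describe_old(revision_id):
    """A caller that treats None as the Null Revision."""
    if revision_id is None:
        return "null revision (empty tree)"
    return "revision %s" % revision_id


# rev-1 is a root: its parent is the Null Revision, spelled None here.
parents = {"rev-2": "rev-1", "rev-1": None}

# A genuine answer and a failed lookup look exactly the same:
print(describe_old(find_parent_old(parents, "rev-1")))
print(describe_old(find_parent_old(parents, "no-such-rev")))
```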

NULL_REVISION was introduced as a special value intended to supplant the
use of None as the revision id of the Null Revision.  When the
conversion is complete, it will be safe to use None with its
conventional meanings of "no argument supplied", "no value returned",
and "value unknown/uninitialized".  I consider this a big win.
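
As a rough sketch of what that buys us, assuming NULL_REVISION is the
string 'null:' and using made-up helper names (find_parent and ancestry
are illustrations, not bzrlib functions): once the Null Revision has
its own value, None can serve both as "not found" and as an ordinary
default argument.

```python
# Hypothetical sketch; the helper names are invented for illustration.
NULL_REVISION = "null:"

# rev-1 is a root: its parent is the Null Revision.
parents = {"rev-2": "rev-1", "rev-1": NULL_REVISION}


def find_parent(parents, revision_id):
    """Return the parent ID, or None if the revision is unknown."""
    return parents.get(revision_id)  # None now only means "not found"


def ancestry(parents, tip, stop_at=None):
    """List tip and its ancestors, newest first.

    stop_at=None means "no argument supplied", in which case we walk
    all the way back to the Null Revision.
    """
    if stop_at is None:
        stop_at = NULL_REVISION
    result = []
    current = tip
    while current != stop_at:
        result.append(current)
        current = parents[current]
    return result


print(find_parent(parents, "rev-1"))        # the Null Revision
print(find_parent(parents, "no-such-rev"))  # None: not found
print(ancestry(parents, "rev-2"))
print(ancestry(parents, "rev-2", stop_at="rev-1"))
```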

The places where None is still used for the Null Revision are simply
places that haven't been converted yet.

> NULL in database terms means [roughly] unknown.
>
> That is NULL != NULL, and any operation with NULL, other than 'is NULL'
> results in NULL. Selecting records with any formula will not return
> records with NULL there - its not less than or equal or greater than any
> other value.

This sounds a bit like NaN.  Our NULL_REVISION is more akin to the NULL
pointer in C.
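
A small illustration of the difference, assuming NULL_REVISION is the
plain string 'null:'.  NaN behaves like the database NULL you describe;
an ordinary sentinel value compares equal to itself, as a NULL pointer
does.

```python
import math

nan = float("nan")

# NaN is never equal to, less than, or greater than anything,
# including itself.
print(nan == nan)
print(nan < 1.0, nan > 1.0)
# Only a dedicated test recognises it, much like SQL's "IS NULL".
print(math.isnan(nan))

# A plain sentinel value (assumed here to be the string 'null:')
# compares equal to itself, like a NULL pointer in C.
NULL_REVISION = "null:"
print(NULL_REVISION == NULL_REVISION)
```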

> We have in bzrlib EmptyTree, and repository.revision_tree() returns
> EmptyTree() for revision_id None, but not for revision_id NULL_REVISION.

What makes you say that?

    @needs_read_lock
    def revision_tree(self, revision_id):
        """Return Tree for a revision on this branch.

        `revision_id` may be None for the null revision, in which case
        an `EmptyTree` is returned."""
        # TODO: refactor this to use an existing revision object
        # so we don't need to read it in twice.
        if revision_id is None or revision_id == NULL_REVISION:
            return EmptyTree()
        else:

> So, I'd like to suggest that we 'start fresh' on this, and do an audit
> in the 0.9->1.0 timeframe to clean this up. It will make it easier to
> make it all consistent I think.

I don't think there's anything inconsistent here.  Just a failure to
update some of our code.

> I propose the following definitions:
> 'empty:' which will be equivalent to None in python, and refers to the
> EmptyTree.

I am fine with changing the value of NULL_REVISION, but I do not want to
make it equivalent to None.  That would cement a bad usage of None that
we have been working our way away from.

> 'null:' which will mean an unknown revision.

I do not think it is a good idea to reuse the string 'null:' here, at
least not until a decent interval has passed.  Also, Martin has proposed
that we might use a special character in special values like these, and
colon is not very special: we use it in conventional revision ids from
baz-import and other importers.

So, perhaps '/null/' or '?unknown?' or '\xa0unavailable\xa0' would be a
better choice.

> This is the placeholder
> revision put into get_ancestry and other apis to represent data we
> cannot access

This seems similar to the conventional usage of None.  If None is to be
equivalent to anything (and I'm not sure it should be), it should be
equivalent to your new 'null:' (call it NEWNULL).

> - i.e. due to ghosting. 'null:' is defined as never being
> equal to anything. So we'll have a magic object that looks like a string
> but overloads == to return False. We may want to tweak it further to
> make dict lookups fail for it... This will help generate the correct
> behaviour for graph traversals and the like. null: will be serialised as
> 'null:' if we have reason to show it - i.e. an external show_ancestry
> command. We will never store it in versioned files or bzr data files.

The rest of this seems fine, except that there is probably overlap
between NEWNULL and None.
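
For the sake of discussion, the "magic object" might look something
like the sketch below.  The class name NeverEqualRevision is
hypothetical, and this is only one possible reading of the proposal,
not existing bzrlib code.

```python
class NeverEqualRevision(str):
    """Hypothetical string-like marker that is never equal to anything."""

    def __eq__(self, other):
        return False

    def __ne__(self, other):
        return True

    # Defining __eq__ would otherwise make the class unhashable in
    # Python 3, so keep the ordinary string hash.
    __hash__ = str.__hash__


NEWNULL = NeverEqualRevision("null:")

print(NEWNULL == NEWNULL)   # not even equal to itself
print(NEWNULL == "null:")   # nor to the plain string
print("null:" == NEWNULL)   # the subclass comparison wins here too
print(str(NEWNULL))         # still serialises as a plain string

# A lookup with the marker misses even when a plain 'null:' key exists.
print({"null:": 1}.get(NEWNULL))
```

One caveat with this sketch: Python's containers check identity before
equality, so `NEWNULL in [NEWNULL]` is still true, and a dict keyed by
the very same object will still find it.  That is presumably where the
further tweaking for dict lookups would come in.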

> 'current:' will mean the revision_id of a workingtree. current:, like
> null: is a magic revision. For this revision, 'current:' != 'current:' -
> this is to prevent tree compare shortcircuits and the like, without
> needing special casing. We want a different value to null: to allow
> 'tree.revision_id is CURRENT_REVISION' style clauses to work.

This sounds good and useful to me, modulo discussion of special characters.
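
A minimal sketch of how I read this, with hypothetical names throughout
(MagicRevision, FakeTree and trees_differ are not bzrlib code): because
the marker never compares equal, an equality-based shortcut cannot fire
for working trees, while an identity test still works.

```python
class MagicRevision(str):
    """Hypothetical marker revision that never compares equal."""

    def __eq__(self, other):
        return False

    def __ne__(self, other):
        return True

    __hash__ = str.__hash__


CURRENT_REVISION = MagicRevision("current:")


class FakeTree(object):
    """Stand-in for a tree: a revision ID plus a dict of file contents."""

    def __init__(self, revision_id, files):
        self.revision_id = revision_id
        self.files = files


def trees_differ(a, b):
    # Shortcut: trees at the same committed revision are identical.
    if a.revision_id == b.revision_id:
        return False
    return a.files != b.files


committed_a = FakeTree("rev-1", {"README": "v1"})
committed_b = FakeTree("rev-1", {"README": "v1"})
working_a = FakeTree(CURRENT_REVISION, {"README": "v1"})
working_b = FakeTree(CURRENT_REVISION, {"README": "edited"})

print(trees_differ(committed_a, committed_b))  # shortcut applies
print(trees_differ(working_a, working_b))      # contents are compared
print(working_a.revision_id is CURRENT_REVISION)
```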

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFEvOIh0F+nu1YWqI0RAn+fAJ4iVAuX9IjNQ7gW9ZsfHtMZ93A9wQCfaAAh
rpyNhEpFQEZw3VDbfMQ/JOw=
=frqe
-----END PGP SIGNATURE-----