transactions - end of transaction actions
Robert Collins
robertc at robertcollins.net
Sun Jan 22 11:10:01 GMT 2006
Right,
I've figured out what my reservations were about actions at the end of
transactions. It's basically that we need a certain degree of finesse,
and I'm worried that in this case an incremental approach to
implementation will cause difficulties. Which is why I stopped at
passthrough transactions in my first cut, to not prejudice whatever
came later. So, in order to stop blocking people from putting something
sane in, let me get the ball rolling on what that sane thing should do.
First, some background:
We have, in principle, three transaction (session/whatever we rename it
to) types:
ReadOnly(Caching)
ReadWrite(Caching)
ReadWrite(aka PassthroughTransaction)
Now, we have some features we wish to achieve:
1 multiple read transactions can happen concurrently
2 failed transactions at worst lead to a modicum of wasted space - no
repository corruption under any failure mode.
3 a single readwrite transaction can happen concurrently with read
transactions
Achieving 1 is relatively easy - read transactions don't modify
anything.
Achieving 2 is also fairly easy - as long as we order our changes to the
disk representation of domain objects such that there are never dangling
pointers during a transaction, AND ensure that writes to any single disk
representation are done atomically, then any failure will at most
introduce leaked objects, which a garbage collection run can identify
and correct in the future.
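To make that ordering concrete, here is a minimal sketch - the store
objects and their add() calls are hypothetical, not the current bzrlib
api - of writing a revision bottom-up so that a failure can only leak
unreferenced data:

def commit_ordered(text_store, inventory_store, revision_store,
                   file_texts, inventory, revision):
    """Write a revision's data in dependency order.

    A failure after the texts but before the inventory, or after the
    inventory but before the revision, leaves only unreferenced objects
    behind - wasted space a garbage collection run can reclaim, never a
    dangling pointer.
    """
    for file_id, text in file_texts:
        text_store.add(file_id, text)
    inventory_store.add(inventory.revision_id, inventory)
    # the revision is made visible last: once readers can see it,
    # everything it references is already on disk
    revision_store.add(revision.revision_id, revision)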
Achieving 3 builds on the solution for 2, by having our readers
coordinate with this. E.g. for the weave format we must either read the
entire weave from a single file descriptor, or re-read the entire thing
if we scanned just a header previously. (Note that this has implications
for transports and partial reads - if the fd is not held open on the
remote end (which it isn't for any stateless transport such as http), we
cannot use partial reads *except* when the file format is designed such
that client updates won't invalidate ranges of data.)
In short, to get 1, 2 and 3:
At the domain level, we must not have a revision object visible to any
readers until the inventory that it references is available, and
likewise for the inventory and file texts.
And at the file level we must have atomic visibility and fine-grained
control over write ordering.
Techniques like revision-blobs, which contain all the data for a
revision, help with both of these at once - but they are neither
necessary nor sufficient. For instance, for weaves, making something
available atomically can be done via a write + rename; for knits, the
file format is designed so that partially written records are ignored.
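The write + rename trick for weaves might look roughly like this (a
local-filesystem sketch; real code would go through a Transport, but
the atomicity argument is the same):

import os

def replace_atomically(path, content):
    """Replace path with content in a single visible step.

    Readers see either the old complete file or the new complete file;
    a crash part-way through leaves only an orphaned .new file behind.
    """
    tmp_path = path + '.new'
    f = open(tmp_path, 'wb')
    try:
        f.write(content)
        f.flush()
        os.fsync(f.fileno())   # make sure the data is on disk before renaming
    finally:
        f.close()
    os.rename(tmp_path, path)  # atomic on POSIX filesystems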
Now, the types of transactions. A readonly one is probably easiest to
understand: we cache some accessed data in memory, and any of that
cached data is discardable and re-retrievable from disk on demand. We
trust that any writers will not remove data from the system, only append
to it (though the exact representation may change). Note that writers
which change the permitted types of representation are likely to result
in errors from concurrent readers, unless we provide some means to probe
after an error to see if something that is known to cause such weirdness
is underway. Upgrade is an example of this. Even semi-atomically
upgrading the repository via an upgrade to (say) .bzr.new, and then
renames, will still confuse readers.
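As a rough sketch of that caching behaviour (class and method names are
illustrative only, not the existing transaction api):

class CachingReadOnlyTransaction(object):

    def __init__(self, store):
        self._store = store
        self._cache = {}          # id -> domain object, always discardable

    def find(self, object_id):
        # anything dropped from the cache can be re-read on demand,
        # because writers only ever append - they never remove data
        if object_id not in self._cache:
            self._cache[object_id] = self._store.get(object_id)
        return self._cache[object_id]

    def shrink(self):
        self._cache.clear()       # memory pressure: drop the cache, lose nothing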
Passthrough transactions are a little more complex. These transactions
are currently created implicitly whenever a write-requiring action is
taken. The additional complexity is that a write lock is taken out, but
only for short periods: it's quite possible for new data to appear in
the repository between two calls to the bzrlib api on the same object.
Accordingly, no caching is performed by passthrough transactions: they
behave like raw access to the data, except that writes are locked
appropriately. The ordering of serialisation to disk is precisely that
in which calls are made.
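Roughly (again with illustrative names, not the current api):

class PassthroughTransaction(object):
    """No caching: serialisation requests are acted on immediately."""

    def find(self, object_id):
        return None               # nothing is ever cached, callers re-read disk

    def write(self, an_object, writer):
        # act on the request straight away, so on-disk order is call order
        writer(an_object)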
Lastly we have caching write transactions. These are transactions which
can be created to group together larger actions done with the library.
Like readonly transactions, referenced data is cached, but may be
discarded under memory pressure. Content is written to disk in the same
order that serialisation requests are made, and unwritten content is
never discarded due to memory pressure without first being written [in
accordance with the above ordering rule]. One decision we need to make
is whether writes can be globbed together - i.e. if I ask for something
to be written that affects files X, then Y, then X again, can the system
write both sets of data to X, and then the one to Y? I think that this
violates the domain ordering constraints in the background section
above, and we should say that the transaction must not allow that to
happen.
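A sketch of the queue-and-flush behaviour with that no-globbing rule
(names illustrative):

class CachingWriteTransaction(object):
    """Queue serialisation requests and flush them in request order."""

    def __init__(self):
        self._pending = []        # (an_object, writer) pairs, oldest first

    def write(self, an_object, writer):
        self._pending.append((an_object, writer))

    def flush(self):
        # strictly FIFO: requests for X, Y, X are written as X, Y, X -
        # never globbed into X, X, Y
        while self._pending:
            an_object, writer = self._pending.pop(0)
            writer(an_object)

    def finish(self):
        # a failed transaction would drop self._pending instead, leaking
        # at worst a little disk space
        self.flush()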
Note that there is no discussion of 'rollback' in these cases - just
'failed transactions'. This is because removing data that has been made
visible will break the requirements for goal 3 above: I think that once
we design a means to alert clients to violations of that invariant, we
can build that into this as a reasonable extension.
One final point is that any api call in bzrlib should function correctly
regardless of the transaction type currently open - except of course
that write apis will fail or have to open a write transaction when
called during a readonly transaction.
So, what services does the transaction api need to offer to meet these
requirements?
Well, at the core it needs a means to state that a given domain object
active in the transaction should be written. These domain objects could
be either things like revision objects, or larger representations like
entire weaves. Passthrough transactions will immediately act on
serialisation requests; caching write transactions would store them in
some sort of queue, which they can flush at the end of the transaction,
or during, depending on whatever performance tradeoffs we want to make
(I can imagine an async-capable library flushing the queue in the
background as the transaction progresses. But that's for the future.)
This change - telling the transaction to write, rather than having the
object write itself - requires some api changes, regardless of whether
we still call a write on the object, which then hooks into the
transaction layer, or whether we teach the transaction layer that
objects are writable and start calling the transaction to write
everything.
Having thought this through in detail, I realise that register_dirty in
our transaction api is not quite a request for a write of a domain
object. So: if we add 'request_write(object, writer)' or even just
'write(object, writer)' then the existing api should be up to the
task. I.e.:
def write(self, an_object, writer):
    """Write an_object out using writer.

    If caching is present, the write request may be delayed an
    arbitrary time. If it is delayed and a rollback occurs, some
    or all pending writes may be aborted without notice.

    All writes are serialised and occur in the order given.

    Writes are performed by calling writer(an_object).
    """
So - please note that this is not an 'after the transaction do something'
api: I maintain that one of those is really not needed and not
appropriate at this point, but this should deliver all the desired
functionality for things like versionedfile, caching write transactions
and so on.
Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.