Schema migration process

Fri Jun 6 01:18:59 UTC 2014

After some fruitful discussions, Tim and I have come up with something that
I think is starting to look pretty good. There's a significant change to
how we handle backups and rollbacks that seems like the right direction.
I've tried to capture it all in a Google Doc as this email thread is
starting to get impractical. Feel free to add comments and edit.

https://docs.google.com/a/canonical.com/document/d/1pBxGEGTmGa1Y61YJ3KZ7vwOP-7Gumt4Czr_spINHHXM/edit?usp=sharing

On 3 June 2014 13:34, Menno Smits <menno.smits at canonical.com> wrote:

> On 30 May 2014 01:47, John Meinel <john at arbash-meinel.com> wrote:
>
>>
>>
>>> Building on John's thoughts, and adding Tim's and mine, here's what I've
>>> got so far:
>>>
>>> - Introduce a "database-version" key into the EnvironConfig document
>>> which tracks the Juju version that the database schema matches. More on
>>> this later.
>>>
>>
>> For clarity, I would probably avoid putting this key into EnvironConfig,
>> but instead have it in a separate document. That also makes it easy to
>> watch for just this value changing.
>>
>
> SGTM. I've got no strong opinion on this.
>
>
>>
>> Potentially, I would decouple the value in this key from the actual agent
>> versions. Otherwise you do null DB schema upgrades on every minor release.
>> Maybe that's sane, but it *feels* like they are two separate issues. (The
>> version of the DB schema is orthogonal to the version of the code I'm
>> running.) It may be that the clarity and simplification of just one
>> version wins out.
>>
>
> I think it makes sense to just use the Juju version for the DB schema
> version. When you think about it, the DB schema is actually quite tightly
> coupled to the code version so why introduce another set of numbers to
> track? I'm thinking that if there are no schema upgrade steps required for
> a given software version then the DB is left alone except that the schema
> version number gets bumped.
>
>
>>> - Introduce a MasterStateServer upgrade target which marks upgrade steps
>>> which are only to run on the master state server. Also more below.
>>>
>>
>> This is just a compiled-in list of steps to apply, right?
>>
>
> Yes. I was thinking that schema upgrade steps would be defined in the same
> place and way that other upgrade steps are currently defined so that they
> could even be interleaved with other kinds of upgrade steps.
>
> What I'm proposing here is that where we currently have 2 types of upgrade
> targets - AllMachines and StateServer - we introduce a third target called
> MasterStateServer which would be primarily (exclusively?) used for schema
> migration steps.
>
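To make the three-target idea concrete, here is a rough Go sketch. The Target names follow the ones used above, but the Step and Agent types and the StepsFor helper are assumptions for illustration and do not mirror the real upgrades package. Order is preserved so that schema steps can stay interleaved with other steps.

```go
package main

// Target identifies which machine agents should run an upgrade step.
type Target string

const (
	AllMachines       Target = "allMachines"
	StateServer       Target = "stateServer"
	MasterStateServer Target = "masterStateServer" // the proposed third target
)

// Step is a hypothetical upgrade step tagged with the targets it applies to.
type Step struct {
	Description string
	Targets     []Target
}

// Agent is a hypothetical description of the machine agent running upgrades.
type Agent struct {
	IsStateServer bool
	IsMaster      bool
}

// matches reports whether a step with target t should run on this agent.
func (a Agent) matches(t Target) bool {
	switch t {
	case AllMachines:
		return true
	case StateServer:
		return a.IsStateServer
	case MasterStateServer:
		return a.IsStateServer && a.IsMaster
	}
	return false
}

// StepsFor returns the steps that apply to agent, in their original order.
func StepsFor(agent Agent, steps []Step) []Step {
	var out []Step
	for _, s := range steps {
		for _, t := range s.Targets {
			if agent.matches(t) {
				out = append(out, s)
				break
			}
		}
	}
	return out
}
```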
>
>>> - Non-master JobManageEnviron machine agents run their upgrade steps as
>>> usual and then watch for EnvironConfig changes. They don't consider the
>>> upgrade to be complete (and therefore let their other workers start) until
>>> database-version matches agent-version. This prevents the new version of
>>> the state server agents from running before the schema migrations for the
>>> new software version have run.
>>>
>>
>> I'm not sure if schema should be done before or after other upgrade
>> steps. Given we're really stopping the world here, it might be prudent to
>> just wait to do your upgrade steps until you know that the DB upgrade has
>> been done.
>>
>
> As mentioned above, with what I'm thinking there is no real distinction
> between schema migration steps and other types of upgrade steps so there's
> no concept of schema migrations happening before or after other upgrade
> steps.
>
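The "wait until database-version matches agent-version" behaviour for non-master state servers could look roughly like this in Go. The channel of version strings stands in for whatever watcher ends up delivering the value; WaitForSchema is a hypothetical name.

```go
package main

import "errors"

// WaitForSchema blocks until a version received on changes equals
// agentVersion. It returns an error if the (hypothetical) watcher channel
// is closed before that happens, so the caller never starts its other
// workers against an old schema.
func WaitForSchema(agentVersion string, changes <-chan string) error {
	for v := range changes {
		if v == agentVersion {
			return nil
		}
	}
	return errors.New("watcher closed before schema reached " + agentVersion)
}
```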
>>> *Observations/Questions/Issues*
>>
>>>
>>> - There are a lot of moving parts here. What could be made simpler?
>>>
>>> - What do we do if the master mongo database or host fails during the
>>> upgrade? Is it a goal for one of the other state servers to take over and run
>>> the schema upgrades itself and let the upgrade finish? If so, is this a
>>> must-have up-front requirement or a nice-to-have?
>>>
>>
>> Some thoughts:
>>
>
>
>> 1. If the actual master mongo DB fails, that will cause reelection, which
>> should cause all of the servers to get their connections to Mongo bounced,
>> and then they'll notice that there is a new master who is responsible for
>> applying the database changes.
>>
>
> We will have to do some testing to ensure that this scenario actually
> works. Maybe I'm overthinking it, but my gut says there's plenty to go
> wrong here.
>
>> 2. If it is just the master Juju process that fails, I don't think there
>> is any great expectation that a different process running the same code is
>> going to succeed, is there?
>>
>
> Agreed.
>
>
>> 3. There is also a fair possibility that the schema migration we've
>> written won't work with real data in the wild. (we assumed this field was
>> never written, but suddenly it is, etc). We've talked about the ability to
>> have Upgrade roll back, and maybe we could consider that here. Some
>> possible steps are:
>>
>>
>>    1. Copy the db to another location
>>    2. Try to apply the schema updates (either in place or only to the
>>    backup)
>>    3. If upgrade fails, roll back to the old version, and update the
>>    AgentVersion in environ config so that the other agents will try to
>>    "upgrade" themselves back to the old version. This would also be a reason
>>    to do the DB schema before actually applying any other upgrade steps. We
>>    probably want some sort of "could not upgrade because of" tracking here, so
>>    that it can be reported to the user
>>
>>
> I like this and it should work as long as there's enough storage available
> to make a copy of the database. I'm not exactly clear on how we would
> revert to the backup instance if the migration fails but I'm sure this can
> be made to work. It might be enough for the first iteration if we initially
> make some kind of backup that the user has access to that they can restore
> from manually.
>
> As you mention, this would benefit from the DB schema steps being separate
> from the other upgrade steps. I have no real issue with this other than
> having them separate will probably mean more change to the existing
> upgrades package. This voids some of the things I've said earlier in this
> email :-)  I'll think some more about how this could look.
>
>> 4. As long as we do some sort of "backup before applying the change" we
>> allow users a way to recover the system if something failed. If we have
>> proper Backup support integrated into core, one option is that we just
>> trigger a backup and then upgrade in place, if stuff breaks, we at least
>> have *something* that should be recoverable.
>>
>
> It's a pity that the full Backup feature isn't there yet as this could be
> a nice way to get a first version of schema migrations working quickly.
>
>>
>>
>>
>>> - Upgrade steps currently have access to State but I think this probably
>>> won't be sufficient to perform many types of schema migrations (i.e.
>>> accessing defunct fields, removing fields, adding indexes etc). Do we want
>>> to extend State to provide a number of schema migration helpers or do we
>>> expose mongo connections directly to the upgrade steps?
>>>
>>
>> I believe the existing Upgrade logic actually has access to the API not
>> to State itself, so we'll need something there. The State object has raw
>> mongo collections on it (environs, charms, etc).
>>
>
> The existing upgrade logic has access to both the API and State (the
> latter only on state server machines, obviously; that arg is nil
> otherwise) so that's already done.
>
>
>> DB Schema (IMO) inherently is going to be at the raw DB level, vs changes
>> in the abstract objects. (I expect that it will be defined in terms of
>> Apply this function to all entities in this collection, rather than iterate
>> over Machine objects and set data on them.)
>> I could be wrong, but it does seem like we'll want the syntax of db
>> schema changes to be on mgo.Collection objects, and not on State objects.
>>
>
> I completely agree that we need schema migrations to work in the mongodb
> world and not via application level objects. Some schema migration tasks
> just won't make sense at the application object level.
>
> State doesn't expose its mgo collections to the outside though so how
> would a schema migration step interact with them, especially for tasks such
> as adding new collections or indexes? Do we add a bunch of schema migration
> helper methods on to State (e.g. AddCollection(), AddIndex(),
> ApplyToCollection() etc) or do we add a single method which exposes the
> mongo database object (clearly marked as exclusively there for use by
> schema upgrade steps), or do we have schema migration steps pass a function
> that takes a mongo DB object to act on? We already expose the mongo session
> with MongoSession() so there is some precedent for this.
>
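Of the three options, the "pass a function that takes a mongo DB object" one might look like the Go sketch below. The DB interface is a deliberately tiny, hypothetical view of the database (in practice it would presumably be backed by the mgo database object), RunSchemaStep is a made-up method name, and memDB is an in-memory stand-in that exists only so the sketch runs on its own.

```go
package main

import "fmt"

// DB is a hypothetical minimal view of the database for schema steps.
type DB interface {
	CreateCollection(name string) error
	EnsureIndex(collection string, keys ...string) error
}

// SchemaStep is a schema migration expressed at the raw database level.
type SchemaStep func(db DB) error

// State keeps its database private, as it does today.
type State struct {
	db DB
}

// RunSchemaStep is the single entry point through which schema upgrade
// steps reach the database. It is intended for upgrade steps only.
func (st *State) RunSchemaStep(step SchemaStep) error {
	return step(st.db)
}

// memDB is an in-memory stand-in for a real database (illustration only).
type memDB struct {
	collections map[string][]string // collection name -> index keys
}

func newMemDB() *memDB {
	return &memDB{collections: make(map[string][]string)}
}

func (m *memDB) CreateCollection(name string) error {
	if _, ok := m.collections[name]; ok {
		return fmt.Errorf("collection %q already exists", name)
	}
	m.collections[name] = nil
	return nil
}

func (m *memDB) EnsureIndex(collection string, keys ...string) error {
	if _, ok := m.collections[collection]; !ok {
		return fmt.Errorf("collection %q does not exist", collection)
	}
	m.collections[collection] = append(m.collections[collection], keys...)
	return nil
}
```

With this shape State never hands out the database object itself; a step only sees it for the duration of the call.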
>
>>
>>> - There is a possibility that a non-master state server won't upgrade,
>>> blocking the master from completing the upgrade. Should there be a timeout
>>> before the master gives up on state servers upgrading themselves and
>>> performs its own upgrade steps anyway?
>>>
>>
>> Arguably this is a better case for "rollback" than "just move forward".
>>
>
> Ok - sounds good.
>
>
>>
>>
>>>
>>> - Given the order of documents a juju system stores, it's likely that
>>> the schema migration steps will be quite quick, even for a large
>>> installation.
>>>
>>>
>> "order of magnitude" right?
>>
>
> Yes - sorry that wasn't very clear.
>
>
>> Yeah, we're talking megabytes, GB being really large, not many GB of data.
>>
>
> Great.
>
> Thanks for the excellent feedback.
>
> - Menno
>
>