Schema migration process
Menno Smits
menno.smits at canonical.com
Thu Jun 12 21:31:29 UTC 2014
You go to sleep for one night and they change everything ... :)
I've just caught up with the new backup system proposal. This changes the
schema migrations design a bit - I'll update the doc.
On 12 June 2014 21:33, John Meinel <john at arbash-meinel.com> wrote:
> If I read the conversations on IRC, they were talking about changing the
> backup to be just a POST to an HTTP endpoint, and you get back the contents
> of the DB, which would be deleted when the backup completes. Though you
> could still probably use whatever internal helpers spool the data to a temp
> location to do the same for backup & restore.
>
>
> On Thu, Jun 12, 2014 at 8:40 AM, Menno Smits <menno.smits at canonical.com>
> wrote:
>
>> I've updated the schema migration document with the ideas that have come
>> up in recent discussions. The scope of the schema migrations work has been
>> reduced somewhat by making the upgrade step Apply/Rollback concept a
>> separate project (database changes can be rolled back through the use of
>> mongobackup/restore).
>>
>> I've raised a few issues in the comments about handling various failure
>> modes. Input would be greatly appreciated.
>>
>> Nate: it would be good for you to have a look at this because we're
>> planning on leaning on the new backup functionality quite a bit. Let me
>> know if anything I'm proposing isn't compatible with what your team is
>> working on.
>>
>>
>> https://docs.google.com/document/d/1pBxGEGTmGa1Y61YJ3KZ7vwOP-7Gumt4Czr_spINHHXM/edit?usp=sharing
>>
>> On 6 June 2014 13:18, Menno Smits <menno.smits at canonical.com> wrote:
>>
>>> After some fruitful discussions, Tim and I have come up with something
>>> that I think is starting to look pretty good. There's a significant change
>>> to how we handle backups and rollbacks that seems like the right direction.
>>> I've tried to capture it all in a Google Doc as this email thread is
>>> starting to get impractical. Feel free to add comments and edit.
>>>
>>>
>>> https://docs.google.com/a/canonical.com/document/d/1pBxGEGTmGa1Y61YJ3KZ7vwOP-7Gumt4Czr_spINHHXM/edit?usp=sharing
>>>
>>>
>>> On 3 June 2014 13:34, Menno Smits <menno.smits at canonical.com> wrote:
>>>
>>>> On 30 May 2014 01:47, John Meinel <john at arbash-meinel.com> wrote:
>>>>
>>>>>
>>>>>
>>>>>> Building on John's thoughts, and adding Tim's and mine, here's what
>>>>>> I've got so far:
>>>>>>
>>>>>> - Introduce a "database-version" key into the EnvironConfig document
>>>>>> which tracks the Juju version that the database schema matches. More on
>>>>>> this later.
>>>>>>
>>>>>
>>>>> For clarity, I would probably avoid putting this key into
>>>>> EnvironConfig, but instead have it in a separate document. That also makes
>>>>> it easy to watch for just this value changing.
>>>>>
>>>>
>>>> SGTM. I've got no strong opinion on this.
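[A minimal sketch of what a standalone version document and the "does the schema need migrating?" check could look like. The struct fields and function names here are hypothetical, not the actual implementation:]

```go
package main

import "fmt"

// dbVersionDoc sketches the proposed standalone document that records
// which Juju version the database schema currently matches. The field
// names are illustrative only.
type dbVersionDoc struct {
	DocID   string `bson:"_id"`     // e.g. "database-version"
	Version string `bson:"version"` // Juju version the schema matches
}

// needsMigration reports whether schema upgrade steps should run:
// true whenever the recorded schema version differs from the version
// of the software that is being upgraded to.
func needsMigration(dbVersion, agentVersion string) bool {
	return dbVersion != agentVersion
}

func main() {
	doc := dbVersionDoc{DocID: "database-version", Version: "1.19.3"}
	fmt.Println(needsMigration(doc.Version, "1.20.0")) // true
	fmt.Println(needsMigration(doc.Version, "1.19.3")) // false
}
```

[Keeping this in its own document, as suggested above, also means a watcher can fire on just this value changing rather than on every EnvironConfig update.]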
>>>>
>>>>
>>>>>
>>>>> Potentially, I would decouple the value in this key from the actual
>>>>> agent versions. Otherwise you do null DB schema upgrades on every minor
>>>>> release. Maybe that's sane, but it *feels* like they are two separate
>>>>> issues. (What the DB schema version is, is orthogonal to what version of
>>>>> the code I'm running.) It may be that the clarity and simplification of
>>>>> just one version wins out.
>>>>>
>>>>
>>>> I think it makes sense to just use the Juju version for the DB schema
>>>> version. When you think about it, the DB schema is actually quite tightly
>>>> coupled to the code version so why introduce another set of numbers to
>>>> track? I'm thinking that if there are no schema upgrade steps required
>>>> for a given software version then the DB is left alone except that the
>>>> schema version number gets bumped.
>>>>
>>>>
>>>>> - Introduce a MasterStateServer upgrade target which marks upgrade
>>>>>> steps which are only to run on the master state server. Also more below.
>>>>>>
>>>>>
>>>>> This is just a compiled-in list of steps to apply, right?
>>>>>
>>>>
>>>> Yes. I was thinking that schema upgrade steps would be defined in the
>>>> same place and way that other upgrade steps are currently defined so that
>>>> they could even be interleaved with other kinds of upgrade steps.
>>>>
>>>> What I'm proposing here is that where we currently have 2 types of
>>>> upgrade targets - AllMachines and StateServer - we introduce a third target
>>>> called MasterStateServer which would be primarily (exclusively?) used for
>>>> schema migration steps.
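[The targets-and-steps idea above could be sketched roughly as follows. The type and constant names mirror the existing AllMachines/StateServer targets plus the proposed third one; the step struct and filtering helper are hypothetical simplifications:]

```go
package main

import "fmt"

// Target mirrors the existing upgrade target concept, with the new
// MasterStateServer target from the proposal added as a third value.
type Target int

const (
	AllMachines Target = iota
	StateServer
	MasterStateServer
)

// step is a simplified stand-in for an upgrade step definition.
type step struct {
	description string
	targets     []Target
}

// stepsFor returns the steps applicable to the given target, in
// declaration order, so schema migration steps can be interleaved
// with other kinds of upgrade steps.
func stepsFor(all []step, t Target) []step {
	var out []step
	for _, s := range all {
		for _, st := range s.targets {
			if st == t {
				out = append(out, s)
				break
			}
		}
	}
	return out
}

func main() {
	steps := []step{
		{"add index to a collection", []Target{MasterStateServer}},
		{"rewrite agent config", []Target{AllMachines}},
	}
	fmt.Println(len(stepsFor(steps, MasterStateServer))) // 1
}
```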
>>>>
>>>>
>>>>>> - Non-master JobManageEnviron machine agents run their upgrade steps
>>>>>> as usual and then watch for EnvironConfig changes. They don't consider the
>>>>>> upgrade to be complete (and therefore let their other workers start) until
>>>>>> database-version matches agent-version. This prevents the new version of
>>>>>> the state server agents from running before the schema migrations for the
>>>>>> new software version have run.
>>>>>>
>>>>>
>>>>> I'm not sure if schema should be done before or after other upgrade
>>>>> steps. Given we're really stopping the world here, it might be prudent to
>>>>> just wait to do your upgrade steps until you know that the DB upgrade has
>>>>> been done.
>>>>>
>>>>
>>>> As mentioned above, with what I'm thinking there is no real distinction
>>>> between schema migration steps and other types of upgrade steps so there's
>>>> no concept of schema migrations happening before or after other upgrade
>>>> steps.
>>>>
>>>> *Observations/Questions/Issues*
>>>>>
>>>>>>
>>>>>> - There are a lot of moving parts here. What could be made simpler?
>>>>>>
>>>>>> - What do we do if the master mongo database or host fails during the
>>>>>> upgrade? Is it a goal for one of the other state servers to take over,
>>>>>> run the schema upgrades itself and let the upgrade finish? If so, is
>>>>>> this a must-have up-front requirement or a nice-to-have?
>>>>>>
>>>>>
>>>>> Some thoughts:
>>>>>
>>>>
>>>>
>>>>> 1. If the actual master mongo DB fails, that will cause reelection,
>>>>> which should cause all of the servers to get their connections to Mongo
>>>>> bounced, and then they'll notice that there is a new master who is
>>>>> responsible for applying the database changes.
>>>>>
>>>>
>>>> We will have to do some testing to ensure that this scenario actually
>>>> works. Maybe I'm overthinking it, but my gut says there's plenty to go
>>>> wrong here.
>>>>
>>>> 2. If it is just the master Juju process that fails, I don't think
>>>>> there is any great expectation that a different process running the same
>>>>> code is going to succeed, is there?
>>>>>
>>>>
>>>> Agreed.
>>>>
>>>>
>>>>> 3. There is also a fair possibility that the schema migration we've
>>>>> written won't work with real data in the wild. (we assumed this field was
>>>>> never written, but suddenly it is, etc). We've talked about the ability to
>>>>> have Upgrade roll back, and maybe we could consider that here. Some
>>>>> possible steps are:
>>>>>
>>>>>
>>>>> 1. Copy the db to another location
>>>>> 2. Try to apply the schema updates (either in place or only to the
>>>>> backup)
>>>>> 3. If upgrade fails, roll back to the old version, and update the
>>>>> AgentVersion in environ config so that the other agents will try to
>>>>> "upgrade" themselves back to the old version. This would also be a reason
>>>>> to do the DB schema before actually applying any other upgrade steps. We
>>>>> probably want some sort of "could not upgrade because of" tracking here, so
>>>>> that it can be reported to the user
>>>>>
>>>>>
>>>> I like this and it should work as long as there's enough storage
>>>> available to make a copy of the database. I'm not exactly clear on how we
>>>> would revert to the backup instance if the migration fails but I'm sure
>>>> this can be made to work. It might be enough for the first iteration if we
>>>> initially make some kind of backup that the user has access to that they
>>>> can restore from manually.
>>>>
>>>> As you mention, this would benefit from the DB schema steps being
>>>> separate from the other upgrade steps. I have no real issue with this other
>>>> than having them separate will probably mean more change to the existing
>>>> upgrades package. This voids some of the things I've said earlier in this
>>>> email :-) I'll think some more about how this could look.
>>>>
>>>> 4. As long as we do some sort of "backup before applying the change" we
>>>>> allow users a way to recover the system if something failed. If we have
>>>>> proper Backup support integrated into core, one option is that we just
>>>>> trigger a backup and then upgrade in place, if stuff breaks, we at least
>>>>> have *something* that should be recoverable.
>>>>>
>>>>
>>>> It's a pity that the full Backup feature isn't there yet as this could
>>>> be a nice way to get a first version of schema migrations working quickly.
>>>>
>>>>>
>>>>>
>>>>>
>>>>>> - Upgrade steps currently have access to State but I think this
>>>>>> probably won't be sufficient to perform many types of schema migrations
>>>>>> (e.g. accessing defunct fields, removing fields, adding indexes, etc.).
>>>>>> Do we want to extend State to provide a number of schema migration
>>>>>> helpers or do we expose mongo connections directly to the upgrade steps?
>>>>>>
>>>>>
>>>>> I believe the existing Upgrade logic actually has access to the API
>>>>> not to State itself, so we'll need something there. The State object has
>>>>> raw mongo collections on it (environs, charms, etc).
>>>>>
>>>>
>>>> The existing upgrade logic has access to both the API and State (the
>>>> latter only on state machines obviously, that arg is nil otherwise) so
>>>> that's already done.
>>>>
>>>>
>>>>> DB Schema (IMO) inherently is going to be at the raw DB level, vs
>>>>> changes in the abstract objects. (I expect that it will be defined in terms
>>>>> of Apply this function to all entities in this collection, rather than
>>>>> iterate over Machine objects and set data on them.)
>>>>> I could be wrong, but it does seem like we'll want the syntax of db
>>>>> schema changes to be on mgo.Collection objects, and not on State objects.
>>>>>
>>>>
>>>> I completely agree that we need schema migrations to work in the
>>>> mongodb world and not via application level objects. Some schema migration
>>>> tasks just won't make sense at the application object level.
>>>>
>>>> State doesn't expose its mgo collections to the outside though so how
>>>> would a schema migration step interact with them, especially for tasks such
>>>> as adding new collections or indexes? Do we add a bunch of schema migration
>>>> helper methods on to State (e.g. AddCollection(), AddIndex(),
>>>> ApplyToCollection() etc) or do we add a single method which exposes the
>>>> mongo database object (clearly marked as exclusively there for use by
>>>> schema upgrade steps), or do we have schema migration steps pass a function
>>>> that takes a mongo DB object to act on? We already expose the mongo session
>>>> with MongoSession() so there is some precedent for this.
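[The helper-methods option above could be shaped as a narrow, migration-only interface on State, with steps written against it rather than against raw collections. The method names follow the email and are hypothetical; fakeOps is a trivial in-memory stand-in just to show a step exercising the surface:]

```go
package main

import "fmt"

// SchemaOps sketches the proposed migration-only surface State could
// expose, keeping raw database access confined to upgrade code.
type SchemaOps interface {
	AddCollection(name string) error
	AddIndex(collection string, keys []string) error
	ApplyToCollection(collection string, transform func(doc map[string]interface{}) error) error
}

// schemaStep is what an individual schema migration step would then be.
type schemaStep func(ops SchemaOps) error

// fakeOps records calls instead of touching mongo; illustration only.
type fakeOps struct{ log []string }

func (f *fakeOps) AddCollection(name string) error {
	f.log = append(f.log, "addCollection:"+name)
	return nil
}

func (f *fakeOps) AddIndex(c string, keys []string) error {
	f.log = append(f.log, fmt.Sprintf("addIndex:%s:%v", c, keys))
	return nil
}

func (f *fakeOps) ApplyToCollection(c string, _ func(map[string]interface{}) error) error {
	f.log = append(f.log, "apply:"+c)
	return nil
}

func main() {
	var step schemaStep = func(ops SchemaOps) error {
		if err := ops.AddCollection("actions"); err != nil {
			return err
		}
		return ops.AddIndex("actions", []string{"env-uuid"})
	}
	ops := &fakeOps{}
	fmt.Println(step(ops), len(ops.log)) // <nil> 2
}
```

[The alternative in the email, passing a mongo database object directly to a step function, trades this narrower surface for more flexibility; either way the step signature stays a plain function, matching how upgrade steps are defined today.]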
>>>>
>>>>
>>>>>
>>>>>> - There is a possibility that a non-master state server won't
>>>>>> upgrade, blocking the master from completing the upgrade. Should there be a
>>>>>> timeout before the master gives up on state servers upgrading themselves
>>>>>> and performs its own upgrade steps anyway?
>>>>>>
>>>>>
>>>>> Arguably this is a better case for "rollback" than "just move forward".
>>>>>
>>>>
>>>> Ok - sounds good.
>>>>
>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> - Given the order of documents a juju system stores, it's likely that
>>>>>> the schema migration steps will be quite quick, even for a large
>>>>>> installation.
>>>>>>
>>>>>>
>>>>> "order of magnitude" right?
>>>>>
>>>>
>>>> Yes - sorry that wasn't very clear.
>>>>
>>>>
>>>>> Yeah, we're talking megabytes, GB being really large, not many GB of
>>>>> data.
>>>>>
>>>>
>>>> Great.
>>>>
>>>> Thanks for the excellent feedback.
>>>>
>>>> - Menno
>>>>
>>>>
>>>
>>
>
More information about the Juju-dev
mailing list