Zero restart migration #33

ribose-jeffreylau · 2017-09-29T10:28:17Z

Goal

Seamless transition from one encryption scheme to another.
Preferrably not to have to change Model code.

Implies

No generated artifacts (e.g. .rb files) should break Rails.
DB migrations should not break running Rails server.
No need to modify Model code to add / rename / remove attr_encrypted
columns.
No restart of Rails server should be necessary, except for the first time
when set-up code is deployed.
Encryption configuration (encryption schema) changes are stored away from
Model definitions.
Perhaps in a config file, or even another DB table.

Potential tools

Instead of running rake tasks to run transcryptions, rails-data-migrations
seems like a good alternative. It stores its migration history in the table
data_migrations.

DB

Currently,

DB schema: db/schema.rb
DB migrations: db/migration/xxx_.rb
model encryption configs: app/models/xxx.rb

Column

Some ideas on encryption schema:

one config item per encrypted column

db/encryption_schema.yaml

---yaml
- class_name: User
  - column: ssn
    - config: { algo, salt, etc. }

migration:

transcrypt old config to new config
new config linked to schema?

db/transcryption/2017125_transcrypt_user_ssn.rb

class TranscryptUserSSN < Transcryption
  def old
    { algo, salt, etc. }
  end

  def new
    #
    # { algo, salt, etc. }
    #
    # or perhaps something like:
    #
    Transcryptor.schema(:user, :ssn)
  end
end

config:

define columns to encrypt
encryption configs are read from Transcryptor.schema(:user, :ssn)

app/models/user.rb

class User < AR
  attr_transcrypted :ssn
end

How to make this work? What else needs to be defined? What's the closest we can acheive?

cc: @DmitryDrobotov

The text was updated successfully, but these errors were encountered:

ronaldtse · 2017-10-02T09:55:48Z

@DmitryDrobotov would you have time to start with this? Thanks!

DmitryDrobotov · 2017-10-05T10:47:26Z

@ribose-jeffreylau @ronaldtse I've spent yesterday in my thoughts how to implement so magic behavior and here is some point I realized.

Database. To have a zero-downtime migration we exactly need to create separate columns OR have a column which indicates that encrypted attribute was re-encrypted OR create identical table and fill it with all data (anyway, too hard and heavy to sync all data through all time of migration)
Model configuration. It is possible to store configuration somewhere else, but what should we do with pros or lambdas which can participate in configuration?
During migration and at the end of it we need to replace old configuration with new one. How are you going to handle it?
Transcryptor.schema(:user, :ssn) won't give us an versioning of data changes.

ribose-jeffreylau · 2017-10-10T04:22:30Z

@DmitryDrobotov

For storing things like Procs in configuration, here's how I envision configurations could work:

Each individual column has separate migration definition files.
Each migration file is labelled with an ordered ID, pretty much like how db/migrations are labelled.
Each migration file contains both the then-current config (version n) and the then-previous config (version n - 1).

Thus, any running server instances would use the latest migration version (version n) as the current config, and also read in the earliest unmigrated version (version n 1) if exists.

A background rake task could then run migration version n 1. The running server instance's DB would eventually pick up the new config via solution in point 1 (TBD).

Would having a column indicating the migration version for a particular column work better / be easier?

For a running server to pick up the latest config, the scheme detailed in point 2. could perhaps address this.

So, in summary, the transcryption scheme now looks like:

db/transcryption/20171010123456_transcrypt_user_ssn.rb

class TranscryptUserSSN < Transcryption
  def old
    { algo, salt, etc. }
  end

  def new
    { algo, salt, etc. }
  end
end

and no more yaml files.

Hence, the configuration is implicitly the "head" of a chain of migration files. The "head" can be different for each row, indicated by a separate column (for each encrypted column) via the transcryption order ID (e.g. 20171010123456). The labels are usually different only during a transcryption, otherwise they should all point to the same version.

ronaldtse · 2017-10-10T04:35:04Z

I was actually thinking of a less drastic solution like this, so that the latest configuration is still located in the model, but the old one that we want to get rid of (in time) will be stored in the data migration, and still available if any server instances are still running on old code.

db/transcryption/20171010123456_transcrypt_user_ssn.rb

class TranscryptUserSSN < Transcryption
  def old
    { algo, salt, etc. }
  end

  def new
    UserModel.new_ssn_configuration
  end

  UserModel.register_old_configuration(:ssn, &old)
end

DmitryDrobotov · 2017-10-10T18:35:39Z

@ribose-jeffreylau @ronaldtse thanks guys! Now I see the picture how to implement that. We will need:

Add column with version to table which has encrypted attribute (let's say it's name is ssn), then version column will have encrypted_ssn_version:decimal column and accepts values like 20171010123456
Create first migration file under db/transcryption/20171010123456_transcrypt_user_ssn.rb (we can develop rake task to generate such files)
Rails app fetches old config from migration file and reencrypts attribute with new configuration and updates encrypted_ssn_version column with the latest value.

Sounds great to me! What about you?

ronaldtse · 2017-10-11T00:00:57Z

@DmitryDrobotov this is actually a great solution!

This is actually something more generic than just for encrypted attributes (which may or may not be in a separate gem). This suggests that we have a versioned schema per column cell (i.e., each column cell will have a schema version).

For each schema-ed attribute column we will have an extra column to store the attribute schema version.

The flow works like this:

Code is updated to use the new column schema (20171011000000) with the new model code, with link to the old column schema (20171010000000) in the migration file.
Assume that all Rails instances are already updated to the new code. On read of the attribute, depending on the attribute schema, we can load the value according to old or new scheme. On write of the attribute, the value will be migrated to the new scheme.
This way we support "lazy-migrate-on-write" or have a separate background process to migrate attributes to the new column schema with zero-downtime.

What do you think?

DmitryDrobotov · 2017-10-11T07:55:29Z

@ronaldtse Sounds good, that is the same what I meant except generic way like rails-data-migrations do. Let's think about it a little bit, do we really need to implement generic solution and add support in transcryptor gem? Or simply implement transcription migrations inside this gem?

ronaldtse · 2017-10-11T08:00:49Z

My concern was purely about scope -- if the test suite should test only against generic attribute schema it probably should be in/from a separate library.

Do you think these gems are useful for this functionality?

ribose-jeffreylau · 2017-11-15T04:23:33Z

Hi @DmitryDrobotov . What else do you think is needed before we can continue?

DmitryDrobotov · 2017-12-05T20:30:26Z

@ronaldtse I don't think that these gems can be useful. Because we need zero-downtime migrations of dynamicaly calculated value. For example, if row with id=1 has already been migrated it should use new configuration of attr_encrypted, but for row with id=1000, which has not migrated yet, we should still use previous configuration for re-encryption. So, I am going to implement something similar, but with some big changes, such as migration version control for separate row (not version of migration, like ActiveRecord migrations work).

nattfodd · 2018-03-28T13:20:39Z

Hey there! The only difference between zero downtime approach suggested in master branch and this one is that we have those few extra lines of code in the migrated model, which we want to get rid of now? Do I understand it right?

And what is the difference between this issue and that one: #24 ?

CC @ribose-jeffreylau

ronaldtse · 2018-03-29T03:14:13Z

@nattfodd

The major difference with #24 :

Zero restart is for migrating each data cell (not a table/column) individually. The table and data already exists. The migration happens during production operation of the server and does not need to do any table modifications.
There is no need for post operations on the data schema once the data migration is completed.

ribose-jeffreylau · 2018-03-29T04:08:06Z

Hi @nattfodd !

The mechanism detailed in this issue focuses on data migration. It tries not to touch db/migrations/ at all, as db/migrations/ also has other concerns such as database schema migrations.

#24 concerns the bootstrapping of non-encrypted columns into encrypted columns, and vice versa, which will most probably change the DB schema.

Hope it makes things clearer :)

nattfodd · 2018-03-29T12:46:18Z

@ribose-jeffreylau sounds clearer, indeed. But I believe it should be extra temporary column anyway, as @DmitryDrobotov suggested right above. It may be done in a migration script though, something like:

# bundle exec rake transcryptor:nodowntime:migrate

add_column :users, :ssn_migrated, :boolean
# do some migrations
remove_column :users, :ssn_migrated

It will affect actual SQL DB schema, but it wont affect db/schema.rb and db/migrate/*.rb files...

ronaldtse · 2018-03-29T13:27:48Z

@nattfodd actually the solution discussed with @DmitryDrobotov is not a temporary column, but a permanent column.

The problem with a temporary column is that the database schema will be in flux -- for example, if multiple servers are reading values but suddenly the column is dropped (especially in many databases a column drop is implemented as a table adjust), it will cause havoc.

A straightforward solution is to have a separate, permanent column to manage "data cell schema", the data of which indicates the "data schema version" of the original cell. This is similar to putting the Rails schema version number in a more granular way, instead of a single version for the entire database, we have versions per value.

The benefits of a permanent column (or columns) are great:

The schema is stable. We don't need to care about adding and removing it.
There are no "post-migration" or "pre-migration" operational actions, meaning that no one needs to "run" anything one-off.
Allows for lazy migrations, meaning that the cell/value is only migrated when it is actually read.
Allows for parallel migrations, just need to start more migration workers to migrate the remaining data.

The cost of using this is the implementation of a data layer that can switch (read/write) between data schema versions:

On read: read the value, determine if it is necessary to upgrade the schema. If so, calculate the new value and write the value using the new schema, update the schema version.
On write: write according to the new schema, update the schema version.

ribose-jeffreylau added enhancement help wanted labels Sep 29, 2017

ribose-jeffreylau added this to the 2.0 milestone Sep 29, 2017

nattfodd mentioned this issue Apr 8, 2018

Zero downtime 2.0 #35

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Zero restart migration #33

Zero restart migration #33

ribose-jeffreylau commented Sep 29, 2017

ronaldtse commented Oct 2, 2017

DmitryDrobotov commented Oct 5, 2017

ribose-jeffreylau commented Oct 10, 2017

ronaldtse commented Oct 10, 2017

DmitryDrobotov commented Oct 10, 2017

ronaldtse commented Oct 11, 2017 •

edited

Loading

DmitryDrobotov commented Oct 11, 2017

ronaldtse commented Oct 11, 2017

ribose-jeffreylau commented Nov 15, 2017

DmitryDrobotov commented Dec 5, 2017

nattfodd commented Mar 28, 2018 •

edited

Loading

ronaldtse commented Mar 29, 2018

ribose-jeffreylau commented Mar 29, 2018

nattfodd commented Mar 29, 2018

ronaldtse commented Mar 29, 2018

Zero restart migration #33

Zero restart migration #33

Comments

ribose-jeffreylau commented Sep 29, 2017

Goal

Implies

Potential tools

DB

Column

ronaldtse commented Oct 2, 2017

DmitryDrobotov commented Oct 5, 2017

ribose-jeffreylau commented Oct 10, 2017

ronaldtse commented Oct 10, 2017

DmitryDrobotov commented Oct 10, 2017

ronaldtse commented Oct 11, 2017 • edited Loading

DmitryDrobotov commented Oct 11, 2017

ronaldtse commented Oct 11, 2017

ribose-jeffreylau commented Nov 15, 2017

DmitryDrobotov commented Dec 5, 2017

nattfodd commented Mar 28, 2018 • edited Loading

ronaldtse commented Mar 29, 2018

ribose-jeffreylau commented Mar 29, 2018

nattfodd commented Mar 29, 2018

ronaldtse commented Mar 29, 2018

ronaldtse commented Oct 11, 2017 •

edited

Loading

nattfodd commented Mar 28, 2018 •

edited

Loading