Active Record Encryption


Jason Dinsmore - July 06, 2021

Rails 7 will be introducing a very cool new feature for ActiveRecord - application-level encryption, summoned by the mighty encrypts declaration in a model. This new feature provides a layer of encryption that sits between our application code and the database. In essence, when data protected by Active Record Encryption is loaded into an AR object, it is decrypted, and while it sits in the database, it is encrypted.

In this post, we will take a high-level look at how to use this functionality, discuss some cool things it can do, and acknowledge some of its limitations.

Before we dive in, I would be remiss if I didn't point to the excellent documentation in the Rails guide.

For the duration of the post, I will refer to the feature as encrypts for brevity (and also because I am not sure what else to call it 😉).

Backstory

encrypts was merged into Rails in PR 41659 by @jorgemanrubia. In the PR description he states that the functionality is an extraction from HEY, which had the implementation reviewed by a security firm. If you are interested in hearing more of the story behind the feature, I'd recommend Jorge's blog post on the subject - it's an interesting read.

Deterministic Sidebar

Let's take a quick second to discuss the difference between deterministic and non-deterministic encryption. It's really pretty simple, but is a central tenet in understanding how to use encrypts.

Think of encryption as a function that is applied to some input (text), resulting in some output (encrypted text).

If the function is deterministic, then any time the function is applied to the same text, it will output the same result.

If the encryption function is non-deterministic, then we can't predict what the output will be the second time we encrypt a given value. In theory, it is possible we would get the same result we got the first time, but the odds of that happening are very, very low. In the context of encrypts with its default non-deterministic configurations, subsequent encryptions of the same plaintext are almost certain to produce different ciphertexts.

If we are using deterministic encryption with encrypts for a model attribute, then any two rows in the database that would have the same plaintext value will also have the same stored encrypted value. If we are using non-deterministic encryption, then two rows that have the same plaintext value will generally have different encrypted values. As we'll see in a bit, this has implications as to whether we can query the encrypted data.
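To make the distinction concrete, here is a minimal sketch using only Ruby's OpenSSL bindings. It is not the Rails implementation: toy_encrypt is a hypothetical helper, and deriving the initialization vector from an HMAC of the plaintext is an assumption made so that the deterministic branch is repeatable.

```ruby
require "openssl"

# Hypothetical helper for illustration only, not how Rails is implemented.
# AES-256-GCM where the IV is either random or derived from the plaintext.
def toy_encrypt(plaintext, key, deterministic:)
  cipher = OpenSSL::Cipher.new("aes-256-gcm")
  cipher.encrypt
  cipher.key = key
  cipher.iv =
    if deterministic
      # Same key + same plaintext => same IV => same ciphertext.
      OpenSSL::HMAC.digest(OpenSSL::Digest.new("SHA256"), key, plaintext)[0, cipher.iv_len]
    else
      # A fresh random IV on every call.
      OpenSSL::Random.random_bytes(cipher.iv_len)
    end
  ciphertext = cipher.update(plaintext) + cipher.final
  [ciphertext].pack("m0") # strict Base64 so the result is printable
end

key = OpenSSL::Random.random_bytes(32)

puts toy_encrypt("top secret", key, deterministic: true) ==
     toy_encrypt("top secret", key, deterministic: true)  # true

puts toy_encrypt("top secret", key, deterministic: false) ==
     toy_encrypt("top secret", key, deterministic: false) # false, barring an astronomically unlikely IV collision
```

The only difference between the two branches is where the IV comes from, which is the detail that decides whether equal plaintexts end up as equal stored values.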

Setup

There's not a ton of configuration required to get going with encrypts, but there are some things to be aware of.

Keys

The main requisite is that we will need to generate a set of keys and add them to our credentials file(s). We can generate the keys to add by running: bin/rails db:encryption:init, which will output something like:

Add this entry to the credentials of the target environment:

active_record_encryption:
  primary_key: zxMXS0hBbpa5BzRKPv9HOSF9etBySiHQ
  deterministic_key: 0pM2UHHBQr1kf1irO6JgakcSOXu0r1Vn
  key_derivation_salt: I5AkViD0UJhSqK3NY49Zvsls3ZoifyXx

The primary_key is used to derive the root encryption key for non-deterministic encryption. Note that the primary_key value in the credentials file can also be a list of keys.

The deterministic_key is used for deterministic encryption. If you recall from the section on determinism above, we'll get the same result if we encrypt the same data with this key multiple times. Currently, encrypts does not support using a list of keys for deterministic encryption. If we want to completely disable deterministic encryption, not providing the key is a sure-fire off switch.

The key_derivation_salt is used to derive encryption keys.
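As a rough illustration of what "deriving" a key from a secret and a salt means, here is a sketch that uses PBKDF2 from Ruby's OpenSSL bindings. The derive_key helper, the iteration count, the digest, and the key length are all assumptions chosen for the example; check the Rails source for the values it actually uses.

```ruby
require "openssl"

# Hypothetical helper. The defaults below are assumptions for illustration;
# the parameters Rails really uses may differ.
def derive_key(secret, salt, iterations: 65_536, length: 32)
  OpenSSL::PKCS5.pbkdf2_hmac(secret, salt, iterations, length, OpenSSL::Digest.new("SHA256"))
end

# Sample values copied from the generated credentials shown earlier.
primary_key = "zxMXS0hBbpa5BzRKPv9HOSF9etBySiHQ"
salt        = "I5AkViD0UJhSqK3NY49Zvsls3ZoifyXx"

key = derive_key(primary_key, salt)

puts key.bytesize                                       # 32
puts derive_key(primary_key, salt) == key               # true
puts derive_key(primary_key, "a different salt") == key # false
```

The same secret and salt always yield the same key, so the app can re-derive it on every boot, while changing the salt yields an unrelated key.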

App Configuration

The encrypts API provides several configuration options. All of the options are defined in the config.active_record.encryption namespace and I'd encourage reading through them if you are going to use this feature. I believe you'll find that most of the options have pretty reasonable defaults.

I will mention the config.active_record.encryption.extend_queries option as it defaults to false, but enabling it has several implications:

  • enables querying unencrypted data in an encrypted column (also need to enable config.active_record.encryption.support_unencrypted_data for this)
  • allows supporting multiple encryption schemes
  • enables support for uniqueness validations
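For reference, turning these options on might look like the fragment below. The two option names come from the text above; the DogPark module name is a hypothetical application name, and the fragment is only meaningful inside a Rails 7 app.

```ruby
# config/application.rb (fragment, not a standalone script)
module DogPark # hypothetical application name
  class Application < Rails::Application
    # Let Active Record rewrite queries against encrypted columns.
    config.active_record.encryption.extend_queries = true

    # Needed as well if a column may still hold unencrypted rows.
    config.active_record.encryption.support_unencrypted_data = true
  end
end
```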

Database

When an encrypted string or text attribute is stored in the database, it isn't stored as an ordinary string or text - it is stored as a more complex data structure that is serialized when written and deserialized when read. This data structure allows some meta information to be stored along with the encrypted text, which gives the app some clues about how the text was encrypted.

This extra meta-info introduces some storage overhead - up to 250 bytes.

The guide recommends increasing the size of a 255 byte string field to 510 bytes if using encryption on a column. It also says that for a text field, the overhead is generally negligible.


Invocation

Finally, we can talk about actually using the thing!

In the most basic use case, to encrypt a single column, we simply add an encrypts declaration to our model for the attribute we want to encrypt. For example, if we had a Dog model with a toy_location field (dogs like to hide their toys, you know) that needs encryption, our model would look like:

class Dog < ApplicationRecord
  encrypts :toy_location
end

Pretty simple, eh?

Writing

Writing an encrypted attribute is completely transparent. We just do what we would normally do in Rails:

> dog = Dog.create!(name: 'Bruno', toy_location: 'top secret')

If we were to look at the content sitting in the database directly, we would see something like:

> result = Dog.connection.execute('SELECT toy_location FROM dogs LIMIT 1').first
   (1.4ms)  SELECT toy_location FROM dogs LIMIT 1
=> {"toy_location"=>"{\"p\":\"oVgEJvRaX6DJvA==\",\"h\":{\"iv\":\"WYypcKysgBY05Tum\",\"at\":\"OaBswq+wyriuRQO8yCVD3w==\"}}"}

The value here is just serialized JSON, so let's go ahead and parse it:

> JSON.parse(result['toy_location'])
=> {"p"=>"oVgEJvRaX6DJvA==", "h"=>{"iv"=>"WYypcKysgBY05Tum", "at"=>"OaBswq+wyriuRQO8yCVD3w=="}}

That gave us a Hash. Most of the keys in this Hash are defined in the ActiveRecord::Encryption::Properties::DEFAULT_PROPERTIES constant. p is the payload, aka the encrypted plaintext. h is a Hash of headers that contain information relating to the encryption operation. Here, iv is the initialization vector the plaintext was encrypted with - more about this in the next section on searching, and at is an auth_tag that will be used during the decryption process to verify that the encrypted text hasn't been altered. You may notice other headers from the DEFAULT_PROPERTIES Hash above depending on how your encryption is set up and being used.
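To see how the p, iv, and at pieces fit together, here is a sketch that builds a message of the same shape and then opens it again. It assumes AES-256-GCM and a random stand-in key; seal, open_message, b64, and unb64 are hypothetical helpers, and this is not the Rails code path.

```ruby
require "openssl"
require "json"

def b64(bytes)
  [bytes].pack("m0") # strict Base64, no line breaks
end

def unb64(text)
  text.unpack1("m0")
end

# Encrypt and serialize: payload under "p", IV and auth tag under "h".
def seal(plaintext, key)
  cipher = OpenSSL::Cipher.new("aes-256-gcm")
  cipher.encrypt
  cipher.key = key
  iv = cipher.random_iv
  payload = cipher.update(plaintext) + cipher.final
  JSON.generate({ "p" => b64(payload), "h" => { "iv" => b64(iv), "at" => b64(cipher.auth_tag) } })
end

# Parse and decrypt. cipher.final raises if the auth tag does not verify.
def open_message(serialized, key)
  message = JSON.parse(serialized)
  cipher = OpenSSL::Cipher.new("aes-256-gcm")
  cipher.decrypt
  cipher.key = key
  cipher.iv = unb64(message["h"]["iv"])
  cipher.auth_tag = unb64(message["h"]["at"])
  cipher.update(unb64(message["p"])) + cipher.final
end

key = OpenSSL::Random.random_bytes(32) # stand-in key for the example
stored = seal("top secret", key)
puts open_message(stored, key) # top secret

# Flip one bit of the payload and try again.
tampered = JSON.parse(stored)
bytes = unb64(tampered["p"])
bytes.setbyte(0, bytes.getbyte(0) ^ 1)
tampered["p"] = b64(bytes)

begin
  open_message(JSON.generate(tampered), key)
rescue OpenSSL::Cipher::CipherError
  puts "tampering detected"
end
```

The second half shows what the auth tag buys us: a payload that was altered after encryption fails verification instead of quietly decrypting to garbage.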

Reading

When we load a model with an encrypted attribute, Rails will seamlessly decrypt the encrypted value. Let's find the Dog we created above by his name:

> Dog.find_by!(name: 'Bruno')
=> #<Dog id: 1, name: "Bruno", toy_location: "top secret", created_at: "2021-05-28 22:41:23.142635000 +0000", updated_at: "2021-05-28 22:41:23.142635000 +0000">

As you can see, the encrypted value was automatically translated to a readable attribute on our model instance - pretty slick.

Searching

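What if we wanted to find Bruno by his toy_location instead of his name? As long as the attribute is encrypted deterministically (for example, encrypts :toy_location, deterministic: true), we can do that just like we would if the field were not encrypted: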

> dog = Dog.find_by!(toy_location: 'top secret')
  Dog Load (2.1ms)  SELECT "dogs".* FROM "dogs" WHERE "dogs"."toy_location" = ? LIMIT ?  [["toy_location", "{\"p\":\"oVgEJvRaX6DJvA==\",\"h\":{\"iv\":\"WYypcKysgBY05Tum\",\"at\":\"OaBswq+wyriuRQO8yCVD3w==\"}}"], ["LIMIT", 1]]
=> 

Notice that our query string was automatically converted into the encrypted JSON string we saw when we looked in the database.

Initialization Vector/Determinism

When using deterministic encryption, all records with the same plaintext value will use the same initialization vector to encrypt. This is so ActiveRecord will generate the same ciphertext for the same input, which is a prerequisite for being able to search the encrypted data. Under the hood, Rails uses the plaintext to generate the initialization vector for deterministically encrypted data - otherwise the IV is randomly generated.

If two rows with the same plaintext were to use different initialization vectors to perform the encryption, the serialized JSON that ends up in the database would be completely different.

In order to be able to perform searches on the encrypted data, the stored values need to be exactly the same.

This means that all of the stored values in the serialized hash need to be identical for two rows that have the same text value, AND that Rails can re-compute the exact same hash on the fly to find rows that are matches for the search string.

Determinism at its finest.

Searching Plaintext

What if we did not have the luxury of starting with pristine data? For example, if our Dog table already existed and had a pre-existing toy_location column on it that was not encrypted?

Well, as we can see by the query generated above, if we had a Dog record with top secret (unencrypted) as its toy_location, that query ain't gonna find it. Also, it seems pretty likely that if we try to load a Dog record with stored plaintext into memory, Rails is going to have problems when it attempts to decrypt the plaintext.

One option we have would be to convert our plaintext data to encrypted data, which seems ideal to me. We may have reasons to want to avoid doing a data migration like that, however.

In that case, encrypts will allow us to keep storing the plaintext values as plaintext, and will encrypt any new or updated data. To opt-in to supporting a mix of encrypted/unencrypted data, enable the config.active_record.encryption.support_unencrypted_data configuration option.

Enabling this behavior will prevent the errors we would get when Rails tries to decrypt the plaintext, and will also allow us to perform searches when a column contains a mix of plaintext and encrypted data.

If we enable the setting and re-run our search query above, we'll see:

Dog Load (0.3ms) SELECT "dogs".* FROM "dogs" WHERE "dogs"."toy_location" IN (?, ?) LIMIT ? [["toy_location", "{\"p\":\"Bd+/TzEysF2CCQ==\",\"h\":{\"iv\":\"R2IUJJ+EmnDnZvQP\",\"at\":\"zqG5WAJql1zgctRCPpoBkQ==\"}}"], ["toy_location", "top secret"], ["LIMIT", 1]]

Now, it is looking for records that have the encrypted content or have the plaintext version of that content. Perfect!
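The IN (?, ?) lookup can be mimicked in plain Ruby with an in-memory "table". This is a sketch under assumptions: encrypt_deterministically is a hypothetical stand-in that derives the IV from the plaintext, the key is random, and the row names other than Bruno and Max are made up for the example.

```ruby
require "openssl"
require "json"

# Hypothetical stand-in for deterministic encryption (not the Rails code):
# the IV comes from an HMAC of the plaintext, so the output is repeatable.
def encrypt_deterministically(plaintext, key)
  cipher = OpenSSL::Cipher.new("aes-256-gcm")
  cipher.encrypt
  cipher.key = key
  iv = OpenSSL::HMAC.digest(OpenSSL::Digest.new("SHA256"), key, plaintext)[0, cipher.iv_len]
  cipher.iv = iv
  payload = cipher.update(plaintext) + cipher.final
  JSON.generate({
    "p" => [payload].pack("m0"),
    "h" => { "iv" => [iv].pack("m0"), "at" => [cipher.auth_tag].pack("m0") }
  })
end

# Mirrors WHERE toy_location IN (<encrypted value>, <plaintext value>).
def find_by_toy_location(rows, query, key)
  candidates = [encrypt_deterministically(query, key), query]
  rows.select { |row| candidates.include?(row[:toy_location]) }
end

key = OpenSSL::Random.random_bytes(32) # stand-in for the deterministic key

rows = [
  { name: "Rex",   toy_location: "top secret" },                                       # legacy plaintext row
  { name: "Bruno", toy_location: encrypt_deterministically("top secret", key) },       # encrypted row
  { name: "Max",   toy_location: encrypt_deterministically("under the couch", key) }
]

puts find_by_toy_location(rows, "top secret", key).map { |row| row[:name] }.inspect
# ["Rex", "Bruno"]
```

Both the legacy plaintext row and the encrypted row match, because the lookup compares against the recomputed encrypted string and the raw search text.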

Case Insensitive Searches

By default, searching is case sensitive. If we need to ignore case when searching for some reason, we have a few options:

Option 1:

We can query for all of the case variations we need to match - e.g. Dog.where(toy_location: ['Top secret', 'top secret']).

Option 2:

We can specify downcase: true on our encrypts declaration. This will cause the text to be downcased before it is stored. ActiveRecord will automatically downcase our search text when performing queries. The downside here is that all case information is lost when it is downcased. Sorry to be a downer.

Option 3:

We can specify ignore_case: true on the encrypts declaration and add an original_<column_name> column to our database (e.g. original_toy_location). With this in place, if we created a dog with an uppercase letter in the encrypted field:

Dog.create!(name: 'Max', toy_location: 'Top secret')

the toy_location column would be populated with the encrypted form of 'top secret' (the value downcased), and the original_toy_location column would hold the encrypted form of 'Top secret' (the value we set).

Any searches would be done against the toy_location column, and the model's toy_location attribute would be populated from the original_toy_location column when it is loaded into memory.
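For reference, the pieces Option 3 relies on might look like the fragments below. The option names come from the text above; the migration class name, the file paths, and the [7.0] migration version are assumptions for the example, and both fragments only make sense inside a Rails 7 app.

```ruby
# db/migrate/<timestamp>_add_original_toy_location_to_dogs.rb (fragment)
class AddOriginalToyLocationToDogs < ActiveRecord::Migration[7.0]
  def change
    # Holds the encrypted, case-preserved value.
    add_column :dogs, :original_toy_location, :string
  end
end

# app/models/dog.rb (fragment)
class Dog < ApplicationRecord
  # Option 2 would instead be:
  #   encrypts :toy_location, deterministic: true, downcase: true
  encrypts :toy_location, deterministic: true, ignore_case: true
end
```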

One thing to note here - while the toy_location column is encrypted deterministically in this situation (so it can be searched), the original_toy_location column appears to be encrypted non-deterministically. This makes sense, since that column does not need to support searching. This can be confirmed by comparing the toy_location and original_toy_location values for two records with the same plaintext value. As you can see below, they have the same stored values (initialization vector, payload, etc) for the toy_location column (searchable and downcased), and different stored values for the original_toy_location column (not searchable, case preserved):

{
  "toy_location" => "{\"p\":\"Bd+/TzEysF2CCQ==\",\"h\":{\"iv\":\"R2IUJJ+EmnDnZvQP\",\"at\":\"zqG5WAJql1zgctRCPpoBkQ==\"}}",
  "original_toy_location" => "{\"p\":\"5syLqDK6GCbBDw==\",\"h\":{\"iv\":\"KBGp4FrI7oL4/a3p\",\"at\":\"JnH6hxLX35cAwroImk2XqQ==\"}}"
},
{
  "toy_location" => "{\"p\":\"Bd+/TzEysF2CCQ==\",\"h\":{\"iv\":\"R2IUJJ+EmnDnZvQP\",\"at\":\"zqG5WAJql1zgctRCPpoBkQ==\"}}",
  "original_toy_location" => "{\"p\":\"0246w4+SSqqlJw==\",\"h\":{\"iv\":\"1uEnjlCNot9sYNgR\",\"at\":\"UhkhK6YlOTxJg75juqIMGA==\"}}"
}

Other Cool things

encrypts does even more than we have looked at so far. In the interest of not writing a book, I'm not going to go into much detail here, but would like to mention some of its other capabilities.

We have only looked at encrypting simple strings, but encrypts can encrypt rich text attributes too.

It also provides support for previous encryption schemes. This means that we can start out using non-deterministic encryption on a column and change to using deterministic encryption later on. I would definitely recommend reading the fine print before using this feature.

We can rotate our (non-deterministic) keys. This is pretty cool, just note that it is not currently supported for deterministic encryption.

Related to the rotating keys, we can configure encrypts to store a reference to the key used to encrypt in the encrypted data itself.

If using deterministic encryption, encrypts supports using unique constraints. If we need to ensure uniqueness in any of our encrypted columns, there are some things to be aware of. Be sure to read up on it first.

Encrypted columns are automatically filtered from Rails logs by default. encrypts provides a way to disable this functionality.

This might be a good place to mention that the implementation is modular, and allows quite a bit of customization. Many of the encrypts options can be set on a per-attribute basis, or at more global levels.

Limitations

Even with all of its glory, encrypts does have limitations. We'll take a look at a few that stood out to me. Given the variety of applicable use cases and the breadth of the functionality, I am sure other folks will have their own list.

  • Fuzzy searching - The search functionality encrypts provides requires an exact match on search text. This means doing a LIKE query, for example, won't work. It also means that any queries done on encrypted columns will need to go through Rails and ActiveRecord vs being manually crafted in SQL.
  • Rich text search - While it is rad that encrypts can encrypt rich text, it can only do so non-deterministically right now. This means that we won't be able to search rich text.
  • Deterministic searching does not support multiple keys - Something good to be aware of going in - if using deterministic encryption/searching, we won't have the ability to use more than one key at a time. If we need to change keys, we'll likely need to do something fancy.
  • Rails console exposes data - This may seem obvious, but if a malicious person gets access to our Rails console, they can load encrypted data into objects and view the plaintext all day long. In Jorge's post, he mentioned that HEY is using a console extension that sits on top of the encryption feature that protects and audits console access. Unfortunately, that is a private gem (named console1984) and not available in Rails (at this time).
  • Deterministic encryption reduces security - I don't think this is a fault of the implementation per se, but if we use deterministic encryption, then any two rows that have the same value for an encrypted attribute will have the same value stored in the database. While we can't necessarily reverse engineer how it was encrypted, if we know one row's plaintext value, then we know the other row's plaintext value as well. Non-deterministic encryption doesn't have this same weakness.

Summary

TLDR: I am pretty excited and intrigued by this feature. I think it will be cool to see how people use it and how it evolves over time. My hunch is that some of the current limitations will go away (like not supporting multiple keys for deterministically encrypted attributes) as more people begin using it and digging through the code.