Data-at-rest encryption (also known as transparent data encryption or TDE) is a necessary mechanism for ensuring the security of a DBMS deployment. Upcoming releases of Percona Server for MongoDB extend that mechanism with the KMIP key state polling feature. In this technical post, I will describe how the feature works and how it helps reduce the mean time to resolve (MTTR) compromised encryption key incidents. The post also provides detailed upgrade instructions.
If you’re getting the “Master encryption key is not in the active state on the key management facility” error after an upgrade to version 8.0.0 and just want to proceed without getting into all the details, please skip to the “Upgrading to version 8.0.0 …” section.
Introduction
Percona Server for MongoDB implements data-at-rest encryption by encrypting each database on disk with a separate, randomly generated encryption key. By “database,” I mean a database in MongoDB’s terms, i.e., a group of collections. Percona Server for MongoDB puts all the per-database keys into the so-called key database and encrypts the latter with a master encryption key. Each node of Percona Server for MongoDB randomly generates a unique master encryption key that should be stored externally, for instance, on a KMIP server. Moreover, a master encryption key must be regularly changed (aka “rotated”) to minimize the damage in case it is compromised. The approach described above enables a quick master key rotation: the data doesn’t need to be re-encrypted; only the key database is re-encrypted with a new master key. That is almost always little work because the key database is small.
Let’s consider a situation in which a master encryption key was compromised long before the planned rotation. In a deployment with replication and sharding, there can be dozens, if not hundreds, of nodes, each using a unique master encryption key by default. If one or several of those keys are known to be compromised, the operations team must rotate master encryption keys exclusively on the affected nodes. Before Percona Server for MongoDB provided the KMIP key state polling, no reliable method existed for identifying such nodes. Thus, the operations team had to rotate master keys on the whole cluster, which can result in stopping and starting many unaffected nodes.
These unnecessary master key rotations cost more than the wasted effort of an operations team. In a replica set, a restarted node must fetch and apply the write operations made on the primary while the node was down. Restarting a primary node also triggers an election in the replica set. The accumulated effects of multiple restarts can put unnecessary stress on the cluster in terms of service traffic and CPU. On top of that, rotating master keys on unaffected nodes increases the total mean time to resolve the incident. Key state polling eliminates unnecessary master key rotations, which are long and potentially harmful to a cluster, by pinpointing only the affected nodes so they can be dealt with immediately.
Please also note that Percona Server for MongoDB supports key state polling only if a master encryption key is stored on a KMIP server. At the time of writing, no equivalent feature exists when Percona Server for MongoDB is configured to store keys on HashiCorp’s Vault server.
KMIP key state, activation, and polling
A master key for data-at-rest encryption is essentially a key for the AES-256 encryption algorithm and, thus, from the KMIP perspective, falls under the Symmetric Key type of managed object. The KMIP standard defines a number of states a managed object can be in and the rules for transitioning between those states. From the viewpoint of Percona Server for MongoDB, all the states can be categorized into three groups:
- 1st group consists of a single state: Pre-Active
- 2nd group also includes only one state: Active
- 3rd group comprises all the other states, namely:
  - Deactivated
  - Compromised
  - Destroyed
  - Destroyed Compromised
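To make the grouping concrete, here is a minimal Python sketch of it. The `KeyState` enum and the `state_group` function are names I made up for illustration; they are not part of Percona Server for MongoDB or of any KMIP library.

```python
from enum import Enum


class KeyState(Enum):
    """Key states as they are named in the list above."""
    PRE_ACTIVE = "Pre-Active"
    ACTIVE = "Active"
    DEACTIVATED = "Deactivated"
    COMPROMISED = "Compromised"
    DESTROYED = "Destroyed"
    DESTROYED_COMPROMISED = "Destroyed Compromised"


def state_group(state: KeyState) -> int:
    """Return 1, 2, or 3: the group the given key state falls into."""
    if state is KeyState.PRE_ACTIVE:
        return 1
    if state is KeyState.ACTIVE:
        return 2
    # Everything else belongs to the 3rd group.
    return 3
```

For example, `state_group(KeyState("Compromised"))` evaluates to 3, while `state_group(KeyState.ACTIVE)` evaluates to 2.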
Percona Server for MongoDB creates a new randomly generated master encryption key. Then it uses KMIP’s Register operation to record it on a KMIP server. At this point, the key is in the Pre-Active state. The upcoming releases go further by implementing the security.kmip.activateKeys configuration file option (further referred to as activateKeys). If it is true, which is the default, Percona Server for MongoDB also uses the Activate operation to transition the key into the Active state. The above procedure happens in two cases: first, when mongod is started with an empty data directory and encryption enabled, and second, when master key rotation is run without specifying any key identifier. Should any Register or Activate operations fail, the mongod refuses to start, or the master key rotation fails with an error message.
If, on the contrary, mongod starts with an already encrypted data directory and activateKeys is enabled (which, again, is the default), it issues KMIP’s GetAttributes operation against the KMIP server to verify that the master encryption key that was used for the data directory encryption is in the Active state. If so, it retrieves the key using the Get operation. Similarly to the case of an empty data directory, any failure in the above operations leads to mongod logging an error message and refusing to start. Switching off activateKeys results in omitting the GetAttributes operation and skipping the verification of the master encryption key state.
In addition to activateKeys, upcoming releases of Percona Server for MongoDB will also add the security.kmip.keyStatePollingSeconds configuration file option (further called just keyStatePollingSeconds for brevity), which equals 900 by default. When activateKeys is true, any positive value n of keyStatePollingSeconds instructs Percona Server for MongoDB to verify that the master encryption key is in the Active state every n seconds. In KMIP terminology, Percona Server for MongoDB issues the GetAttributes operation with Attribute equal to State against the KMIP server and checks that the response is Active. If that is not the case, mongod logs the “Master encryption key is not in the active state on the key management facility” error message and shuts down with the 1001 exit code.
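The polling behavior described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the server's implementation: `poll_key_state`, its parameters, and the injected `get_state` callable (which stands in for the GetAttributes request) are all hypothetical names. The sketch rejects non-positive intervals because the text above only defines the behavior for positive values.

```python
import time
from typing import Callable, Optional

KEY_NOT_ACTIVE_MESSAGE = (
    "Master encryption key is not in the active state "
    "on the key management facility"
)
KEY_NOT_ACTIVE_EXIT_CODE = 1001  # exit code quoted in the text above


def poll_key_state(
    get_state: Callable[[], str],
    polling_seconds: int = 900,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Check the key state every `polling_seconds` seconds.

    `get_state` is a stand-in for the GetAttributes request and must
    return the state name, e.g. "Active". Returns the exit code as soon
    as the state is not "Active", or None if `max_polls` checks passed.
    """
    if polling_seconds <= 0:
        raise ValueError("this sketch only covers positive polling intervals")
    polls = 0
    while max_polls is None or polls < max_polls:
        sleep(polling_seconds)
        polls += 1
        if get_state() != "Active":
            print(KEY_NOT_ACTIVE_MESSAGE)
            return KEY_NOT_ACTIVE_EXIT_CODE
    return None
```

Passing `sleep` and `max_polls` as parameters is purely a convenience: it lets one exercise the loop with a fake clock instead of waiting 15 minutes per check.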
Reacting to and recovering from a compromised master key
Let’s return to our example of a deployment with sharding and replication. Let’s assume it has 1000 nodes, and we’ve just learned that, for instance, seven master encryption keys were compromised. To avoid doing unnecessary master key rotation on 993 nodes, we must determine which seven nodes use the compromised keys. That is relatively easy to do. A security engineer transitions the affected keys into the Compromised state (or any other state of the 3rd group) on the KMIP server. One can do that with the Revoke operation, specifying Key Compromised as the Revocation Reason. Now, we don’t even need to actively seek out the problematic nodes. Provided that the default values of activateKeys and keyStatePollingSeconds are used, the nodes will identify themselves in 15 minutes or less by shutting down with the “Master encryption key is not in the active state on the key management facility” error message and the 1001 exit code.
After finding the nodes, we need to change the master encryption key on each. This is precisely what the security.kmip.rotateMasterKey configuration file option exists for. Set it to true and run mongod; it will rotate the key and exit. Please note that one should not touch the activateKeys and keyStatePollingSeconds options before starting the rotation. Percona Server for MongoDB automatically detects that rotation is in effect and does the reasonable thing. Namely, it decrypts the key database with the existing master key, even if that key is not active, and re-encrypts the key database with the new master key after registering and activating the latter on the KMIP server.
After rotation finishes, simply remove the security.kmip.rotateMasterKey option from the configuration file and rerun mongod; this time, it starts in the normal mode with an active master encryption key and is ready to serve the client queries.
In our example, with 1000 nodes and seven compromised keys, we didn’t even touch the remaining 993 nodes. That is much less of an operational burden compared to when the key state polling feature is absent or disabled. Moreover, since we didn’t have to spend time on redundant key rotations on the unaffected nodes, we addressed compromised keys as soon as possible, thus minimizing the time they were in effect. Given the distinctive error message and the exit code, one can even go further and easily automate the master key rotation in the case of the inactive key.
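As a starting point for such automation, here is a minimal Python sketch that picks out the nodes to rotate from their logs. The function name, the node names, and the idea of feeding it the tail of each node's log as a string are my assumptions for illustration. A process supervisor could check the exit code as well; the sketch sticks to the error message because how exit codes are reported varies by environment.

```python
from typing import Dict, List

KEY_NOT_ACTIVE_MESSAGE = (
    "Master encryption key is not in the active state "
    "on the key management facility"
)


def nodes_needing_rotation(log_tails: Dict[str, str]) -> List[str]:
    """Return the names of nodes whose log tail contains the error message.

    `log_tails` maps a (hypothetical) node name to the last lines of
    that node's mongod log.
    """
    return sorted(
        node for node, text in log_tails.items()
        if KEY_NOT_ACTIVE_MESSAGE in text
    )


# Hypothetical usage: only the second node shut down because of its key.
tails = {
    "shard1-node1": "... waiting for connections ...",
    "shard1-node2": "... " + KEY_NOT_ACTIVE_MESSAGE + " ...",
}
affected = nodes_needing_rotation(tails)
```

The resulting list is what one would feed into the rotation procedure described above, leaving every other node untouched.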
Releases and upgrades
The key state polling feature will be released in the future version 8.0.0 and in patch releases of the currently supported major versions, namely 5.0.28, 6.0.17, and 7.0.13. With respect to key state polling, those patch releases behave slightly differently from version 8.0.0. But before we discuss the difference, let’s first consider how one can upgrade from an older version without key state polling (e.g., 7.0.12) to version 8.0.0 or higher.
Upgrading to version 8.0.0 from an older version
Before the feature was introduced, Percona Server for MongoDB didn’t touch the state of a master encryption key on a KMIP server. That means that any keys registered on a KMIP server were left in the Pre-Active state, and everything worked fine because older versions of Percona Server for MongoDB didn’t pay attention to the state of a key. Since the default value of activateKeys is true, Percona Server for MongoDB versions 8.0.0 and higher test whether a master encryption key is active before decrypting a data directory, and that test will fail for a Pre-Active key. As a result, the typical 3-step upgrade procedure:
- Stop mongod
- Install the new version
- Start mongod
will fail in the third step, producing the “Master encryption key is not in the active state on the key management facility” error message. Though it can be tempting to set the activateKeys option to false and proceed with the upgrade, we strongly advise against doing so since it effectively turns off a security feature. Instead, one needs to do a master key rotation before starting mongod so that the whole procedure looks as follows:
- Stop mongod
- Install the new version
- Rotate the master key
  - Set the security.kmip.rotateMasterKey option to true in the configuration file; alternatively, set the --kmipRotateMasterKey command line option
  - Start mongod and wait till it exits after doing the key rotation
  - Remove security.kmip.rotateMasterKey from the configuration file or --kmipRotateMasterKey from the command line
- Start mongod as usual
The rotation must be done only once; future upgrades won’t need that.
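For those who script their upgrades, the last two steps of the procedure above could be expressed as the following Python sketch. It only builds the two mongod command lines and leaves stopping the server and installing the packages to your own tooling. The function name and the configuration path are hypothetical, and I assume here that the command line option is spelled with two hyphens and takes no value.

```python
from typing import List


def upgrade_commands(config_path: str) -> List[List[str]]:
    """Build the mongod invocations for the rotation run and the normal run.

    Assumes --kmipRotateMasterKey is a value-less switch and that all
    other encryption settings live in the configuration file.
    """
    base = ["mongod", "--config", config_path]
    return [
        base + ["--kmipRotateMasterKey"],  # rotates the master key, then exits
        base,                              # normal start with the new key
    ]
```

One would run the first command, wait for it to exit, and only then run the second one, for instance with `subprocess.run` or from a shell script.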
Upgrading an older major version to a new patch release
In versions 5.0.28, 6.0.17, and 7.0.13, the behavior of key state polling is the same as in 8.0.0 except for one particular case. If activateKeys is not explicitly set, Percona Server for MongoDB continues working as usual when it detects at startup that an existing master encryption key is in the Pre-Active state. In that case, the following warning is logged: “Data-at-rest encryption was initialized with a pre-active master key. Since version 8.0.0, an active key will be required.” Any state from the 3rd group (e.g., Compromised or Deactivated) still leads to a fatal error. As a result, upgrading to a newer patch version (e.g., from 5.0.27 to 5.0.28) doesn’t require master key rotation immediately after installing the new version (5.0.28 in our example). The logged warning notifies the operations team that they can do a master key rotation or manually transition the keys to the Active state later at their convenience. After doing so, they can explicitly set the activateKeys configuration parameter to true to ensure that only active keys are used. It’s also worth mentioning that if an operations team transitions the master encryption keys to the Active state shortly after upgrading to a new patch release, the upgrade to version 8.0.0 will go more smoothly because it won’t require master key rotation.
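The difference between the patch releases and version 8.0.0 is easier to see as a small decision function. The Python sketch below models only the key state check performed at startup with an already encrypted data directory, as described in this post; the function name, the parameters, and the returned labels are mine, and failures of the Get operation itself are not modeled.

```python
from typing import Optional


def startup_outcome(
    state: str,
    activate_keys: Optional[bool],
    is_patch_release: bool,
) -> str:
    """Outcome of the key state check at startup.

    `activate_keys` is True or False when set explicitly and None when
    the option is left at its default (which is true).
    `is_patch_release` is True for 5.0.28, 6.0.17, and 7.0.13, and
    False for 8.0.0 and higher.
    """
    if activate_keys is False:
        return "check-skipped"  # the key state is not verified at all
    if state == "Active":
        return "start"
    if state == "Pre-Active" and is_patch_release and activate_keys is None:
        return "start-with-warning"  # the one special case of patch releases
    return "fatal"
```

For instance, a Pre-Active key with activateKeys left unset yields "start-with-warning" on a patch release and "fatal" on 8.0.0, which is exactly why the rotation step is needed before the major upgrade.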
Conclusion
The upcoming versions of Percona Server for MongoDB reduce mean time to resolve (MTTR) compromised encryption key incidents if the keys are stored on a KMIP server. I recommend upgrading as soon as versions 5.0.28, 6.0.17, 7.0.13, and 8.0.0 are available. If you need assistance, don’t hesitate to contact Percona Experts.