Description
In order to derive ZFS encryption keys from secret Input Key Material (IKM), we need a mechanism to retrieve the correct IKM. The key-manager provides functionality to take IKM of a fixed size, along with disk information, and derive keys, which are then used by sled-agent to encrypt and decrypt ZFS datasets.
The key-manager relies on a SecretRetriever which abstracts how IKM is constructed, and returns the appropriate IKM when asked.
We currently have two secret retrievers implemented: one for LRTQ, and one for a hardcoded secret (used for single-node dev systems). We then wrap these in a shared retriever that we initialize differently depending on context. This shared retriever is a bit cumbersome given that we already have a trait for our retrievers, and we may want to get rid of it in favor of a trait object stored in a OnceLock like MAYBE_LRTQ_RETRIEVER.
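As a rough illustration of the trait-object-in-a-OnceLock idea, here is a minimal sketch using only the standard library. Everything in it is an assumption made for illustration: the trait is a simplified, synchronous stand-in rather than the real `SecretRetriever` API, and `VersionedIkm`, `RetrieverError`, `SECRET_RETRIEVER`, `init_retriever`, and `retriever` are hypothetical names, as are the epoch and secret bytes.

```rust
use std::sync::OnceLock;

/// Hypothetical IKM tagged with the epoch it belongs to.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct VersionedIkm {
    pub epoch: u64,
    pub ikm: [u8; 32],
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum RetrieverError {
    AlreadyInitialized,
    NotInitialized,
}

/// Simplified, synchronous stand-in for the retriever trait.
/// `Send + Sync` lets a boxed trait object live in a `static`.
pub trait SecretRetriever: Send + Sync {
    fn get_latest(&self) -> VersionedIkm;
}

/// Stand-in for the hardcoded retriever used on single-node dev systems.
/// The epoch and bytes here are placeholders.
pub struct HardcodedSecretRetriever;

impl SecretRetriever for HardcodedSecretRetriever {
    fn get_latest(&self) -> VersionedIkm {
        VersionedIkm { epoch: 0, ikm: [0x1d; 32] }
    }
}

/// The single process-wide retriever, chosen once at startup.
static SECRET_RETRIEVER: OnceLock<Box<dyn SecretRetriever>> = OnceLock::new();

/// Install the retriever for this process. A second call is rejected.
pub fn init_retriever(r: Box<dyn SecretRetriever>) -> Result<(), RetrieverError> {
    SECRET_RETRIEVER
        .set(r)
        .map_err(|_| RetrieverError::AlreadyInitialized)
}

/// Borrow the installed retriever, if there is one.
pub fn retriever() -> Result<&'static dyn SecretRetriever, RetrieverError> {
    match SECRET_RETRIEVER.get() {
        Some(boxed) => Ok(boxed.as_ref()),
        None => Err(RetrieverError::NotInitialized),
    }
}
```

With this shape, the context-dependent choice reduces to which concrete type is passed to `init_retriever`, and callers only ever see `&dyn SecretRetriever`. Note that a `OnceLock` can be set only once, so this alone does not cover swapping retrievers at runtime.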
We'll need to add a third secret retriever for real trust quorum. This should look similar to the LrtqSecretRetriever, but must actually be capable of retrieving keys across different epochs via the NodeTaskHandle::load_rack_secret API.
While mostly straightforward, there are a couple of wrinkles here:
- The new rack secret retriever has to be dynamic. If all racks were newly initialized with trust quorum, then we could go ahead and use the new secret retriever alone. However, for deployed systems we need to be able to use the LRTQ secret retriever until the point that the LRTQ upgrade commits at a given sled-agent. At that point the sled-agent will have to switch to using the new secret retriever.
- The API for the SecretRetriever trait was pretty speculative when it was created over 2 years ago. We may decide that we don't even want a trait and we just choose to implement one concrete, dynamic secret retriever. The argument against this is that in production we don't want the ability to use the HardcodedSecretRetriever. And once all field systems have been updated to run trust quorum, we'll just want a trust quorum retriever.
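To make the "dynamic" wrinkle concrete, here is one possible shape for a retriever that delegates to LRTQ until the upgrade commits and to trust quorum afterwards. This is only a sketch under several assumptions: the trait is a simplified, synchronous stand-in; `StubLrtqRetriever`, `StubTqRetriever`, `DynamicSecretRetriever`, and `mark_upgrade_committed` are hypothetical names; and the trust quorum stub reads from an in-memory map where a real implementation would go through `NodeTaskHandle::load_rack_secret`, whose signature is not shown here.

```rust
use std::collections::BTreeMap;
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical IKM tagged with the epoch it belongs to.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct VersionedIkm {
    pub epoch: u64,
    pub ikm: [u8; 32],
}

/// Simplified, synchronous stand-in for the retriever trait.
pub trait SecretRetriever {
    fn get_latest(&self) -> Option<VersionedIkm>;
    fn get(&self, epoch: u64) -> Option<VersionedIkm>;
}

/// Stub for LRTQ: exactly one secret at one fixed epoch.
pub struct StubLrtqRetriever {
    pub epoch: u64,
    pub ikm: [u8; 32],
}

impl SecretRetriever for StubLrtqRetriever {
    fn get_latest(&self) -> Option<VersionedIkm> {
        Some(VersionedIkm { epoch: self.epoch, ikm: self.ikm })
    }
    fn get(&self, epoch: u64) -> Option<VersionedIkm> {
        if epoch == self.epoch { self.get_latest() } else { None }
    }
}

/// Stub for trust quorum: secrets for several epochs, keyed by epoch.
pub struct StubTqRetriever {
    pub secrets: BTreeMap<u64, [u8; 32]>,
}

impl SecretRetriever for StubTqRetriever {
    fn get_latest(&self) -> Option<VersionedIkm> {
        // BTreeMap is ordered by key, so the last entry is the newest epoch.
        self.secrets
            .last_key_value()
            .map(|(&epoch, &ikm)| VersionedIkm { epoch, ikm })
    }
    fn get(&self, epoch: u64) -> Option<VersionedIkm> {
        self.secrets.get(&epoch).map(|&ikm| VersionedIkm { epoch, ikm })
    }
}

/// Delegates to LRTQ until the upgrade commits, then to trust quorum.
pub struct DynamicSecretRetriever<L, T> {
    lrtq: L,
    tq: T,
    upgrade_committed: AtomicBool,
}

impl<L: SecretRetriever, T: SecretRetriever> DynamicSecretRetriever<L, T> {
    pub fn new(lrtq: L, tq: T) -> Self {
        Self { lrtq, tq, upgrade_committed: AtomicBool::new(false) }
    }

    /// One-way switch: there is deliberately no way to flip it back.
    pub fn mark_upgrade_committed(&self) {
        self.upgrade_committed.store(true, Ordering::Release);
    }

    fn committed(&self) -> bool {
        self.upgrade_committed.load(Ordering::Acquire)
    }
}

impl<L: SecretRetriever, T: SecretRetriever> SecretRetriever
    for DynamicSecretRetriever<L, T>
{
    fn get_latest(&self) -> Option<VersionedIkm> {
        if self.committed() { self.tq.get_latest() } else { self.lrtq.get_latest() }
    }
    fn get(&self, epoch: u64) -> Option<VersionedIkm> {
        if self.committed() { self.tq.get(epoch) } else { self.lrtq.get(epoch) }
    }
}
```

This sketch assumes that, after the commit, the trust quorum side can still serve the secret for the pre-upgrade epoch, so datasets encrypted under LRTQ remain readable. Whether that holds, and whether the switch belongs inside one concrete type like this or behind the trait, is part of the open design question.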
There are definitely some design decisions to be made here, and they will require some collaboration.