Providing searchable ciphertext with Vault

Figure 1.10.2/1 - Vault advanced data protection features

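In recent years, countries around the world have begun paying close attention to information security, in particular data security and privacy. On June 10, 2021, China formally promulgated the Data Security Law of the People's Republic of China, which requires by law that any party carrying out data processing activities must effectively safeguard that data and protect sensitive and private information.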

There are many ways to protect data. One is encryption at rest, which cloud platforms and the vast majority of commercial databases now offer as a standard feature:

Figure 1.10.2/2 - Encryption at rest

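Data is held in memory as plaintext, but it passes through an encryption barrier before being written out, so it reaches disk and other external storage as ciphertext. This effectively defends against an unauthorized third party reading the disk directly (for example by stealing the disk, or by taking an unauthorized snapshot of it), but it has other weak points. An attacker who steals a database account can simply query the database. And when a company uses ELT to replicate data to several big-data analytics platforms, it is hard to guarantee that every platform holding the data protects it to the highest standard.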

On top of encryption at rest, we can also apply encryption in transit to sensitive data:

Figure 1.10.2/3 - Encryption in transit

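Encryption in transit here means more than encrypting the link with TLS or SSL. The application can also encrypt sensitive plaintext as soon as it receives it and store only the ciphertext in the database. The data then stays encrypted even in the database's memory, while the key lives in secure storage outside the database. Even if an attacker obtains a database connection or dumps the whole database, all they get is ciphertext; as long as they have not also obtained the key, the confidential information remains safe.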

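The previous section introduced Vault's encryption as a service: data is encrypted and decrypted through Vault, and Vault keeps the key, so an attacker who compromises the application server still cannot obtain it. Vault's access control also lets us manage encrypt and decrypt permissions separately; some consumer-facing applications, for example, may only encrypt data and have no right to call decrypt. With the Transit engine's key versioning and ciphertext rewrap feature, we can enforce periodic key rotation and further strengthen the protection of confidential data.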

Searching ciphertext

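Not long ago I was asked this question: once Vault's Transit encryption service is enabled and confidential application data such as user names, phone numbers and credit card numbers is stored encrypted in the database, what do we do if the business needs to run match searches on those columns? The database now holds ciphertext, so matching directly against plaintext does not work.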

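An intuitive approach is to encrypt the submitted search term as well, and match ciphertext against ciphertext. Unfortunately, doing this directly runs into a problem: Vault Transit's default encryption is not convergent. Even with the same key and the same plaintext, Vault mixes a random nonce into each encryption, so the resulting ciphertexts differ. Let's run an experiment. Start a Vault test server: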

$ vault server -dev
==> Vault server configuration:

             Api Address: http://127.0.0.1:8200
                     Cgo: disabled
         Cluster Address: https://127.0.0.1:8201
              Go Version: go1.17.5
              Listener 1: tcp (addr: "127.0.0.1:8200", cluster address: "127.0.0.1:8201", max_request_duration: "1m30s", max_request_size: "33554432", tls: "disabled")
               Log Level: info
                   Mlock: supported: false, enabled: false
           Recovery Mode: false
                 Storage: inmem
                 Version: Vault v1.9.2
             Version Sha: f4c6d873e2767c0d6853b5d9ffc77b0d297bfbdf+CHANGES

==> Vault server started! Log data will stream in below:
...
WARNING! dev mode is enabled! In this mode, Vault runs entirely in-memory
and starts unsealed with a single unseal key. The root token is already
authenticated to the CLI, so you can immediately begin using Vault.

You may need to set the following environment variable:

    $ export VAULT_ADDR='http://127.0.0.1:8200'

The unseal key and root token are displayed below in case you want to
seal/unseal the Vault or re-authenticate.

Unseal Key: 03Gmq22/VnZPwpZDp8Voo82jQ+5e9qvChUVfUq0mTv8=
Root Token: s.BeNIwE8yEJFVKmyay1CcbFPV

Development mode should NOT be used in production installations!

Then log in with the root token and enable the Transit engine:

$ export VAULT_ADDR='http://127.0.0.1:8200' && vault login s.BeNIwE8yEJFVKmyay1CcbFPV
Success! You are now authenticated. The token information displayed below
is already stored in the token helper. You do NOT need to run "vault login"
again. Future Vault requests will automatically use this token.

Key                  Value
---                  -----
token                s.BeNIwE8yEJFVKmyay1CcbFPV
token_accessor       MhOj7Rb8oMnVwgjK64bQyG5Z
token_duration       ∞
token_renewable      false
token_policies       ["root"]
identity_policies    []
policies             ["root"]

$ vault secrets enable transit
Success! Enabled the transit secrets engine at: transit/

Create a new key:

$ vault write -f transit/keys/testkey
Success! Data written to: transit/keys/testkey

Now encrypt the same plaintext twice:

$ vault write transit/encrypt/testkey plaintext=$(base64 <<< "hello world")
Key            Value
---            -----
ciphertext     vault:v1:1oUIafQ65lQZqRGyd7bITFeCSApjHg4s4UlHhLp7YCH3vc1+NKlSeg==
key_version    1

$ vault write transit/encrypt/testkey plaintext=$(base64 <<< "hello world")
Key            Value
---            -----
ciphertext     vault:v1:dBbyCBacAqG221zopKriwej+ge6+dk8JHbC+fD7wvawS9mdOT8paeQ==
key_version    1

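As we can see, the same plaintext produces two different ciphertexts, so the idea of matching a ciphertext against the ciphertexts stored in the database cannot work.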
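A side note on the commands above: the shell here-string (<<<) appends a newline, so the bytes being encrypted are "hello world" followed by a line feed. This does not affect the experiment, but it matters as soon as values must match byte for byte, because an application would normally send the string without that newline. A quick check in Python, using only the standard library:

```python
import base64

# Bytes the shell command actually sends: the here-string adds a trailing "\n".
with_newline = base64.b64encode(b"hello world\n").decode()

# Bytes an application would typically send for the same visible text.
without_newline = base64.b64encode(b"hello world").decode()

print(with_newline)     # aGVsbG8gd29ybGQK
print(without_newline)  # aGVsbG8gd29ybGQ=
```

The two encodings differ, so any scheme that compares derived values has to agree on the exact bytes first.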

Convergent encryption in Vault

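In response to user requests, Vault later added convergent encryption, which produces the same ciphertext for the same plaintext. Let's use path-help to look at what the Transit engine's key endpoint offers: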

$ vault path-help transit/keys/key
Request:        keys/key
Matching Route: ^keys/(?P<name>\w(([\w-.]+)?\w)?)$

Managed named encryption keys

## PARAMETERS

    allow_plaintext_backup (bool)

        Enables taking a backup of the named
        key in plaintext format. Once set,
        this cannot be disabled.

    context (string)

        Base64 encoded context for key derivation.
        When reading a key with key derivation enabled,
        if the key type supports public keys, this will
        return the public key for the given context.

    convergent_encryption (bool)

        Whether to support convergent encryption.
        This is only supported when using a key with
        key derivation enabled and will require all
        requests to carry both a context and 96-bit
        (12-byte) nonce. The given nonce will be used
        in place of a randomly generated nonce. As a
        result, when the same context and nonce are
        supplied, the same ciphertext is generated. It
        is *very important* when using this mode that
        you ensure that all nonces are unique for a
        given context. Failing to do so will severely
        impact the ciphertext's security.

    derived (bool)

        Enables key derivation mode. This
        allows for per-transaction unique
        keys for encryption operations.

    exportable (bool)

        Enables keys to be exportable.
        This allows for all the valid keys
        in the key ring to be exported.

    name (string)

        Name of the key

    type (string)

        The type of key to create. Currently, "aes128-gcm96" (symmetric), "aes256-gcm96" (symmetric), "ecdsa-p256"
        (asymmetric), "ecdsa-p384" (asymmetric), "ecdsa-p521" (asymmetric), "ed25519" (asymmetric), "rsa-2048" (asymmetric), "rsa-3072"
        (asymmetric), "rsa-4096" (asymmetric) are supported.  Defaults to "aes256-gcm96".

## DESCRIPTION

This path is used to manage the named keys that are available.
Doing a write with no value against a new named key will create
it using a randomly generated key.

As we can see, when creating a key we can specify the convergent_encryption and derived parameters; together, these two parameters enable convergent encryption.

$ vault path-help transit/encrypt/convergent_key
Request:        encrypt/convergent_key
Matching Route: ^encrypt/(?P<name>\w(([\w-.]+)?\w)?)$

Encrypt a plaintext value or a batch of plaintext
blocks using a named key

## PARAMETERS

    context (string)

        Base64 encoded context for key derivation. Required if key derivation is enabled

    convergent_encryption (bool)

        This parameter will only be used when a key is expected to be created.  Whether
        to support convergent encryption. This is only supported when using a key with
        key derivation enabled and will require all requests to carry both a context
        and 96-bit (12-byte) nonce. The given nonce will be used in place of a randomly
        generated nonce. As a result, when the same context and nonce are supplied, the
        same ciphertext is generated. It is *very important* when using this mode that
        you ensure that all nonces are unique for a given context.  Failing to do so
        will severely impact the ciphertext's security.

    key_version (int)

        The version of the key to use for encryption.
        Must be 0 (for latest) or a value greater than or equal
        to the min_encryption_version configured on the key.

    name (string)

        Name of the policy

    nonce (string)

        Base64 encoded nonce value. Must be provided if convergent encryption is
        enabled for this key and the key was generated with Vault 0.6.1. Not required
        for keys created in 0.6.2+. The value must be exactly 96 bits (12 bytes) long
        and the user must ensure that for any given context (and thus, any given
        encryption key) this nonce value is **never reused**.

    plaintext (string)

        Base64 encoded plaintext value to be encrypted

    type (string)

        This parameter is required when encryption key is expected to be created.
        When performing an upsert operation, the type of key to create. Currently,
        "aes128-gcm96" (symmetric) and "aes256-gcm96" (symmetric) are the only types supported. Defaults to "aes256-gcm96".

## DESCRIPTION

This path uses the named key from the request path to encrypt a user provided
plaintext or a batch of plaintext blocks. The plaintext must be base64 encoded.

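As we can see, there is a parameter named context. With a convergent key, as long as the context value is the same, the same plaintext produces the same ciphertext.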

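First, let's create a key for convergent encryption. Remember to enable both convergent_encryption and derived (they are key=value parameters, so do not add a - prefix):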

$ vault write -f transit/keys/convergent_key convergent_encryption=true derived=true
Success! Data written to: transit/keys/convergent_key

Then use that key with the same context to encrypt the same plaintext twice:

$ vault write transit/encrypt/convergent_key plaintext=$(base64 <<< "hello world") context=$(base64 <<< "mycontext")
Key            Value
---            -----
ciphertext     vault:v1:y3eR4qJugpNg9Aqt4/JMfthcen2IYHKyDlKBRKeahbVNJH5puKnAMw==
key_version    1

$ vault write transit/encrypt/convergent_key plaintext=$(base64 <<< "hello world") context=$(base64 <<< "mycontext")
Key            Value
---            -----
ciphertext     vault:v1:y3eR4qJugpNg9Aqt4/JMfthcen2IYHKyDlKBRKeahbVNJH5puKnAMw==
key_version    1

As we can see, the same plaintext and context produce identical ciphertext. Now let's try a different context value:

$ vault write transit/encrypt/convergent_key plaintext=$(base64 <<< "hello world") context=$(base64 <<< "newcontext")
Key            Value
---            -----
ciphertext     vault:v1:B6TwZ0MxXeAuNBkR7cNvcXs9T0jNB4Q+t4YTVFTR+3c8vXOfWS2yHA==
key_version    1

As we can see, changing the context changes the ciphertext.
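The behaviour seen so far (random nonce gives different ciphertexts; convergent mode with a context gives repeatable ones) can be imitated with a toy model. The sketch below is only an illustration under loud assumptions: it is not Vault's algorithm, the cipher is a home-made HMAC keystream with no authentication, and MASTER_KEY, derive_key, encrypt and decrypt are hypothetical names. It derives a per-context key and, in convergent mode, computes the nonce from the plaintext instead of drawing it at random.

```python
import hashlib
import hmac
import os

MASTER_KEY = b"demo-master-key-not-for-production"  # hypothetical demo key


def derive_key(context: bytes) -> bytes:
    """Derive a per-context key from the master key (stand-in for a real KDF)."""
    return hmac.new(MASTER_KEY, b"derive:" + context, hashlib.sha256).digest()


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256 in counter mode. Illustration only."""
    out = b""
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(4, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt(plaintext: bytes, context: bytes, convergent: bool) -> bytes:
    key = derive_key(context)
    if convergent:
        # Deterministic 12-byte nonce computed from the plaintext.
        nonce = hmac.new(key, b"nonce:" + plaintext, hashlib.sha256).digest()[:12]
    else:
        # Fresh random 12-byte nonce on every call.
        nonce = os.urandom(12)
    stream = keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ k for p, k in zip(plaintext, stream))


def decrypt(ciphertext: bytes, context: bytes) -> bytes:
    key = derive_key(context)
    nonce, body = ciphertext[:12], ciphertext[12:]
    stream = keystream(key, nonce, len(body))
    return bytes(c ^ k for c, k in zip(body, stream))


msg = b"hello world"
# Random nonce: two encryptions of the same plaintext differ.
print(encrypt(msg, b"mycontext", False) == encrypt(msg, b"mycontext", False))
# Convergent mode, same context: the ciphertexts are identical.
print(encrypt(msg, b"mycontext", True) == encrypt(msg, b"mycontext", True))
# Convergent mode, different context: the ciphertexts differ again.
print(encrypt(msg, b"mycontext", True) == encrypt(msg, b"newcontext", True))
```

The point of the model is the trade-off it makes visible: determinism is what makes equality matching possible, and it is also what lets anyone holding the ciphertexts see which rows share a value.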

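So can we encrypt confidential data with convergent encryption before storing it in the database, then encrypt the search term with the same context value and match on that? Unfortunately not; this approach runs into a problem.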

The rewrap problem

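To resist brute-force attacks based on known plaintext, every key should be replaced after it has been used a few billion times. Once a new key version has been generated, all ciphertext encrypted under the old version should gradually be re-encrypted under the new one and stored again. Let's rotate the convergent key we just created: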

$ vault write -f transit/keys/convergent_key/rotate
Success! Data written to: transit/keys/convergent_key/rotate

$ vault read transit/keys/convergent_key
Key                              Value
---                              -----
allow_plaintext_backup           false
convergent_encryption            true
convergent_encryption_version    -1
deletion_allowed                 false
derived                          true
exportable                       false
kdf                              hkdf_sha256
keys                             map[1:1641040308 2:1641040506]
latest_version                   2
min_available_version            0
min_decryption_version           1
min_encryption_version           0
name                             convergent_key
supports_decryption              true
supports_derivation              true
supports_encryption              true
supports_signing                 false
type                             aes256-gcm96

We can see that two versions of the key now exist. Let's encrypt again with the same plaintext and context values:

$ vault write transit/encrypt/convergent_key plaintext=$(base64 <<< "hello world") context=$(base64 <<< "mycontext")
Key            Value
---            -----
ciphertext     vault:v2:0QFHL2vFe7B2Ff1dNmy4fzYb/kVR8vYYbE0mDhNIWABcoIqkv2EK7Q==
key_version    2

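The ciphertext produced by the first key version was vault:v1:y3eR4qJugpNg9Aqt4/JMfthcen2IYHKyDlKBRKeahbVNJH5puKnAMw==, which differs from the ciphertext produced by the current version. Imagine that we rotate the key and then run a background process that reads the existing rows one by one and replaces old-version ciphertext with new-version ciphertext. For some period during that process, ciphertexts of both versions inevitably coexist in the database, and deciding which key version should encrypt the search term in a user's query becomes a serious problem.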
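The mixed-version window can be reproduced in a few lines. This is a sketch under stated assumptions, not a model of Vault itself: a keyed hash stands in for a convergent ciphertext (same value and same key version give the same token), and KEY_VERSIONS, deterministic_token, the users table and the phone number are all hypothetical.

```python
import hashlib
import hmac
import sqlite3

# Hypothetical key material for two key versions (before and after rotation).
KEY_VERSIONS = {1: b"demo-key-v1", 2: b"demo-key-v2"}


def deterministic_token(value: str, version: int) -> str:
    """Stand-in for a convergent ciphertext: deterministic per key version."""
    digest = hmac.new(KEY_VERSIONS[version], value.encode(), hashlib.sha256)
    return f"v{version}:{digest.hexdigest()}"


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, phone_enc TEXT)")
# Row 1 was written before the rotation and has not been rewrapped yet.
db.execute("INSERT INTO users (phone_enc) VALUES (?)",
           (deterministic_token("13800000000", 1),))
# Row 2 was written after the rotation.
db.execute("INSERT INTO users (phone_enc) VALUES (?)",
           (deterministic_token("13800000000", 2),))


def search(value: str, versions: list[int]) -> list[int]:
    """Return ids whose stored token matches the value under any given version."""
    tokens = [deterministic_token(value, v) for v in versions]
    placeholders = ",".join("?" for _ in tokens)
    rows = db.execute(
        f"SELECT id FROM users WHERE phone_enc IN ({placeholders}) ORDER BY id",
        tokens,
    )
    return [row[0] for row in rows]


print(search("13800000000", [2]))     # [2]
print(search("13800000000", [1, 2]))  # [1, 2]
```

Searching with only the latest version silently misses the row that has not been rewrapped. Matching under every live version finds both rows in this model; the key_version parameter listed in the encrypt path-help output suggests something similar may be possible against Vault, but that is an assumption here, and it multiplies the work per query.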

Blind indexes

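Besides convergent encryption, there are other ways to solve the match-search problem for ciphertext, such as a blind index. Put simply, a blind index is a hash code generated from the plaintext and stored alongside the ciphertext. To search, we compute the hash code of the search term and match it against the stored hash codes, which lets us find the encrypted rows without needing the encryption key.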

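However, we cannot build the blind index directly with a plain hash algorithm such as MD5 or SHA-1, because an attacker who has dumped the database can match guesses against the stored hashes. For values such as phone numbers and names, the attacker can easily hash the data they are interested in and quickly check the blind index to see whether a record exists.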
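The guessing attack is easy to demonstrate. The sketch below is illustrative only: the leaked values and the naive_index helper are hypothetical, SHA-256 is used in place of MD5 or SHA-1 to show that a stronger unkeyed hash does not help, and only a 10,000-number slice of the phone-number space is enumerated to keep the run short.

```python
import hashlib


def naive_index(value: str) -> str:
    """Unkeyed hash used as a blind index (the approach being warned against)."""
    return hashlib.sha256(value.encode()).hexdigest()


# What the attacker finds in the dumped index column (hypothetical data).
leaked_indexes = {naive_index("13800000007"), naive_index("13800004242")}

# The attacker needs no key: hash every candidate and look it up.
recovered = []
for suffix in range(10_000):
    candidate = f"1380000{suffix:04d}"
    if naive_index(candidate) in leaked_indexes:
        recovered.append(candidate)

print(recovered)  # ['13800000007', '13800004242']
```

Low-entropy fields such as phone numbers have a candidate space small enough that enumerating all of it with a fast hash is practical, which is why the index must depend on a secret.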

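The blind index should therefore be generated with a hash-based message authentication code (HMAC). HMAC combines a hash algorithm with a secret key: a key must be supplied when hashing the plaintext, and different keys produce different hash codes for the same plaintext.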
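A minimal sketch of an HMAC-based blind index stored next to the ciphertext follows. It makes several assumptions for the sake of a runnable example: the key is a local constant (INDEX_KEY), whereas the whole point of the approach described here is that the key stays inside Vault; the table layout, helper names and placeholder ciphertext strings are hypothetical.

```python
import hashlib
import hmac
import sqlite3

INDEX_KEY = b"demo-index-key"  # hypothetical; a real key would never leave Vault


def blind_index(value: str) -> str:
    """Keyed hash of the plaintext, stored alongside the ciphertext."""
    return hmac.new(INDEX_KEY, value.encode(), hashlib.sha256).hexdigest()


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, phone_enc TEXT, phone_idx TEXT)"
)
db.execute("CREATE INDEX users_phone_idx ON users (phone_idx)")


def insert_user(phone: str, ciphertext: str) -> None:
    """Store the ciphertext together with the blind index of the plaintext."""
    db.execute(
        "INSERT INTO users (phone_enc, phone_idx) VALUES (?, ?)",
        (ciphertext, blind_index(phone)),
    )


def find_by_phone(phone: str) -> list[tuple[int, str]]:
    """Look rows up by blind index; the ciphertext column is never compared."""
    rows = db.execute(
        "SELECT id, phone_enc FROM users WHERE phone_idx = ? ORDER BY id",
        (blind_index(phone),),
    )
    return rows.fetchall()


insert_user("13800000007", "vault:v1:<ciphertext-1>")
insert_user("13800004242", "vault:v1:<ciphertext-2>")

print(find_by_phone("13800004242"))  # [(2, 'vault:v1:<ciphertext-2>')]
print(find_by_phone("13900000000"))  # []
```

Because the lookup goes through the index column only, the ciphertext column can use ordinary randomized encryption.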

The Transit engine also provides HMAC generation:

$ vault write transit/hmac/testkey/sha2-256 input=$(base64 <<< "hello world")
Key     Value
---     -----
hmac    vault:v1:1UZWvdSTPrlzOW282FJhg0wyHGrq/B9MveU9f/682tY=

We generated an HMAC for the plaintext using the testkey key held in Vault, and the value of testkey never left Vault at any point. Let's try a different key:

$ vault write transit/hmac/convergent_key/sha2-256 input=$(base64 <<< "hello world")
Key     Value
---     -----
hmac    vault:v2:R7zCOuGq2PbeXLn627e9UmtbVUgZmoEQ+wCnK+Ob+HA=

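Different keys produce different hash codes. An attacker who dumps the database without also obtaining the key held in Vault cannot compute hash codes for guessed plaintexts, and so cannot determine whether a given plaintext exists in the database.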
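An application would normally call the same endpoint over Vault's HTTP API rather than through the CLI. The sketch below shows how such a request might be put together with the Python standard library. The helper names (build_hmac_request, fetch_blind_index) and the "<token>" placeholder are hypothetical; the path mirrors the CLI path used above under the /v1/ prefix, and the result is read from the response's data.hmac field. Only the request construction is exercised here, since sending it requires a running Vault server.

```python
import base64
import json
import urllib.request


def build_hmac_request(vault_addr: str, token: str, key_name: str,
                       value: bytes, algorithm: str = "sha2-256"):
    """Build (but do not send) a POST to the Transit HMAC endpoint."""
    url = f"{vault_addr}/v1/transit/hmac/{key_name}/{algorithm}"
    # The input must be base64-encoded, exactly as in the CLI examples.
    body = json.dumps({"input": base64.b64encode(value).decode()}).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
    )


def fetch_blind_index(vault_addr: str, token: str, key_name: str,
                      value: bytes) -> str:
    """Send the request and return the HMAC string (needs a live Vault)."""
    request = build_hmac_request(vault_addr, token, key_name, value)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["data"]["hmac"]


# b"hello world\n" matches the CLI example, whose here-string adds a newline.
req = build_hmac_request("http://127.0.0.1:8200", "<token>", "testkey",
                         b"hello world\n")
print(req.full_url)       # http://127.0.0.1:8200/v1/transit/hmac/testkey/sha2-256
print(req.data.decode())  # {"input": "aGVsbG8gd29ybGQK"}
```

Whatever bytes are hashed when a row is written must be the same bytes hashed at search time, so it is worth normalizing values (whitespace, case, number formatting) before they reach this function.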

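With data stored this way, both searching and recovering the plaintext require an environment that is authorized to call Vault, which directly rules out making use of the data after a database dump.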
