FAQ Search Today's Posts Mark Forums Read
» Video Reviews

» Linux Archive

Linux-archive is a website aiming to archive linux email lists and to make them easily accessible for linux users/developers.


» Sponsor

» Partners

» Sponsor


 
 
LinkBack Thread Tools
 
Old 11-10-2011, 01:42 PM
Will Drewry
 
Default dm: verity target

On Thu, Nov 10, 2011 at 1:44 AM, Steffen Klassert
<steffen.klassert@secunet.com> wrote:
> On Wed, Nov 09, 2011 at 09:18:10PM -0800, Mandeep Singh Baines wrote:
>>
>> + * TODO(wad): All hash storage memory is pre-allocated and freed once an
>> + * entire branch has been verified.
>> + */
>> +struct dm_bht {
>> + * * /* Configured values */
>> + * * int depth; */* Depth of the tree including the root */
>> + * * unsigned int block_count; */* Number of blocks hashed */
>> + * * unsigned int block_size; */* Size of a hash block */
>> + * * char hash_alg[CRYPTO_MAX_ALG_NAME];
>> + * * unsigned char salt[DM_BHT_SALT_SIZE];
>> +
>> + * * /* Computed values */
>> + * * unsigned int node_count; */* Data size (in hashes) for each entry */
>> + * * unsigned int node_count_shift; */* first bit set - 1 */
>> + * * /* There is one per CPU so that verified can be simultaneous. */
>> + * * struct hash_desc hash_desc[NR_CPUS]; */* Container for the hash alg */
>
> Please don't add a new user for the old hash interface. If the hashes can
> be done asynchronous you can use ahash, if not use shash. Both interfaces
> are reentrant, that's probaply what you want to have here. You don't
> need to have this in a per cpu manner.

I'll check out the two interfaces. I didn't realize hash_desc was
deprecated specifically in favor of one of the others. I'm interested
in seeing if it is possible to not keep a per-cpu desc though with the
other apis. We do this now to avoid contention across multiple kernel
threads on crypto operations sharing the hash_desc (e.g., by wrapping
it in a mutex).

Thanks!

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 11-11-2011, 04:34 PM
Mandeep Singh Baines
 
Default dm: verity target

Changes in v2:
* Use shash API instead of older hash API (thanks Steffen).

-- >8 -- (snip)

The verity target provides transparent integrity checking of block devices
using a cryptographic digest.

dm-verity is meant to be setup as part of a verified boot path. This
may be anything ranging from a boot using tboot or trustedgrub to just
booting from a known-good device (like a USB drive or CD).

dm-verity is part of ChromeOS's verified boot path. It is used to verify
the integrity of the root filesystem on boot. The root filesystem is
mounted on a dm-verity partition which transparently verifies each block
with a bootloader verified hash passed into the kernel at boot.

Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: Elly Jones <ellyjones@chromium.org>
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Milan Broz <mbroz@redhat.com>
Cc: Olof Johansson <olofj@chromium.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: dm-devel@redhat.com
Cc: linux-kernel@vger.kernel.org
---
Documentation/device-mapper/dm-bht.txt | 59 ++
Documentation/device-mapper/dm-verity.txt | 76 +++
drivers/md/Kconfig | 30 +
drivers/md/Makefile | 2 +
drivers/md/dm-bht.c | 559 +++++++++++++++
drivers/md/dm-verity.c | 1043 +++++++++++++++++++++++++++++
drivers/md/dm-verity.h | 45 ++
include/linux/dm-bht.h | 166 +++++
8 files changed, 1980 insertions(+), 0 deletions(-)
create mode 100644 Documentation/device-mapper/dm-bht.txt
create mode 100644 Documentation/device-mapper/dm-verity.txt
create mode 100644 drivers/md/dm-bht.c
create mode 100644 drivers/md/dm-verity.c
create mode 100644 drivers/md/dm-verity.h
create mode 100644 include/linux/dm-bht.h

diff --git a/Documentation/device-mapper/dm-bht.txt b/Documentation/device-mapper/dm-bht.txt
new file mode 100644
index 0000000..21d929f
--- /dev/null
+++ b/Documentation/device-mapper/dm-bht.txt
@@ -0,0 +1,59 @@
+dm-bht
+======
+
+dm-bht provides a block hash tree implementation. The use of dm-bht allows
+for integrity checking of a given block device without reading the entire
+set of blocks into memory before use.
+
+In particular, dm-bht supplies an interface for creating and verifying a tree
+of cryptographic digests with any algorithm supported by the kernel crypto API.
+
+The `verity' target is the motivating example.
+
+
+Theory of operation
+===================
+
+dm-bht is logically comprised of multiple nodes organized in a tree-like
+structure. Each node in the tree is a cryptographic hash. If it is a leaf
+node, the hash is of some block data on disk. If it is an intermediary node,
+then the hash is of a number of child nodes.
+
+dm-bht has a given depth starting at 1 (ignoring the root node). Each level in
+the tree is concretely made up of dm_bht_entry structs. Each entry in the tree
+is a collection of neighboring nodes that fit in one page-sized block. The
+number is determined based on PAGE_SIZE and the size of the selected
+cryptographic digest algorithm. The hashes are linearly ordered in this entry
+and any unaligned trailing space is ignored but included when calculating the
+parent node.
+
+The tree looks something like:
+
+alg= sha256, num_blocks = 32767
+ [ root ]
+ / . . .
+ [entry_0] [entry_1]
+ / . . . . . .
+ [entry_0_0] . . . [entry_0_127] . . . . [entry_1_127]
+ / ... / . . . /
+ blk_0 ... blk_127 blk_16256 blk_16383 blk_32640 . . . blk_32767
+
+root is treated independently from the depth and the blocks are expected to
+be hashed and supplied to the dm-bht. hash blocks that make up the entry
+contents are expected to be read from disk.
+
+dm-bht does not handle I/O directly but instead expects the consumer to
+supply callbacks. The read callback will always receive a page-align value
+to pass to the block device layer to read in a hash value.
+
+Usage
+=====
+
+The API provides mechanisms for reading and verifying a tree. When reading, all
+required data for the hash tree should be populated for a block before
+attempting a verify. This can be done by calling dm_bht_populate(). When all
+data is ready, a call to dm_bht_verify_block() with the expected hash value will
+perform both the direct block hash check and the hashes of the parent and
+neighboring nodes where needed to ensure validity up to the root hash. Note,
+dm_bht_set_root_hexdigest() should be called before any verification attempts
+occur.
diff --git a/Documentation/device-mapper/dm-verity.txt b/Documentation/device-mapper/dm-verity.txt
new file mode 100644
index 0000000..f33b984
--- /dev/null
+++ b/Documentation/device-mapper/dm-verity.txt
@@ -0,0 +1,76 @@
+dm-verity
+==========
+
+Device-Mapper's "verity" target provides transparent integrity checking of
+block devices using a cryptographic digest provided by the kernel crypto API.
+This target is read-only.
+
+Parameters: payload=<device path> hashtree=<hash device path> alg=<alg>
+ salt=<salt> root_hexagiest=<root hash>
+ [ hashstart=<hash start> error_behavior=<error behavior> ]
+
+<device path>
+ This is the device that is going to be integrity checked. It may be
+ a subset of the full device as specified to dmsetup (start sector and count)
+ It may be specified as a path, like /dev/sdaX, or a device number,
+ <major>:<minor>.
+
+<hash device path>
+ This is the device that that supplies the dm-bht hash data. It may be
+ specified similarly to the device path and may be the same device. If the
+ same device is used, the hash offset should be outside of the dm-verity
+ configured device size.
+
+<alg>
+ The cryptographic hash algorithm used for this device. This should
+ be the name of the algorithm, like "sha1".
+
+<salt>
+ Salt value (in hex).
+
+<root hash>
+ The hexadecimal encoding of the cryptographic hash of all of the
+ neighboring nodes at the first level of the tree. This hash should be
+ trusted as there is no other authenticity beyond this point.
+
+<hash start>
+ Start address of hashes (default 0).
+
+<error behavior>
+ 0 = return -EIO. 1 = panic. 2 = none. 3 = call notifier.
+
+Theory of operation
+===================
+
+dm-verity is meant to be setup as part of a verified boot path. This
+may be anything ranging from a boot using tboot or trustedgrub to just
+booting from a known-good device (like a USB drive or CD).
+
+When a dm-verity device is configured, it is expected that the caller
+has been authenticated in some way (cryptographic signatures, etc).
+After instantiation, all hashes will be verified on-demand during
+disk access. If they cannot be verified up to the root node of the
+tree, the root hash, then the I/O will fail. This should identify
+tampering with any data on the device and the hash data.
+
+Cryptographic hashes are used to assert the integrity of the device on a
+per-block basis. This allows for a lightweight hash computation on first read
+into the page cache. Block hashes are stored linearly aligned to the nearest
+block the size of a page.
+
+For more information on the hashing process, see dm-bht.txt.
+
+
+Example
+=======
+
+Setup a device;
+[[
+ dmsetup create vroot --table
+ "0 204800 verity payload=/dev/sda1 hashtree=/dev/sda2 alg=sha1 "
+ "root_hexdigest=9f74809a2ee7607b16fcc70d9399a4de97 25a727"
+]]
+
+A command line tool is available to compute the hash tree and return the
+root hash value.
+ http://git.chromium.org/cgi-bin/gitweb.cgi?p=dm-verity.git;a=tree
diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index faa4741..3cdf95c 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -370,4 +370,34 @@ config DM_FLAKEY
---help---
A target that intermittently fails I/O for debugging purposes.

+config DM_BHT
+ tristate "Block hash tree support"
+ select CRYPTO
+ select CRYPTO_HASH
+ ---help---
+ Include support for device-mapper devices to use a block hash
+ tree for managing data integrity checks in a scalable way.
+
+ Targets that use this functionality should include it
+ automatically.
+
+ If unsure, say N.
+
+config DM_VERITY
+ tristate "Verity target support"
+ depends on BLK_DEV_DM
+ select DM_BHT
+ select CRYPTO
+ select CRYPTO_HASH
+ ---help---
+ This device-mapper target allows you to create a device that
+ transparently integrity checks the data on it. You'll need to
+ activate the digests you're going to use in the cryptoapi
+ configuration.
+
+ To compile this code as a module, choose M here: the module will
+ be called dm-verity.
+
+ If unsure, say N.
+
endif # MD
diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index 046860c..c069953 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -39,6 +39,8 @@ obj-$(CONFIG_DM_SNAPSHOT) += dm-snapshot.o
obj-$(CONFIG_DM_PERSISTENT_DATA) += persistent-data/
obj-$(CONFIG_DM_MIRROR) += dm-mirror.o dm-log.o dm-region-hash.o
obj-$(CONFIG_DM_LOG_USERSPACE) += dm-log-userspace.o
+obj-$(CONFIG_DM_BHT) += dm-bht.o
+obj-$(CONFIG_DM_VERITY) += dm-verity.o
obj-$(CONFIG_DM_ZERO) += dm-zero.o
obj-$(CONFIG_DM_RAID) += dm-raid.o
obj-$(CONFIG_DM_THIN_PROVISIONING) += dm-thin-pool.o
diff --git a/drivers/md/dm-bht.c b/drivers/md/dm-bht.c
new file mode 100644
index 0000000..6eb2be3
--- /dev/null
+++ b/drivers/md/dm-bht.c
@@ -0,0 +1,559 @@
+ /*
+ * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ *
+ * Device-Mapper block hash tree interface.
+ * See Documentation/device-mapper/dm-bht.txt for details.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <crypto/hash.h>
+#include <linux/atomic.h>
+#include <linux/bitops.h>
+#include <linux/bug.h>
+#include <linux/cpumask.h>
+#include <linux/device-mapper.h>
+#include <linux/dm-bht.h>
+#include <linux/err.h>
+#include <linux/errno.h>
+#include <linux/gfp.h>
+#include <linux/highmem.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/mm_types.h>
+#include <linux/scatterlist.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+
+#define DM_MSG_PREFIX "dm bht"
+
+
+/*
+ * Utilities
+ */
+
+static u8 from_hex(u8 ch)
+{
+ if ((ch >= '0') && (ch <= '9'))
+ return ch - '0';
+ if ((ch >= 'a') && (ch <= 'f'))
+ return ch - 'a' + 10;
+ if ((ch >= 'A') && (ch <= 'F'))
+ return ch - 'A' + 10;
+ return -1;
+}
+
+/**
+ * dm_bht_bin_to_hex - converts a binary stream to human-readable hex
+ * @binary: a byte array of length @binary_len
+ * @hex: a byte array of length @binary_len * 2 + 1
+ */
+static void dm_bht_bin_to_hex(u8 *binary, u8 *hex, unsigned int binary_len)
+{
+ while (binary_len-- > 0) {
+ sprintf((char *)hex, "%02hhx", (int)*binary);
+ hex += 2;
+ binary++;
+ }
+}
+
+/**
+ * dm_bht_hex_to_bin - converts a hex stream to binary
+ * @binary: a byte array of length @binary_len
+ * @hex: a byte array of length @binary_len * 2 + 1
+ */
+static void dm_bht_hex_to_bin(u8 *binary, const u8 *hex,
+ unsigned int binary_len)
+{
+ while (binary_len-- > 0) {
+ *binary = from_hex(*(hex++));
+ *binary *= 16;
+ *binary += from_hex(*(hex++));
+ binary++;
+ }
+}
+
+static void dm_bht_log_mismatch(struct dm_bht *bht, u8 *given, u8 *computed)
+{
+ u8 given_hex[DM_BHT_MAX_DIGEST_SIZE * 2 + 1];
+ u8 computed_hex[DM_BHT_MAX_DIGEST_SIZE * 2 + 1];
+
+ dm_bht_bin_to_hex(given, given_hex, bht->digest_size);
+ dm_bht_bin_to_hex(computed, computed_hex, bht->digest_size);
+ DMERR_LIMIT("%s != %s", given_hex, computed_hex);
+}
+
+/**
+ * dm_bht_compute_hash: hashes a page of data
+ */
+static int dm_bht_compute_hash(struct dm_bht *bht, struct page *pg,
+ unsigned int offset, u8 *digest)
+{
+ struct shash_desc *hash_desc = bht->hash_desc[smp_processor_id()];
+ void *data;
+ int err;
+
+ /* Note, this is synchronous. */
+ if (crypto_shash_init(hash_desc)) {
+ DMCRIT("failed to reinitialize crypto hash (proc:%d)",
+ smp_processor_id());
+ return -EINVAL;
+ }
+ data = kmap_atomic(pg);
+ err = crypto_shash_update(hash_desc, data + offset, PAGE_SIZE);
+ kunmap_atomic(data);
+ if (err) {
+ DMCRIT("crypto_hash_update failed");
+ return -EINVAL;
+ }
+ if (crypto_shash_update(hash_desc, bht->salt, sizeof(bht->salt))) {
+ DMCRIT("crypto_hash_update failed");
+ return -EINVAL;
+ }
+ if (crypto_shash_final(hash_desc, digest)) {
+ DMCRIT("crypto_hash_final failed");
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+/*
+ * Implementation functions
+ */
+
+static int dm_bht_initialize_entries(struct dm_bht *bht)
+{
+ /* last represents the index of the last digest store in the tree.
+ * By walking the tree with that index, it is possible to compute the
+ * total number of entries at each level.
+ *
+ * Since each entry will contain up to |node_count| nodes of the tree,
+ * it is possible that the last index may not be at the end of a given
+ * entry->nodes. In that case, it is assumed the value is padded.
+ *
+ * Note, we treat both the tree root (1 hash) and the tree leaves
+ * independently from the bht data structures. Logically, the root is
+ * depth=-1 and the block layer level is depth=bht->depth
+ */
+ unsigned int last = bht->block_count;
+ int depth;
+
+ /* check that the largest level->count can't result in an int overflow
+ * on allocation or sector calculation.
+ */
+ if (((last >> bht->node_count_shift) + 1) >
+ UINT_MAX / max((unsigned int)sizeof(struct dm_bht_entry),
+ (unsigned int)to_sector(bht->block_size))) {
+ DMCRIT("required entries %u is too large", last + 1);
+ return -EINVAL;
+ }
+
+ /* Track the current sector location for each level so we don't have to
+ * compute it during traversals.
+ */
+ bht->sectors = 0;
+ for (depth = 0; depth < bht->depth; ++depth) {
+ struct dm_bht_level *level = &bht->levels[depth];
+
+ level->count = dm_bht_index_at_level(bht, depth, last) + 1;
+ level->entries = (struct dm_bht_entry *)
+ kcalloc(level->count,
+ sizeof(struct dm_bht_entry),
+ GFP_KERNEL);
+ if (!level->entries) {
+ DMERR("failed to allocate entries for depth %d", depth);
+ return -ENOMEM;
+ }
+ level->sector = bht->sectors;
+ bht->sectors += level->count * to_sector(bht->block_size);
+ }
+
+ return 0;
+}
+
+/**
+ * dm_bht_create - prepares @bht for us
+ * @bht: pointer to a dm_bht_create()d bht
+ * @depth: tree depth without the root; including block hashes
+ * @block_count:the number of block hashes / tree leaves
+ * @alg_name: crypto hash algorithm name
+ *
+ * Returns 0 on success.
+ *
+ * Callers can offset into devices by storing the data in the io callbacks.
+ */
+int dm_bht_create(struct dm_bht *bht, unsigned int block_count,
+ unsigned int block_size, const char *alg_name)
+{
+ struct crypto_shash *tfm;
+ int size, cpu, status = 0;
+
+ bht->block_size = block_size;
+ /* Verify that PAGE_SIZE >= block_size >= SECTOR_SIZE. */
+ if ((block_size > PAGE_SIZE) ||
+ (PAGE_SIZE % block_size) ||
+ (to_sector(block_size) == 0))
+ return -EINVAL;
+
+ tfm = crypto_alloc_shash(alg_name, 0, 0);
+ if (IS_ERR(tfm)) {
+ DMERR("failed to allocate crypto hash '%s'", alg_name);
+ return -ENOMEM;
+ }
+ size = sizeof(struct shash_desc) + crypto_shash_descsize(tfm);
+
+ /* Pre-allocate per-cpu crypto contexts to avoid having to
+ * kmalloc/kfree a context for every hash operation.
+ */
+ for (cpu = 0; cpu < nr_cpu_ids; ++cpu) {
+ struct shash_desc *hash_desc = kmalloc(size, GFP_KERNEL);
+
+ bht->hash_desc[cpu] = hash_desc;
+ if (!hash_desc) {
+ DMERR("failed to allocate crypto hash contexts");
+ status = -ENOMEM;
+ goto bad_hash_alloc;
+ }
+ hash_desc->tfm = tfm;
+ hash_desc->flags = 0x0;
+ }
+ bht->digest_size = crypto_shash_digestsize(tfm);
+ /* We expect to be able to pack >=2 hashes into a block */
+ if (block_size / bht->digest_size < 2) {
+ DMERR("too few hashes fit in a block");
+ status = -EINVAL;
+ goto bad_arg;
+ }
+
+ if (bht->digest_size > DM_BHT_MAX_DIGEST_SIZE) {
+ DMERR("DM_BHT_MAX_DIGEST_SIZE too small for chosen digest");
+ status = -EINVAL;
+ goto bad_arg;
+ }
+
+ /* Configure the tree */
+ bht->block_count = block_count;
+ if (block_count == 0) {
+ DMERR("block_count must be non-zero");
+ status = -EINVAL;
+ goto bad_arg;
+ }
+
+ /* Each dm_bht_entry->nodes is one block. The node code tracks
+ * how many nodes fit into one entry where a node is a single
+ * hash (message digest).
+ */
+ bht->node_count_shift = fls(block_size / bht->digest_size) - 1;
+ /* Round down to the nearest power of two. This makes indexing
+ * into the tree much less painful.
+ */
+ bht->node_count = 1 << bht->node_count_shift;
+
+ /* This is unlikely to happen, but with 64k pages, who knows. */
+ if (bht->node_count > UINT_MAX / bht->digest_size) {
+ DMERR("node_count * hash_len exceeds UINT_MAX!");
+ status = -EINVAL;
+ goto bad_arg;
+ }
+
+ bht->depth = DIV_ROUND_UP(fls(block_count - 1), bht->node_count_shift);
+
+ /* Ensure that we can safely shift by this value. */
+ if (bht->depth * bht->node_count_shift >= sizeof(unsigned int) * 8) {
+ DMERR("specified depth and node_count_shift is too large");
+ status = -EINVAL;
+ goto bad_arg;
+ }
+
+ /* Allocate levels. Each level of the tree may have an arbitrary number
+ * of dm_bht_entry structs. Each entry contains node_count nodes.
+ * Each node in the tree is a cryptographic digest of either node_count
+ * nodes on the subsequent level or of a specific block on disk.
+ */
+ bht->levels = (struct dm_bht_level *)
+ kcalloc(bht->depth,
+ sizeof(struct dm_bht_level), GFP_KERNEL);
+ if (!bht->levels) {
+ DMERR("failed to allocate tree levels");
+ status = -ENOMEM;
+ goto bad_level_alloc;
+ }
+
+ bht->read_cb = NULL;
+
+ status = dm_bht_initialize_entries(bht);
+ if (status)
+ goto bad_entries_alloc;
+
+ /* We compute depth such that there is only be 1 block at level 0. */
+ BUG_ON(bht->levels[0].count != 1);
+
+ return 0;
+
+bad_entries_alloc:
+ while (bht->depth-- > 0)
+ kfree(bht->levels[bht->depth].entries);
+ kfree(bht->levels);
+bad_level_alloc:
+bad_arg:
+bad_hash_alloc:
+ for (cpu = 0; cpu < nr_cpu_ids && bht->hash_desc[cpu]; ++cpu)
+ kfree(bht->hash_desc[cpu]);
+ crypto_free_shash(tfm);
+ return status;
+}
+EXPORT_SYMBOL(dm_bht_create);
+
+/**
+ * dm_bht_read_completed
+ * @entry: pointer to the entry that's been loaded
+ * @status: I/O status. Non-zero is failure.
+ * MUST always be called after a read_cb completes.
+ */
+void dm_bht_read_completed(struct dm_bht_entry *entry, int status)
+{
+ if (status) {
+ /* TODO(wad) add retry support */
+ DMCRIT("an I/O error occurred while reading entry");
+ atomic_set(&entry->state, DM_BHT_ENTRY_ERROR_IO);
+ /* entry->nodes will be freed later */
+ return;
+ }
+ BUG_ON(atomic_read(&entry->state) != DM_BHT_ENTRY_PENDING);
+ atomic_set(&entry->state, DM_BHT_ENTRY_READY);
+}
+EXPORT_SYMBOL(dm_bht_read_completed);
+
+/**
+ * dm_bht_verify_block - checks that all nodes in the path for @block are valid
+ * @bht: pointer to a dm_bht_create()d bht
+ * @block: specific block data is expected from
+ * @pg: page holding the block data
+ * @offset: offset into the page
+ *
+ * Returns 0 on success, DM_BHT_ENTRY_ERROR_MISMATCH on error.
+ */
+int dm_bht_verify_block(struct dm_bht *bht, unsigned int block,
+ struct page *pg, unsigned int offset)
+{
+ int state, depth = bht->depth;
+ u8 digest[DM_BHT_MAX_DIGEST_SIZE];
+ struct dm_bht_entry *entry;
+ void *node;
+
+ do {
+ /* Need to check that the hash of the current block is accurate
+ * in its parent.
+ */
+ entry = dm_bht_get_entry(bht, depth - 1, block);
+ state = atomic_read(&entry->state);
+ /* This call is only safe if all nodes along the path
+ * are already populated (i.e. READY) via dm_bht_populate.
+ */
+ BUG_ON(state < DM_BHT_ENTRY_READY);
+ node = dm_bht_get_node(bht, entry, depth, block);
+
+ if (dm_bht_compute_hash(bht, pg, offset, digest) ||
+ memcmp(digest, node, bht->digest_size))
+ goto mismatch;
+
+ /* Keep the containing block of hashes to be verified in the
+ * next pass.
+ */
+ pg = virt_to_page(entry->nodes);
+ offset = offset_in_page(entry->nodes);
+ } while (--depth > 0 && state != DM_BHT_ENTRY_VERIFIED);
+
+ if (depth == 0 && state != DM_BHT_ENTRY_VERIFIED) {
+ if (dm_bht_compute_hash(bht, pg, offset, digest) ||
+ memcmp(digest, bht->root_digest, bht->digest_size))
+ goto mismatch;
+ atomic_set(&entry->state, DM_BHT_ENTRY_VERIFIED);
+ }
+
+ /* Mark path to leaf as verified. */
+ for (depth++; depth < bht->depth; depth++) {
+ entry = dm_bht_get_entry(bht, depth, block);
+ /* At this point, entry can only be in VERIFIED or READY state.
+ * So it is safe to use atomic_set instead of atomic_cmpxchg.
+ */
+ atomic_set(&entry->state, DM_BHT_ENTRY_VERIFIED);
+ }
+
+ return 0;
+
+mismatch:
+ DMERR_LIMIT("verify_path: failed to verify hash (d=%d,bi=%u)",
+ depth, block);
+ dm_bht_log_mismatch(bht, node, digest);
+ return DM_BHT_ENTRY_ERROR_MISMATCH;
+}
+EXPORT_SYMBOL(dm_bht_verify_block);
+
+/**
+ * dm_bht_is_populated - check that entries from disk needed to verify a given
+ * block are all ready
+ * @bht: pointer to a dm_bht_create()d bht
+ * @block: specific block data is expected from
+ *
+ * Callers may wish to call dm_bht_is_populated() when checking an io
+ * for which entries were already pending.
+ */
+bool dm_bht_is_populated(struct dm_bht *bht, unsigned int block)
+{
+ int depth;
+
+ for (depth = bht->depth - 1; depth >= 0; depth--) {
+ struct dm_bht_entry *entry = dm_bht_get_entry(bht, depth,
+ block);
+ if (atomic_read(&entry->state) < DM_BHT_ENTRY_READY)
+ return false;
+ }
+
+ return true;
+}
+EXPORT_SYMBOL(dm_bht_is_populated);
+
+/**
+ * dm_bht_populate - reads entries from disk needed to verify a given block
+ * @bht: pointer to a dm_bht_create()d bht
+ * @ctx: context used for all read_cb calls on this request
+ * @block: specific block data is expected from
+ *
+ * Returns negative value on error. Returns 0 on success.
+ */
+int dm_bht_populate(struct dm_bht *bht, void *ctx, unsigned int block)
+{
+ int depth, state;
+
+ BUG_ON(block >= bht->block_count);
+
+ for (depth = bht->depth - 1; depth >= 0; --depth) {
+ unsigned int index = dm_bht_index_at_level(bht, depth, block);
+ struct dm_bht_level *level = &bht->levels[depth];
+ struct dm_bht_entry *entry = dm_bht_get_entry(bht, depth,
+ block);
+ state = atomic_cmpxchg(&entry->state,
+ DM_BHT_ENTRY_UNALLOCATED,
+ DM_BHT_ENTRY_PENDING);
+ if (state == DM_BHT_ENTRY_VERIFIED)
+ break;
+ if (state <= DM_BHT_ENTRY_ERROR)
+ goto error_state;
+ if (state != DM_BHT_ENTRY_UNALLOCATED)
+ continue;
+
+ /* Current entry is claimed for allocation and loading */
+ entry->nodes = kmalloc(bht->block_size, GFP_NOIO);
+ if (!entry->nodes)
+ goto nomem;
+
+ bht->read_cb(ctx,
+ level->sector + to_sector(index * bht->block_size),
+ entry->nodes, to_sector(bht->block_size), entry);
+ }
+
+ return 0;
+
+error_state:
+ DMCRIT("block %u at depth %d is in an error state", block, depth);
+ return -EPERM;
+
+nomem:
+ DMCRIT("failed to allocate memory for entry->nodes");
+ return -ENOMEM;
+}
+EXPORT_SYMBOL(dm_bht_populate);
+
+/**
+ * dm_bht_destroy - cleans up all memory used by @bht
+ * @bht: pointer to a dm_bht_create()d bht
+ */
+void dm_bht_destroy(struct dm_bht *bht)
+{
+ int depth, cpu;
+
+ for (depth = 0; depth < bht->depth; depth++) {
+ struct dm_bht_entry *entry = bht->levels[depth].entries;
+ struct dm_bht_entry *entry_end = entry +
+ bht->levels[depth].count;
+ for (; entry < entry_end; ++entry)
+ kfree(entry->nodes);
+ kfree(bht->levels[depth].entries);
+ }
+ kfree(bht->levels);
+ crypto_free_shash((bht->hash_desc[0])->tfm);
+ for (cpu = 0; cpu < nr_cpu_ids; ++cpu)
+ kfree(bht->hash_desc[cpu]);
+}
+EXPORT_SYMBOL(dm_bht_destroy);
+
+/*
+ * Accessors
+ */
+
+/**
+ * dm_bht_set_root_hexdigest - sets an unverified root digest hash from hex
+ * @bht: pointer to a dm_bht_create()d bht
+ * @hexdigest: array of u8s containing the new digest in binary
+ * Returns non-zero on error. hexdigest should be NUL terminated.
+ */
+int dm_bht_set_root_hexdigest(struct dm_bht *bht, const u8 *hexdigest)
+{
+ /* Make sure we have at least the bytes expected */
+ if (strnlen((char *)hexdigest, bht->digest_size * 2) !=
+ bht->digest_size * 2) {
+ DMERR("root digest length does not match hash algorithm");
+ return -1;
+ }
+ dm_bht_hex_to_bin(bht->root_digest, hexdigest, bht->digest_size);
+ return 0;
+}
+EXPORT_SYMBOL(dm_bht_set_root_hexdigest);
+
+/**
+ * dm_bht_root_hexdigest - returns root digest in hex
+ * @bht: pointer to a dm_bht_create()d bht
+ * @hexdigest: u8 array of size @available
+ * @available: must be bht->digest_size * 2 + 1
+ */
+int dm_bht_root_hexdigest(struct dm_bht *bht, u8 *hexdigest, int available)
+{
+ if (available < 0 ||
+ ((unsigned int) available) < bht->digest_size * 2 + 1) {
+ DMERR("hexdigest has too few bytes available");
+ return -EINVAL;
+ }
+ dm_bht_bin_to_hex(bht->root_digest, hexdigest, bht->digest_size);
+ return 0;
+}
+EXPORT_SYMBOL(dm_bht_root_hexdigest);
+
+/**
+ * dm_bht_set_salt - sets the salt used, in hex
+ * @bht: pointer to a dm_bht_create()d bht
+ * @hexsalt: salt string, as hex; will be zero-padded or truncated to
+ * DM_BHT_SALT_SIZE * 2 hex digits.
+ */
+void dm_bht_set_salt(struct dm_bht *bht, const char *hexsalt)
+{
+ size_t saltlen = min(strlen(hexsalt) / 2, sizeof(bht->salt));
+
+ memset(bht->salt, 0, sizeof(bht->salt));
+ dm_bht_hex_to_bin(bht->salt, (const u8 *)hexsalt, saltlen);
+}
+EXPORT_SYMBOL(dm_bht_set_salt);
+
+/**
+ * dm_bht_salt - returns the salt used, in hex
+ * @bht: pointer to a dm_bht_create()d bht
+ * @hexsalt: buffer to put salt into, of length DM_BHT_SALT_SIZE * 2 + 1.
+ */
+int dm_bht_salt(struct dm_bht *bht, char *hexsalt)
+{
+ dm_bht_bin_to_hex(bht->salt, (u8 *)hexsalt, sizeof(bht->salt));
+ return 0;
+}
+EXPORT_SYMBOL(dm_bht_salt);
+
diff --git a/drivers/md/dm-verity.c b/drivers/md/dm-verity.c
new file mode 100644
index 0000000..a9bd0e8
--- /dev/null
+++ b/drivers/md/dm-verity.c
@@ -0,0 +1,1043 @@
+/*
+ * Originally based on dm-crypt.c,
+ * Copyright (C) 2003 Christophe Saout <christophe@saout.de>
+ * Copyright (C) 2004 Clemens Fruhwirth <clemens@endorphin.org>
+ * Copyright (C) 2006-2008 Red Hat, Inc. All rights reserved.
+ * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ * All Rights Reserved.
+ *
+ * This file is released under the GPLv2.
+ *
+ * Implements a verifying transparent block device.
+ * See Documentation/device-mapper/dm-verity.txt
+ */
+#include <linux/async.h>
+#include <linux/atomic.h>
+#include <linux/bio.h>
+#include <linux/blkdev.h>
+#include <linux/delay.h>
+#include <linux/device.h>
+#include <linux/err.h>
+#include <linux/genhd.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/mempool.h>
+#include <linux/mm_types.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/workqueue.h>
+#include <linux/device-mapper.h>
+#include <linux/dm-bht.h>
+
+#include "dm-verity.h"
+
+#define DM_MSG_PREFIX "verity"
+
+/* Supports up to 512-bit digests */
+#define VERITY_MAX_DIGEST_SIZE 64
+
+/* TODO(wad) make both of these report the error line/file to a
+ * verity_bug function.
+ */
+#define VERITY_BUG(msg...) BUG()
+#define VERITY_BUG_ON(cond, msg...) BUG_ON(cond)
+
+/* Helper for printing sector_t */
+#define ULL(x) ((unsigned long long)(x))
+
+#define MIN_IOS 32
+#define MIN_BIOS (MIN_IOS * 2)
+#define VERITY_DEFAULT_BLOCK_SIZE 4096
+
+/* Provide a lightweight means of specifying the global default for
+ * error behavior: eio, reboot, or none
+ * Legacy support for 0 = eio, 1 = reboot/panic, 2 = none, 3 = notify.
+ * This is matched to the enum in dm-verity.h.
+ */
+static const char * const allowed_error_behaviors[] = { "eio", "panic", "none",
+ "notify", NULL };
+static char *error_behavior = "eio";
+module_param(error_behavior, charp, 0644);
+MODULE_PARM_DESC(error_behavior, "Behavior on error "
+ "(eio, panic, none, notify)");
+
+/* Controls whether verity_get_device will wait forever for a device. */
+static int dev_wait;
+module_param(dev_wait, bool, 0444);
+MODULE_PARM_DESC(dev_wait, "Wait forever for a backing device");
+
+/* per-requested-bio private data */
+enum verity_io_flags {
+ VERITY_IOFLAGS_CLONED = 0x1, /* original bio has been cloned */
+};
+
+struct dm_verity_io {
+ struct dm_target *target;
+ struct bio *bio;
+ struct delayed_work work;
+ unsigned int flags;
+
+ int error;
+ atomic_t pending;
+
+ u64 block; /* aligned block index */
+ u64 count; /* aligned count in blocks */
+};
+
+struct verity_config {
+ struct dm_dev *dev;
+ sector_t start;
+ sector_t size;
+
+ struct dm_dev *hash_dev;
+ sector_t hash_start;
+
+ struct dm_bht bht;
+
+ /* Pool required for io contexts */
+ mempool_t *io_pool;
+ /* Pool and bios required for making sure that backing device reads are
+ * in PAGE_SIZE increments.
+ */
+ struct bio_set *bs;
+
+ char hash_alg[CRYPTO_MAX_ALG_NAME];
+
+ int error_behavior;
+};
+
+static struct kmem_cache *_verity_io_pool;
+static struct workqueue_struct *kveritydq, *kverityd_ioq;
+
+static void kverityd_verify(struct work_struct *work);
+static void kverityd_io(struct work_struct *work);
+static void kverityd_io_bht_populate(struct dm_verity_io *io);
+static void kverityd_io_bht_populate_end(struct bio *, int error);
+
+static BLOCKING_NOTIFIER_HEAD(verity_error_notifier);
+
+/*
+ * Exported interfaces
+ */
+
+int dm_verity_register_error_notifier(struct notifier_block *nb)
+{
+ return blocking_notifier_chain_register(&verity_error_not ifier, nb);
+}
+EXPORT_SYMBOL_GPL(dm_verity_register_error_notifi er);
+
+int dm_verity_unregister_error_notifier(struct notifier_block *nb)
+{
+ return blocking_notifier_chain_unregister(&verity_error_n otifier, nb);
+}
+EXPORT_SYMBOL_GPL(dm_verity_unregister_error_noti fier);
+
+/*
+ * Allocation and utility functions
+ */
+
+static void kverityd_src_io_read_end(struct bio *clone, int error);
+
+/* Shared destructor for all internal bios */
+static void dm_verity_bio_destructor(struct bio *bio)
+{
+ struct dm_verity_io *io = bio->bi_private;
+ struct verity_config *vc = io->target->private;
+ bio_free(bio, vc->bs);
+}
+
+static struct bio *verity_alloc_bioset(struct verity_config *vc, gfp_t gfp_mask,
+ int nr_iovecs)
+{
+ return bio_alloc_bioset(gfp_mask, nr_iovecs, vc->bs);
+}
+
+static struct dm_verity_io *verity_io_alloc(struct dm_target *ti,
+ struct bio *bio)
+{
+ struct verity_config *vc = ti->private;
+ sector_t sector = bio->bi_sector - ti->begin;
+ struct dm_verity_io *io;
+
+ io = mempool_alloc(vc->io_pool, GFP_NOIO);
+ if (unlikely(!io))
+ return NULL;
+ io->flags = 0;
+ io->target = ti;
+ io->bio = bio;
+ io->error = 0;
+
+ /* Adjust the sector by the virtual starting sector */
+ io->block = to_bytes(sector) / vc->bht.block_size;
+ io->count = bio->bi_size / vc->bht.block_size;
+
+ atomic_set(&io->pending, 0);
+
+ return io;
+}
+
+static struct bio *verity_bio_clone(struct dm_verity_io *io)
+{
+ struct verity_config *vc = io->target->private;
+ struct bio *bio = io->bio;
+ struct bio *clone = verity_alloc_bioset(vc, GFP_NOIO, bio->bi_max_vecs);
+
+ if (!clone)
+ return NULL;
+
+ __bio_clone(clone, bio);
+ clone->bi_private = io;
+ clone->bi_end_io = kverityd_src_io_read_end;
+ clone->bi_bdev = vc->dev->bdev;
+ clone->bi_sector += vc->start - io->target->begin;
+ clone->bi_destructor = dm_verity_bio_destructor;
+
+ return clone;
+}
+
+/* If the request is not successful, this handler takes action.
+ * TODO make this call a registered handler.
+ */
+static void verity_error(struct verity_config *vc, struct dm_verity_io *io,
+ int error)
+{
+ const char *message;
+ int error_mode = DM_VERITY_ERROR_BEHAVIOR_PANIC;
+ dev_t devt = 0;
+ u64 block = ~0;
+ int transient = 1;
+ struct dm_verity_error_state error_state;
+
+ if (vc) {
+ devt = vc->dev->bdev->bd_dev;
+ error_mode = vc->error_behavior;
+ }
+
+ if (io) {
+ io->error = -EIO;
+ block = io->block;
+ }
+
+ switch (error) {
+ case -ENOMEM:
+ message = "out of memory";
+ break;
+ case -EBUSY:
+ message = "pending data seen during verify";
+ break;
+ case -EFAULT:
+ message = "crypto operation failure";
+ break;
+ case -EACCES:
+ message = "integrity failure";
+ /* Image is bad. */
+ transient = 0;
+ break;
+ case -EPERM:
+ message = "hash tree population failure";
+ /* Should be dm-bht specific errors */
+ transient = 0;
+ break;
+ case -EINVAL:
+ message = "unexpected missing/invalid data";
+ /* The device was configured incorrectly - fallback. */
+ transient = 0;
+ break;
+ default:
+ /* Other errors can be passed through as IO errors */
+ message = "unknown or I/O error";
+ return;
+ }
+
+ DMERR_LIMIT("verification failure occurred: %s", message);
+
+ if (error_mode == DM_VERITY_ERROR_BEHAVIOR_NOTIFY) {
+ error_state.code = error;
+ error_state.transient = transient;
+ error_state.block = block;
+ error_state.message = message;
+ error_state.dev_start = vc->start;
+ error_state.dev_len = vc->size;
+ error_state.dev = vc->dev->bdev;
+ error_state.hash_dev_start = vc->hash_start;
+ error_state.hash_dev_len = vc->bht.sectors;
+ error_state.hash_dev = vc->hash_dev->bdev;
+
+ /* Set default fallthrough behavior. */
+ error_state.behavior = DM_VERITY_ERROR_BEHAVIOR_PANIC;
+ error_mode = DM_VERITY_ERROR_BEHAVIOR_PANIC;
+
+ if (!blocking_notifier_call_chain(
+ &verity_error_notifier, transient, &error_state)) {
+ error_mode = error_state.behavior;
+ }
+ }
+
+ switch (error_mode) {
+ case DM_VERITY_ERROR_BEHAVIOR_EIO:
+ break;
+ case DM_VERITY_ERROR_BEHAVIOR_NONE:
+ if (error != -EIO && io)
+ io->error = 0;
+ break;
+ default:
+ goto do_panic;
+ }
+ return;
+
+do_panic:
+ panic("dm-verity failure: "
+ "device:%u:%u error:%d block:%llu message:%s",
+ MAJOR(devt), MINOR(devt), error, ULL(block), message);
+}
+
+/**
+ * verity_parse_error_behavior - parse a behavior charp to the enum
+ * @behavior: NUL-terminated char array
+ *
+ * Checks if the behavior is valid either as text or as an index digit
+ * and returns the proper enum value or -1 on error.
+ */
+static int verity_parse_error_behavior(const char *behavior)
+{
+ const char * const *allowed = allowed_error_behaviors;
+ char index = '0';
+
+ for (; *allowed; allowed++, index++)
+ if (!strcmp(*allowed, behavior) || behavior[0] == index)
+ break;
+
+ if (!*allowed)
+ return -1;
+
+ /* Convert to the integer index matching the enum. */
+ return allowed - allowed_error_behaviors;
+}
+
+/*
+ * Reverse flow of requests into the device.
+ *
+ * (Start at the bottom with verity_map and work your way upward).
+ */
+
+static void verity_inc_pending(struct dm_verity_io *io);
+
+static void verity_return_bio_to_caller(struct dm_verity_io *io)
+{
+ struct verity_config *vc = io->target->private;
+
+ if (io->error)
+ verity_error(vc, io, io->error);
+
+ bio_endio(io->bio, io->error);
+ mempool_free(io, vc->io_pool);
+}
+
+/* Check for any missing bht hashes. */
+static bool verity_is_bht_populated(struct dm_verity_io *io)
+{
+ struct verity_config *vc = io->target->private;
+ u64 block;
+
+ for (block = io->block; block < io->block + io->count; ++block)
+ if (!dm_bht_is_populated(&vc->bht, block))
+ return false;
+
+ return true;
+}
+
+/* verity_dec_pending manages the lifetime of all dm_verity_io structs.
+ * Non-bug error handling is centralized through this interface and
+ * all passage from workqueue to workqueue.
+ */
+static void verity_dec_pending(struct dm_verity_io *io)
+{
+ if (!atomic_dec_and_test(&io->pending))
+ goto done;
+
+ if (unlikely(io->error))
+ goto io_error;
+
+ /* I/Os that were pending may now be ready */
+ if (verity_is_bht_populated(io)) {
+ INIT_DELAYED_WORK(&io->work, kverityd_verify);
+ queue_delayed_work(kveritydq, &io->work, 0);
+ } else {
+ INIT_DELAYED_WORK(&io->work, kverityd_io);
+ queue_delayed_work(kverityd_ioq, &io->work, HZ/10);
+ }
+
+done:
+ return;
+
+io_error:
+ verity_return_bio_to_caller(io);
+}
+
+/* Walks the data set and computes the hash of the data read from the
+ * untrusted source device. The computed hash is then passed to dm-bht
+ * for verification.
+ */
+static int verity_verify(struct verity_config *vc,
+ struct dm_verity_io *io)
+{
+ unsigned int block_size = vc->bht.block_size;
+ struct bio *bio = io->bio;
+ u64 block = io->block;
+ unsigned int idx;
+ int r;
+
+ for (idx = bio->bi_idx; idx < bio->bi_vcnt; idx++) {
+ struct bio_vec *bv = bio_iovec_idx(bio, idx);
+ unsigned int offset = bv->bv_offset;
+ unsigned int len = bv->bv_len;
+
+ VERITY_BUG_ON(offset % block_size);
+ VERITY_BUG_ON(len % block_size);
+
+ while (len) {
+ r = dm_bht_verify_block(&vc->bht, block,
+ bv->bv_page, offset);
+ if (r)
+ goto bad_return;
+
+ offset += block_size;
+ len -= block_size;
+ block++;
+ cond_resched();
+ }
+ }
+
+ return 0;
+
+bad_return:
+ /* dm_bht functions aren't expected to return errno friendly
+ * values. They are converted here for uniformity.
+ */
+ if (r > 0) {
+ DMERR("Pending data for block %llu seen at verify", ULL(block));
+ r = -EBUSY;
+ } else {
+ DMERR_LIMIT("Block hash does not match!");
+ r = -EACCES;
+ }
+ return r;
+}
+
+/* Services the verify workqueue */
+static void kverityd_verify(struct work_struct *work)
+{
+ struct delayed_work *dwork = container_of(work, struct delayed_work,
+ work);
+ struct dm_verity_io *io = container_of(dwork, struct dm_verity_io,
+ work);
+ struct verity_config *vc = io->target->private;
+
+ io->error = verity_verify(vc, io);
+
+ /* Free up the bio and tag with the return value */
+ verity_return_bio_to_caller(io);
+}
+
+/* Asynchronously called upon the completion of dm-bht I/O. The status
+ * of the operation is passed back to dm-bht and the next steps are
+ * decided by verity_dec_pending.
+ */
+static void kverityd_io_bht_populate_end(struct bio *bio, int error)
+{
+ struct dm_bht_entry *entry = (struct dm_bht_entry *) bio->bi_private;
+ struct dm_verity_io *io = (struct dm_verity_io *) entry->io_context;
+
+ /* Tell the tree to atomically update now that we've populated
+ * the given entry.
+ */
+ dm_bht_read_completed(entry, error);
+
+ /* Clean up for reuse when reading data to be checked */
+ bio->bi_vcnt = 0;
+ bio->bi_io_vec->bv_offset = 0;
+ bio->bi_io_vec->bv_len = 0;
+ bio->bi_io_vec->bv_page = NULL;
+ /* Restore the private data to I/O so the destructor can be shared. */
+ bio->bi_private = (void *) io;
+ bio_put(bio);
+
+ /* We bail but assume the tree has been marked bad. */
+ if (unlikely(error)) {
+ DMERR("Failed to read for sector %llu (%u)",
+ ULL(io->bio->bi_sector), io->bio->bi_size);
+ io->error = error;
+ /* Pass through the error to verity_dec_pending below */
+ }
+ /* When pending = 0, it will transition to reading real data */
+ verity_dec_pending(io);
+}
+
+/* Called by dm-bht (via dm_bht_populate), this function provides
+ * the message digests to dm-bht that are stored on disk.
+ */
+static int kverityd_bht_read_callback(void *ctx, sector_t start, u8 *dst,
+ sector_t count,
+ struct dm_bht_entry *entry)
+{
+ struct dm_verity_io *io = ctx; /* I/O for this batch */
+ struct verity_config *vc;
+ struct bio *bio;
+
+ vc = io->target->private;
+
+ /* The I/O context is nested inside the entry so that we don't need one
+ * io context per page read.
+ */
+ entry->io_context = ctx;
+
+ /* We should only get page size requests at present. */
+ verity_inc_pending(io);
+ bio = verity_alloc_bioset(vc, GFP_NOIO, 1);
+ if (unlikely(!bio)) {
+ DMCRIT("Out of memory at bio_alloc_bioset");
+ dm_bht_read_completed(entry, -ENOMEM);
+ return -ENOMEM;
+ }
+ bio->bi_private = (void *) entry;
+ bio->bi_idx = 0;
+ bio->bi_size = vc->bht.block_size;
+ bio->bi_sector = vc->hash_start + start;
+ bio->bi_bdev = vc->hash_dev->bdev;
+ bio->bi_end_io = kverityd_io_bht_populate_end;
+ bio->bi_rw = REQ_META;
+ /* Only need to free the bio since the page is managed by bht */
+ bio->bi_destructor = dm_verity_bio_destructor;
+ bio->bi_vcnt = 1;
+ bio->bi_io_vec->bv_offset = offset_in_page(dst);
+ bio->bi_io_vec->bv_len = to_bytes(count);
+ /* dst is guaranteed to be a page_pool allocation */
+ bio->bi_io_vec->bv_page = virt_to_page(dst);
+ /* Track that this I/O is in use. There should be no risk of the io
+ * being removed prior since this is called synchronously.
+ */
+ generic_make_request(bio);
+ return 0;
+}
+
+/* Submits an io request for each missing block of block hashes.
+ * The last one to return will then enqueue this on the io workqueue.
+ */
+static void kverityd_io_bht_populate(struct dm_verity_io *io)
+{
+ struct verity_config *vc = io->target->private;
+ u64 block;
+
+ for (block = io->block; block < io->block + io->count; ++block) {
+ int ret = dm_bht_populate(&vc->bht, io, block);
+
+ if (ret < 0) {
+ /* verity_dec_pending will handle the error case. */
+ io->error = ret;
+ break;
+ }
+ }
+}
+
+/* Asynchronously called upon the completion of I/O issued
+ * from kverityd_src_io_read. verity_dec_pending() acts as
+ * the scheduler/flow manager.
+ */
+static void kverityd_src_io_read_end(struct bio *clone, int error)
+{
+ struct dm_verity_io *io = clone->bi_private;
+
+ if (unlikely(!bio_flagged(clone, BIO_UPTODATE) && !error))
+ error = -EIO;
+
+ if (unlikely(error)) {
+ DMERR("Error occurred: %d (%llu, %u)",
+ error, ULL(clone->bi_sector), clone->bi_size);
+ io->error = error;
+ }
+
+ /* Release the clone which just avoids the block layer from
+ * leaving offsets, etc in unexpected states.
+ */
+ bio_put(clone);
+
+ verity_dec_pending(io);
+}
+
+/* If not yet underway, an I/O request will be issued to the vc->dev
+ * device for the data needed. It is cloned to avoid unexpected changes
+ * to the original bio struct.
+ */
+static void kverityd_src_io_read(struct dm_verity_io *io)
+{
+ struct bio *clone;
+
+ /* Check if the read is already issued. */
+ if (io->flags & VERITY_IOFLAGS_CLONED)
+ return;
+
+ io->flags |= VERITY_IOFLAGS_CLONED;
+
+ /* Clone the bio. The block layer may modify the bvec array. */
+ clone = verity_bio_clone(io);
+ if (unlikely(!clone)) {
+ io->error = -ENOMEM;
+ return;
+ }
+
+ verity_inc_pending(io);
+
+ generic_make_request(clone);
+}
+
+/* kverityd_io services the I/O workqueue. For each pass through
+ * the I/O workqueue, a call to populate both the origin drive
+ * data and the hash tree data is made.
+ */
+static void kverityd_io(struct work_struct *work)
+{
+ struct delayed_work *dwork = container_of(work, struct delayed_work,
+ work);
+ struct dm_verity_io *io = container_of(dwork, struct dm_verity_io,
+ work);
+
+ /* Issue requests asynchronously. */
+ verity_inc_pending(io);
+ kverityd_src_io_read(io);
+ kverityd_io_bht_populate(io);
+ verity_dec_pending(io);
+}
+
+/* Paired with verity_dec_pending, the pending value in the io dictate the
+ * lifetime of a request and when it is ready to be processed on the
+ * workqueues.
+ */
+static void verity_inc_pending(struct dm_verity_io *io)
+{
+ atomic_inc(&io->pending);
+}
+
+/* Block-level requests start here. */
+static int verity_map(struct dm_target *ti, struct bio *bio,
+ union map_info *map_context)
+{
+ struct dm_verity_io *io;
+ struct verity_config *vc;
+ struct request_queue *r_queue;
+
+ if (unlikely(!ti)) {
+ DMERR("dm_target was NULL");
+ return -EIO;
+ }
+
+ vc = ti->private;
+ r_queue = bdev_get_queue(vc->dev->bdev);
+
+ if (bio_data_dir(bio) == WRITE) {
+ /* If we silently drop writes, then the VFS layer will cache
+ * the write and persist it in memory. While it doesn't change
+ * the underlying storage, it still may be contrary to the
+ * behavior expected by a verified, read-only device.
+ */
+ DMWARN_LIMIT("write request received. rejecting with -EIO.");
+ verity_error(vc, NULL, -EIO);
+ return -EIO;
+ } else {
+ /* Queue up the request to be verified */
+ io = verity_io_alloc(ti, bio);
+ if (!io) {
+ DMERR_LIMIT("Failed to allocate and init IO data");
+ return DM_MAPIO_REQUEUE;
+ }
+ INIT_DELAYED_WORK(&io->work, kverityd_io);
+ queue_delayed_work(kverityd_ioq, &io->work, 0);
+ }
+
+ return DM_MAPIO_SUBMITTED;
+}
+
+static void splitarg(char *arg, char **key, char **val)
+{
+ *key = strsep(&arg, "=");
+ *val = strsep(&arg, "");
+}
+
+/*
+ * Non-block interfaces and device-mapper specific code
+ */
+
+/**
+ * verity_ctr - Construct a verified mapping
+ * @ti: Target being created
+ * @argc: Number of elements in argv
+ * @argv: Vector of key-value pairs (see below).
+ *
+ * Accepts the following keys:
+ * @payload: hashed device
+ * @hashtree: device hashtree is stored on
+ * @hashstart: start address of hashes (default 0)
+ * @block_size: size of a hash block
+ * @alg: hash algorithm
+ * @root_hexdigest: toplevel hash of the tree
+ * @error_behavior: what to do when verification fails [optional]
+ * @salt: salt, in hex [optional]
+ *
+ * E.g.,
+ * payload=/dev/sda2 hashtree=/dev/sda3 alg=sha256
+ * root_hexdigest=f08aa4a3695290c569eb1b0ac032ae10401 50afb527abbeb0a3da33d82fb2c6e
+ *
+ * TODO(wad):
+ * - Boot time addition
+ * - Track block verification to free block_hashes if memory use is a concern
+ * Testing needed:
+ * - Regular slub_debug tracing (on checkins)
+ * - Improper block hash padding
+ * - Improper bundle padding
+ * - Improper hash layout
+ * - Missing padding at end of device
+ * - Improperly sized underlying devices
+ * - Out of memory conditions (make sure this isn't too flaky under high load!)
+ * - Incorrect superhash
+ * - Incorrect block hashes
+ * - Incorrect bundle hashes
+ * - Boot-up read speed; sustained read speeds
+ */
+static int verity_ctr(struct dm_target *ti, unsigned int argc, char **argv)
+{
+ struct verity_config *vc = NULL;
+ int ret = 0;
+ sector_t blocks;
+ unsigned int block_size = VERITY_DEFAULT_BLOCK_SIZE;
+ const char *payload = NULL;
+ const char *hashtree = NULL;
+ unsigned long hashstart = 0;
+ const char *alg = NULL;
+ const char *root_hexdigest = NULL;
+ const char *dev_error_behavior = error_behavior;
+ const char *hexsalt = "";
+ int i;
+
+ for (i = 0; i < argc; ++i) {
+ char *key, *val;
+ DMWARN("Argument %d: '%s'", i, argv[i]);
+ splitarg(argv[i], &key, &val);
+ if (!key) {
+ DMWARN("Bad argument %d: missing key?", i);
+ break;
+ }
+ if (!val) {
+ DMWARN("Bad argument %d='%s': missing value", i, key);
+ break;
+ }
+
+ if (!strcmp(key, "alg")) {
+ alg = val;
+ } else if (!strcmp(key, "payload")) {
+ payload = val;
+ } else if (!strcmp(key, "hashtree")) {
+ hashtree = val;
+ } else if (!strcmp(key, "root_hexdigest")) {
+ root_hexdigest = val;
+ } else if (!strcmp(key, "hashstart")) {
+ if (strict_strtoul(val, 10, &hashstart)) {
+ ti->error = "Invalid hashstart";
+ return -EINVAL;
+ }
+ } else if (!strcmp(key, "block_size")) {
+ unsigned long tmp;
+ if (strict_strtoul(val, 10, &tmp) ||
+ (tmp > UINT_MAX)) {
+ ti->error = "Invalid block_size";
+ return -EINVAL;
+ }
+ block_size = (unsigned int)tmp;
+ } else if (!strcmp(key, "error_behavior")) {
+ dev_error_behavior = val;
+ } else if (!strcmp(key, "salt")) {
+ hexsalt = val;
+ } else if (!strcmp(key, "error_behavior")) {
+ dev_error_behavior = val;
+ }
+ }
+
+#define NEEDARG(n)
+ if (!(n)) {
+ ti->error = "Missing argument: " #n;
+ return -EINVAL;
+ }
+
+ NEEDARG(alg);
+ NEEDARG(payload);
+ NEEDARG(hashtree);
+ NEEDARG(root_hexdigest);
+
+#undef NEEDARG
+
+ /* The device mapper device should be setup read-only */
+ if ((dm_table_get_mode(ti->table) & ~FMODE_READ) != 0) {
+ ti->error = "Must be created readonly.";
+ return -EINVAL;
+ }
+
+ vc = kzalloc(sizeof(*vc), GFP_KERNEL);
+ if (!vc) {
+ /* TODO(wad) if this is called from the setup helper, then we
+ * catch these errors and do a CrOS specific thing. if not, we
+ * need to have this call the error handler.
+ */
+ return -EINVAL;
+ }
+
+ /* Calculate the blocks from the given device size */
+ vc->size = ti->len;
+ blocks = to_bytes(vc->size) / block_size;
+ if (dm_bht_create(&vc->bht, blocks, block_size, alg)) {
+ DMERR("failed to create required bht");
+ goto bad_bht;
+ }
+ if (dm_bht_set_root_hexdigest(&vc->bht, root_hexdigest)) {
+ DMERR("root hexdigest error");
+ goto bad_root_hexdigest;
+ }
+ dm_bht_set_salt(&vc->bht, hexsalt);
+ vc->bht.read_cb = kverityd_bht_read_callback;
+
+ /* payload: device to verify */
+ vc->start = 0; /* TODO: should this support a starting offset? */
+ /* We only ever grab the device in read-only mode. */
+ ret = dm_get_device(ti, payload,
+ dm_table_get_mode(ti->table), &vc->dev);
+ if (ret) {
+ DMERR("Failed to acquire device '%s': %d", payload, ret);
+ ti->error = "Device lookup failed";
+ goto bad_verity_dev;
+ }
+
+ if ((to_bytes(vc->start) % block_size) ||
+ (to_bytes(vc->size) % block_size)) {
+ ti->error = "Device must be block_size divisble/aligned";
+ goto bad_hash_start;
+ }
+
+ vc->hash_start = (sector_t)hashstart;
+
+ /* hashtree: device with hashes.
+ * Note, payload == hashtree is okay as long as the size of
+ * ti->len passed to device mapper does not include
+ * the hashes.
+ */
+ if (dm_get_device(ti, hashtree,
+ dm_table_get_mode(ti->table), &vc->hash_dev)) {
+ ti->error = "Hash device lookup failed";
+ goto bad_hash_dev;
+ }
+
+ /* arg4: cryptographic digest algorithm */
+ if (snprintf(vc->hash_alg, CRYPTO_MAX_ALG_NAME, "%s", alg) >=
+ CRYPTO_MAX_ALG_NAME) {
+ ti->error = "Hash algorithm name is too long";
+ goto bad_hash;
+ }
+
+ /* override with optional device-specific error behavior */
+ vc->error_behavior = verity_parse_error_behavior(dev_error_behavior);
+ if (vc->error_behavior == -1) {
+ ti->error = "Bad error_behavior supplied";
+ goto bad_err_behavior;
+ }
+
+ /* TODO: Maybe issues a request on the io queue for block 0? */
+
+ /* Argument processing is done, setup operational data */
+ /* Pool for dm_verity_io objects */
+ vc->io_pool = mempool_create_slab_pool(MIN_IOS, _verity_io_pool);
+ if (!vc->io_pool) {
+ ti->error = "Cannot allocate verity io mempool";
+ goto bad_slab_pool;
+ }
+
+ /* Allocate the bioset used for request padding */
+ /* TODO(wad) allocate a separate bioset for the first verify maybe */
+ vc->bs = bioset_create(MIN_BIOS, 0);
+ if (!vc->bs) {
+ ti->error = "Cannot allocate verity bioset";
+ goto bad_bs;
+ }
+
+ ti->num_flush_requests = 1;
+ ti->private = vc;
+
+ /* TODO(wad) add device and hash device names */
+ {
+ char hashdev[BDEVNAME_SIZE], vdev[BDEVNAME_SIZE];
+ bdevname(vc->hash_dev->bdev, hashdev);
+ bdevname(vc->dev->bdev, vdev);
+ DMINFO("dev:%s hash:%s [sectors:%llu blocks:%llu]", vdev,
+ hashdev, ULL(vc->bht.sectors), ULL(blocks));
+ }
+ return 0;
+
+bad_bs:
+ mempool_destroy(vc->io_pool);
+bad_slab_pool:
+bad_err_behavior:
+bad_hash:
+ dm_put_device(ti, vc->hash_dev);
+bad_hash_dev:
+bad_hash_start:
+ dm_put_device(ti, vc->dev);
+bad_bht:
+bad_root_hexdigest:
+bad_verity_dev:
+ kfree(vc); /* hash is not secret so no need to zero */
+ return -EINVAL;
+}
+
+static void verity_dtr(struct dm_target *ti)
+{
+ struct verity_config *vc = (struct verity_config *) ti->private;
+
+ bioset_free(vc->bs);
+ mempool_destroy(vc->io_pool);
+ dm_bht_destroy(&vc->bht);
+ dm_put_device(ti, vc->hash_dev);
+ dm_put_device(ti, vc->dev);
+ kfree(vc);
+}
+
+static int verity_status(struct dm_target *ti, status_type_t type,
+ char *result, unsigned int maxlen)
+{
+ struct verity_config *vc = (struct verity_config *) ti->private;
+ unsigned int sz = 0;
+ char hashdev[BDEVNAME_SIZE], vdev[BDEVNAME_SIZE];
+ u8 hexdigest[VERITY_MAX_DIGEST_SIZE * 2 + 1] = { 0 };
+
+ dm_bht_root_hexdigest(&vc->bht, hexdigest, sizeof(hexdigest));
+
+ switch (type) {
+ case STATUSTYPE_INFO:
+ break;
+ case STATUSTYPE_TABLE:
+ bdevname(vc->hash_dev->bdev, hashdev);
+ bdevname(vc->dev->bdev, vdev);
+ DMEMIT("/dev/%s /dev/%s %llu %u %s %s",
+ vdev,
+ hashdev,
+ ULL(vc->hash_start),
+ vc->bht.depth,
+ vc->hash_alg,
+ hexdigest);
+ break;
+ }
+ return 0;
+}
+
+static int verity_merge(struct dm_target *ti, struct bvec_merge_data *bvm,
+ struct bio_vec *biovec, int max_size)
+{
+ struct verity_config *vc = ti->private;
+ struct request_queue *q = bdev_get_queue(vc->dev->bdev);
+
+ if (!q->merge_bvec_fn)
+ return max_size;
+
+ bvm->bi_bdev = vc->dev->bdev;
+ bvm->bi_sector = vc->start + bvm->bi_sector - ti->begin;
+
+ /* Optionally, this could just return 0 to stick to single pages. */
+ return min(max_size, q->merge_bvec_fn(q, bvm, biovec));
+}
+
+static int verity_iterate_devices(struct dm_target *ti,
+ iterate_devices_callout_fn fn, void *data)
+{
+ struct verity_config *vc = ti->private;
+
+ return fn(ti, vc->dev, vc->start, ti->len, data);
+}
+
+static void verity_io_hints(struct dm_target *ti,
+ struct queue_limits *limits)
+{
+ struct verity_config *vc = ti->private;
+ unsigned int block_size = vc->bht.block_size;
+
+ limits->logical_block_size = block_size;
+ limits->physical_block_size = block_size;
+ blk_limits_io_min(limits, block_size);
+}
+
+static struct target_type verity_target = {
+ .name = "verity",
+ .version = {0, 1, 0},
+ .module = THIS_MODULE,
+ .ctr = verity_ctr,
+ .dtr = verity_dtr,
+ .map = verity_map,
+ .merge = verity_merge,
+ .status = verity_status,
+ .iterate_devices = verity_iterate_devices,
+ .io_hints = verity_io_hints,
+};
+
+#define VERITY_WQ_FLAGS (WQ_CPU_INTENSIVE|WQ_HIGHPRI)
+
+static int __init dm_verity_init(void)
+{
+ int r = -ENOMEM;
+
+ _verity_io_pool = KMEM_CACHE(dm_verity_io, 0);
+ if (!_verity_io_pool) {
+ DMERR("failed to allocate pool dm_verity_io");
+ goto bad_io_pool;
+ }
+
+ kverityd_ioq = alloc_workqueue("kverityd_io", VERITY_WQ_FLAGS, 1);
+ if (!kverityd_ioq) {
+ DMERR("failed to create workqueue kverityd_ioq");
+ goto bad_io_queue;
+ }
+
+ kveritydq = alloc_workqueue("kverityd", VERITY_WQ_FLAGS, 1);
+ if (!kveritydq) {
+ DMERR("failed to create workqueue kveritydq");
+ goto bad_verify_queue;
+ }
+
+ r = dm_register_target(&verity_target);
+ if (r < 0) {
+ DMERR("register failed %d", r);
+ goto register_failed;
+ }
+
+ DMINFO("version %u.%u.%u loaded", verity_target.version[0],
+ verity_target.version[1], verity_target.version[2]);
+
+ return r;
+
+register_failed:
+ destroy_workqueue(kveritydq);
+bad_verify_queue:
+ destroy_workqueue(kverityd_ioq);
+bad_io_queue:
+ kmem_cache_destroy(_verity_io_pool);
+bad_io_pool:
+ return r;
+}
+
+static void __exit dm_verity_exit(void)
+{
+ destroy_workqueue(kveritydq);
+ destroy_workqueue(kverityd_ioq);
+
+ dm_unregister_target(&verity_target);
+ kmem_cache_destroy(_verity_io_pool);
+}
+
+module_init(dm_verity_init);
+module_exit(dm_verity_exit);
+
+MODULE_AUTHOR("The Chromium OS Authors <chromium-os-dev@chromium.org>");
+MODULE_DESCRIPTION(DM_NAME " target for transparent disk integrity checking");
+MODULE_LICENSE("GPL");
diff --git a/drivers/md/dm-verity.h b/drivers/md/dm-verity.h
new file mode 100644
index 0000000..e0664c9
--- /dev/null
+++ b/drivers/md/dm-verity.h
@@ -0,0 +1,45 @@
+/*
+ * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ * All Rights Reserved.
+ *
+ * This file is released under the GPLv2.
+ *
+ * Provide error types for use when creating a custom error handler.
+ * See Documentation/device-mapper/dm-verity.txt
+ */
+#ifndef DM_VERITY_H
+#define DM_VERITY_H
+
+#include <linux/notifier.h>
+
+struct dm_verity_error_state {
+ int code;
+ int transient; /* Likely to not happen after a reboot */
+ u64 block;
+ const char *message;
+
+ sector_t dev_start;
+ sector_t dev_len;
+ struct block_device *dev;
+
+ sector_t hash_dev_start;
+ sector_t hash_dev_len;
+ struct block_device *hash_dev;
+
+ /* Final behavior after all notifications are completed. */
+ int behavior;
+};
+
+/* This enum must be matched to allowed_error_behaviors in dm-verity.c */
+enum dm_verity_error_behavior {
+ DM_VERITY_ERROR_BEHAVIOR_EIO = 0,
+ DM_VERITY_ERROR_BEHAVIOR_PANIC,
+ DM_VERITY_ERROR_BEHAVIOR_NONE,
+ DM_VERITY_ERROR_BEHAVIOR_NOTIFY
+};
+
+
+int dm_verity_register_error_notifier(struct notifier_block *nb);
+int dm_verity_unregister_error_notifier(struct notifier_block *nb);
+
+#endif /* DM_VERITY_H */
diff --git a/include/linux/dm-bht.h b/include/linux/dm-bht.h
new file mode 100644
index 0000000..3a4b432
--- /dev/null
+++ b/include/linux/dm-bht.h
@@ -0,0 +1,166 @@
+/*
+ * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ *
+ * Device-Mapper block hash tree interface.
+ * See Documentation/device-mapper/dm-bht.txt for details.
+ *
+ * This file is released under the GPLv2.
+ */
+#ifndef __LINUX_DM_BHT_H
+#define __LINUX_DM_BHT_H
+
+#include <crypto/hash.h>
+#include <linux/compiler.h>
+#include <linux/types.h>
+
+/* To avoid allocating memory for digest tests, we just setup a
+ * max to use for now.
+ */
+#define DM_BHT_MAX_DIGEST_SIZE 128 /* 1k hashes are unlikely for now */
+#define DM_BHT_SALT_SIZE 32 /* 256 bits of salt is a lot */
+
+/* UNALLOCATED, PENDING, READY, and VERIFIED are valid states. All other
+ * values are entry-related return codes.
+ */
+#define DM_BHT_ENTRY_VERIFIED 8 /* 'nodes' has been checked against parent */
+#define DM_BHT_ENTRY_READY 4 /* 'nodes' is loaded and available */
+#define DM_BHT_ENTRY_PENDING 2 /* 'nodes' is being loaded */
+#define DM_BHT_ENTRY_UNALLOCATED 0 /* untouched */
+#define DM_BHT_ENTRY_ERROR -1 /* entry is unsuitable for use */
+#define DM_BHT_ENTRY_ERROR_IO -2 /* I/O error on load */
+
+/* Additional possible return codes */
+#define DM_BHT_ENTRY_ERROR_MISMATCH -3 /* Digest mismatch */
+
+/* dm_bht_entry
+ * Contains dm_bht->node_count tree nodes at a given tree depth.
+ * state is used to transactionally assure that data is paged in
+ * from disk. Unless dm_bht kept running crypto contexts for each
+ * level, we need to load in the data for on-demand verification.
+ */
+struct dm_bht_entry {
+ atomic_t state; /* see defines */
+ /* Keeping an extra pointer per entry wastes up to ~33k of
+ * memory if a 1m blocks are used (or 66 on 64-bit arch)
+ */
+ void *io_context; /* Reserve a pointer for use during io */
+ /* data should only be non-NULL if fully populated. */
+ void *nodes; /* The hash data used to verify the children.
+ * Guaranteed to be page-aligned.
+ */
+};
+
+/* dm_bht_level
+ * Contains an array of entries which represent a page of hashes where
+ * each hash is a node in the tree at the given tree depth/level.
+ */
+struct dm_bht_level {
+ struct dm_bht_entry *entries; /* array of entries of tree nodes */
+ unsigned int count; /* number of entries at this level */
+ sector_t sector; /* starting sector for this level */
+};
+
+/* opaque context, start, databuf, sector_count */
+typedef int(*dm_bht_callback)(void *, /* external context */
+ sector_t, /* start sector */
+ u8 *, /* destination page */
+ sector_t, /* num sectors */
+ struct dm_bht_entry *);
+/* dm_bht - Device mapper block hash tree
+ * dm_bht provides a fixed interface for comparing data blocks
+ * against a cryptographic hashes stored in a hash tree. It
+ * optimizes the tree structure for storage on disk.
+ *
+ * The tree is built from the bottom up. A collection of data,
+ * external to the tree, is hashed and these hashes are stored
+ * as the blocks in the tree. For some number of these hashes,
+ * a parent node is created by hashing them. These steps are
+ * repeated.
+ *
+ * TODO(wad): All hash storage memory is pre-allocated and freed once an
+ * entire branch has been verified.
+ */
+struct dm_bht {
+ /* Configured values */
+ int depth; /* Depth of the tree including the root */
+ unsigned int block_count; /* Number of blocks hashed */
+ unsigned int block_size; /* Size of a hash block */
+ char hash_alg[CRYPTO_MAX_ALG_NAME];
+ unsigned char salt[DM_BHT_SALT_SIZE];
+
+ /* Computed values */
+ unsigned int node_count; /* Data size (in hashes) for each entry */
+ unsigned int node_count_shift; /* first bit set - 1 */
+ /* There is one per CPU so that verified can be simultaneous. */
+ struct shash_desc *hash_desc[NR_CPUS]; /* Container for the hash alg */
+ unsigned int digest_size;
+ sector_t sectors; /* Number of disk sectors used */
+
+ /* bool verified; Full tree is verified */
+ u8 root_digest[DM_BHT_MAX_DIGEST_SIZE];
+ struct dm_bht_level *levels; /* in reverse order */
+ /* Callback for reading from the hash device */
+ dm_bht_callback read_cb;
+};
+
+/* Constructor for struct dm_bht instances. */
+int dm_bht_create(struct dm_bht *bht,
+ unsigned int block_count,
+ unsigned int block_size,
+ const char *alg_name);
+/* Destructor for struct dm_bht instances. Does not free @bht */
+void dm_bht_destroy(struct dm_bht *bht);
+
+/* Basic accessors for struct dm_bht */
+int dm_bht_set_root_hexdigest(struct dm_bht *bht, const u8 *hexdigest);
+int dm_bht_root_hexdigest(struct dm_bht *bht, u8 *hexdigest, int available);
+void dm_bht_set_salt(struct dm_bht *bht, const char *hexsalt);
+int dm_bht_salt(struct dm_bht *bht, char *hexsalt);
+
+/* Functions for loading in data from disk for verification */
+bool dm_bht_is_populated(struct dm_bht *bht, unsigned int block);
+int dm_bht_populate(struct dm_bht *bht, void *read_cb_ctx,
+ unsigned int block);
+int dm_bht_verify_block(struct dm_bht *bht, unsigned int block,
+ struct page *pg, unsigned int offset);
+void dm_bht_read_completed(struct dm_bht_entry *entry, int status);
+
+/* Functions for converting indices to nodes. */
+
+static inline unsigned int dm_bht_get_level_shift(struct dm_bht *bht,
+ int depth)
+{
+ return (bht->depth - depth) * bht->node_count_shift;
+}
+
+/* For the given depth, this is the entry index. At depth+1 it is the node
+ * index for depth.
+ */
+static inline unsigned int dm_bht_index_at_level(struct dm_bht *bht,
+ int depth,
+ unsigned int leaf)
+{
+ return leaf >> dm_bht_get_level_shift(bht, depth);
+}
+
+static inline struct dm_bht_entry *dm_bht_get_entry(struct dm_bht *bht,
+ int depth,
+ unsigned int block)
+{
+ unsigned int index = dm_bht_index_at_level(bht, depth, block);
+ struct dm_bht_level *level = &bht->levels[depth];
+
+ return &level->entries[index];
+}
+
+static inline void *dm_bht_get_node(struct dm_bht *bht,
+ struct dm_bht_entry *entry,
+ int depth,
+ unsigned int block)
+{
+ unsigned int index = dm_bht_index_at_level(bht, depth, block);
+ unsigned int node_index = index % bht->node_count;
+
+ return entry->nodes + (node_index * bht->digest_size);
+}
+#endif /* __LINUX_DM_BHT_H */
--
1.7.3.1

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 11-14-2011, 08:22 AM
Steffen Klassert
 
Default dm: verity target

On Fri, Nov 11, 2011 at 09:34:44AM -0800, Mandeep Singh Baines wrote:
>
> +/**
> + * dm_bht_compute_hash: hashes a page of data
> + */
> +static int dm_bht_compute_hash(struct dm_bht *bht, struct page *pg,
> + unsigned int offset, u8 *digest)
> +{
> + struct shash_desc *hash_desc = bht->hash_desc[smp_processor_id()];
> + void *data;
> + int err;

You don't need to have one shash_desc per cpu. As the shash interface is
synchronous, you could either allocate a new shash_desc with every call
to dm_bht_compute_hash, or just place the shash_desc with the context to
the stack of this function. The easiest thing would be to add the
shash transformation direct to dm_bht and then something like

struct {
struct shash_desc desc;
char ctx[crypto_shash_descsize(bht->tfm)];
} hash_desc;

hash_desc.desc.tfm = bmt->tfm;

The intermediate and final digest values are stored to hash_desc.ctx.
This is local, hence it's reentrant.

> +
> + /* Note, this is synchronous. */
> + if (crypto_shash_init(hash_desc)) {
> + DMCRIT("failed to reinitialize crypto hash (proc:%d)",
> + smp_processor_id());
> + return -EINVAL;
> + }
> + data = kmap_atomic(pg);
> + err = crypto_shash_update(hash_desc, data + offset, PAGE_SIZE);
> + kunmap_atomic(data);
> + if (err) {
> + DMCRIT("crypto_hash_update failed");
> + return -EINVAL;
> + }
> + if (crypto_shash_update(hash_desc, bht->salt, sizeof(bht->salt))) {
> + DMCRIT("crypto_hash_update failed");
> + return -EINVAL;
> + }
> + if (crypto_shash_final(hash_desc, digest)) {
> + DMCRIT("crypto_hash_final failed");
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
> +

[snip]

> + * TODO(wad): All hash storage memory is pre-allocated and freed once an
> + * entire branch has been verified.
> + */
> +struct dm_bht {
> + /* Configured values */
> + int depth; /* Depth of the tree including the root */
> + unsigned int block_count; /* Number of blocks hashed */
> + unsigned int block_size; /* Size of a hash block */
> + char hash_alg[CRYPTO_MAX_ALG_NAME];
> + unsigned char salt[DM_BHT_SALT_SIZE];
> +
> + /* Computed values */
> + unsigned int node_count; /* Data size (in hashes) for each entry */
> + unsigned int node_count_shift; /* first bit set - 1 */
> + /* There is one per CPU so that verified can be simultaneous. */
> + struct shash_desc *hash_desc[NR_CPUS]; /* Container for the hash alg */

As mentioned above, you don't need to have one shash_desc per cpu.
Replace this with a 'struct crypto_shash *tfm;' and add this transformation
to the local hash_desc.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 11-14-2011, 04:54 PM
Mandeep Singh Baines
 
Default dm: verity target

Steffen Klassert (steffen.klassert@secunet.com) wrote:
> On Fri, Nov 11, 2011 at 09:34:44AM -0800, Mandeep Singh Baines wrote:
> >
> > +/**
> > + * dm_bht_compute_hash: hashes a page of data
> > + */
> > +static int dm_bht_compute_hash(struct dm_bht *bht, struct page *pg,
> > + unsigned int offset, u8 *digest)
> > +{
> > + struct shash_desc *hash_desc = bht->hash_desc[smp_processor_id()];
> > + void *data;
> > + int err;
>
> You don't need to have one shash_desc per cpu. As the shash interface is
> synchronous, you could either allocate a new shash_desc with every call
> to dm_bht_compute_hash, or just place the shash_desc with the context to
> the stack of this function. The easiest thing would be to add the
> shash transformation direct to dm_bht and then something like
>
> struct {
> struct shash_desc desc;
> char ctx[crypto_shash_descsize(bht->tfm)];
> } hash_desc;
>
> hash_desc.desc.tfm = bmt->tfm;
>
> The intermediate and final digest values are stored to hash_desc.ctx.
> This is local, hence it's reentrant.
>

Hi Steffen,

We use a hash per CPU to avoid allocating and freeing a context per
computation. We could allocate the context on the stack as you suggest.
However, we were worried about overflowing the stack. The context can be
quite large (close to 200 bytes for sha512).

We use a hash per CPU because there can be at most one hash in progress
on each CPU at any one time.

Regards,
Mandeep

> > +
> > + /* Note, this is synchronous. */
> > + if (crypto_shash_init(hash_desc)) {
> > + DMCRIT("failed to reinitialize crypto hash (proc:%d)",
> > + smp_processor_id());
> > + return -EINVAL;
> > + }
> > + data = kmap_atomic(pg);
> > + err = crypto_shash_update(hash_desc, data + offset, PAGE_SIZE);
> > + kunmap_atomic(data);
> > + if (err) {
> > + DMCRIT("crypto_hash_update failed");
> > + return -EINVAL;
> > + }
> > + if (crypto_shash_update(hash_desc, bht->salt, sizeof(bht->salt))) {
> > + DMCRIT("crypto_hash_update failed");
> > + return -EINVAL;
> > + }
> > + if (crypto_shash_final(hash_desc, digest)) {
> > + DMCRIT("crypto_hash_final failed");
> > + return -EINVAL;
> > + }
> > +
> > + return 0;
> > +}
> > +
>
> [snip]
>
> > + * TODO(wad): All hash storage memory is pre-allocated and freed once an
> > + * entire branch has been verified.
> > + */
> > +struct dm_bht {
> > + /* Configured values */
> > + int depth; /* Depth of the tree including the root */
> > + unsigned int block_count; /* Number of blocks hashed */
> > + unsigned int block_size; /* Size of a hash block */
> > + char hash_alg[CRYPTO_MAX_ALG_NAME];
> > + unsigned char salt[DM_BHT_SALT_SIZE];
> > +
> > + /* Computed values */
> > + unsigned int node_count; /* Data size (in hashes) for each entry */
> > + unsigned int node_count_shift; /* first bit set - 1 */
> > + /* There is one per CPU so that verified can be simultaneous. */
> > + struct shash_desc *hash_desc[NR_CPUS]; /* Container for the hash alg */
>
> As mentioned above, you don't need to have one shash_desc per cpu.
> Replace this with a 'struct crypto_shash *tfm;' and add this transformation
> to the local hash_desc.
>

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 11-15-2011, 08:19 AM
Steffen Klassert
 
Default dm: verity target

Hi.

On Mon, Nov 14, 2011 at 09:54:23AM -0800, Mandeep Singh Baines wrote:
>
> Hi Steffen,
>
> We use a hash per CPU to avoid allocating and freeing a context per
> computation. We could allocate the context on the stack as you suggest.
> However, we were worried about overflowing the stack. The context can be
> quite large (close to 200 bytes for sha512).

Indeed, but preallocating on percpu basis can be also quite large on
bigsmp machines.

>
> We use a hash per CPU because there can be at most one hash in progress
> on each CPU at any one time.
>

Ok, this is the case because you use a workqueue and this processes
at most one work item on each cpu the time.

On another note, you might want to use ahash here. Most crypto hardware
accelerators need to use asynchronous ciphers. These can't be used if
you use the synchronous interface.

And at the latest when you switch to ahash, you need more than one
preallocated ahash request on each cpu, as there can be many ahash
requests submitted on each cpu

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 11-22-2011, 09:15 PM
Mandeep Singh Baines
 
Default dm: verity target

Hi Steffen,

Sorry for the delayed repsonse.

Steffen Klassert (steffen.klassert@secunet.com) wrote:
> Hi.
>
> On Mon, Nov 14, 2011 at 09:54:23AM -0800, Mandeep Singh Baines wrote:
> >
> > Hi Steffen,
> >
> > We use a hash per CPU to avoid allocating and freeing a context per
> > computation. We could allocate the context on the stack as you suggest.
> > However, we were worried about overflowing the stack. The context can be
> > quite large (close to 200 bytes for sha512).
>
> Indeed, but preallocating on percpu basis can be also quite large on
> bigsmp machines.
>

Yes, you'll consume NUM_CPUS*sizeof(context) but I'm not sure that its
a lot of memory relative to the total amount of memory you'd see on a
big SMP/NUMA system.

> >
> > We use a hash per CPU because there can be at most one hash in progress
> > on each CPU at any one time.
> >
>
> Ok, this is the case because you use a workqueue and this processes
> at most one work item on each cpu the time.
>
> On another note, you might want to use ahash here. Most crypto hardware
> accelerators need to use asynchronous ciphers. These can't be used if
> you use the synchronous interface.
>

We definitely want to look at ahash but I'd like to get the baseline
functionality in and then iterate. Do you think it would be cool
if we deferred looking into ahash until later?

> And at the latest when you switch to ahash, you need more than one
> preallocated ahash request on each cpu, as there can be many ahash
> requests submitted on each cpu
>

Thanks much for all the feedback.

Regards,
Mandeep

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 12-21-2011, 11:36 PM
Mandeep Singh Baines
 
Default dm: verity target

The verity target provides transparent integrity checking of block devices
using a cryptographic digest.

dm-verity is meant to be setup as part of a verified boot path. This
may be anything ranging from a boot using tboot or trustedgrub to just
booting from a known-good device (like a USB drive or CD).

dm-verity is part of ChromeOS's verified boot path. It is used to verify
the integrity of the root filesystem on boot. The root filesystem is
mounted on a dm-verity partition which transparently verifies each block
with a bootloader verified hash passed into the kernel at boot.

Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: Elly Jones <ellyjones@chromium.org>
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Milan Broz <mbroz@redhat.com>
Cc: Olof Johansson <olofj@chromium.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: dm-devel@redhat.com
---
Documentation/device-mapper/dm-bht.txt | 59 ++
Documentation/device-mapper/dm-verity.txt | 76 +++
drivers/md/Kconfig | 30 +
drivers/md/Makefile | 2 +
drivers/md/dm-bht.c | 559 +++++++++++++++
drivers/md/dm-verity.c | 1043 +++++++++++++++++++++++++++++
drivers/md/dm-verity.h | 45 ++
include/linux/dm-bht.h | 166 +++++
8 files changed, 1980 insertions(+), 0 deletions(-)
create mode 100644 Documentation/device-mapper/dm-bht.txt
create mode 100644 Documentation/device-mapper/dm-verity.txt
create mode 100644 drivers/md/dm-bht.c
create mode 100644 drivers/md/dm-verity.c
create mode 100644 drivers/md/dm-verity.h
create mode 100644 include/linux/dm-bht.h

diff --git a/Documentation/device-mapper/dm-bht.txt b/Documentation/device-mapper/dm-bht.txt
new file mode 100644
index 0000000..21d929f
--- /dev/null
+++ b/Documentation/device-mapper/dm-bht.txt
@@ -0,0 +1,59 @@
+dm-bht
+======
+
+dm-bht provides a block hash tree implementation. The use of dm-bht allows
+for integrity checking of a given block device without reading the entire
+set of blocks into memory before use.
+
+In particular, dm-bht supplies an interface for creating and verifying a tree
+of cryptographic digests with any algorithm supported by the kernel crypto API.
+
+The `verity' target is the motivating example.
+
+
+Theory of operation
+===================
+
+dm-bht is logically comprised of multiple nodes organized in a tree-like
+structure. Each node in the tree is a cryptographic hash. If it is a leaf
+node, the hash is of some block data on disk. If it is an intermediary node,
+then the hash is of a number of child nodes.
+
+dm-bht has a given depth starting at 1 (ignoring the root node). Each level in
+the tree is concretely made up of dm_bht_entry structs. Each entry in the tree
+is a collection of neighboring nodes that fit in one page-sized block. The
+number is determined based on PAGE_SIZE and the size of the selected
+cryptographic digest algorithm. The hashes are linearly ordered in this entry
+and any unaligned trailing space is ignored but included when calculating the
+parent node.
+
+The tree looks something like:
+
+alg= sha256, num_blocks = 32767
+ [ root ]
+ / . . .
+ [entry_0] [entry_1]
+ / . . . . . .
+ [entry_0_0] . . . [entry_0_127] . . . . [entry_1_127]
+ / ... / . . . /
+ blk_0 ... blk_127 blk_16256 blk_16383 blk_32640 . . . blk_32767
+
+root is treated independently from the depth and the blocks are expected to
+be hashed and supplied to the dm-bht. hash blocks that make up the entry
+contents are expected to be read from disk.
+
+dm-bht does not handle I/O directly but instead expects the consumer to
+supply callbacks. The read callback will always receive a page-align value
+to pass to the block device layer to read in a hash value.
+
+Usage
+=====
+
+The API provides mechanisms for reading and verifying a tree. When reading, all
+required data for the hash tree should be populated for a block before
+attempting a verify. This can be done by calling dm_bht_populate(). When all
+data is ready, a call to dm_bht_verify_block() with the expected hash value will
+perform both the direct block hash check and the hashes of the parent and
+neighboring nodes where needed to ensure validity up to the root hash. Note,
+dm_bht_set_root_hexdigest() should be called before any verification attempts
+occur.
diff --git a/Documentation/device-mapper/dm-verity.txt b/Documentation/device-mapper/dm-verity.txt
new file mode 100644
index 0000000..f33b984
--- /dev/null
+++ b/Documentation/device-mapper/dm-verity.txt
@@ -0,0 +1,76 @@
+dm-verity
+==========
+
+Device-Mapper's "verity" target provides transparent integrity checking of
+block devices using a cryptographic digest provided by the kernel crypto API.
+This target is read-only.
+
+Parameters: payload=<device path> hashtree=<hash device path> alg=<alg>
+ salt=<salt> root_hexagiest=<root hash>
+ [ hashstart=<hash start> error_behavior=<error behavior> ]
+
+<device path>
+ This is the device that is going to be integrity checked. It may be
+ a subset of the full device as specified to dmsetup (start sector and count)
+ It may be specified as a path, like /dev/sdaX, or a device number,
+ <major>:<minor>.
+
+<hash device path>
+ This is the device that that supplies the dm-bht hash data. It may be
+ specified similarly to the device path and may be the same device. If the
+ same device is used, the hash offset should be outside of the dm-verity
+ configured device size.
+
+<alg>
+ The cryptographic hash algorithm used for this device. This should
+ be the name of the algorithm, like "sha1".
+
+<salt>
+ Salt value (in hex).
+
+<root hash>
+ The hexadecimal encoding of the cryptographic hash of all of the
+ neighboring nodes at the first level of the tree. This hash should be
+ trusted as there is no other authenticity beyond this point.
+
+<hash start>
+ Start address of hashes (default 0).
+
+<error behavior>
+ 0 = return -EIO. 1 = panic. 2 = none. 3 = call notifier.
+
+Theory of operation
+===================
+
+dm-verity is meant to be setup as part of a verified boot path. This
+may be anything ranging from a boot using tboot or trustedgrub to just
+booting from a known-good device (like a USB drive or CD).
+
+When a dm-verity device is configured, it is expected that the caller
+has been authenticated in some way (cryptographic signatures, etc).
+After instantiation, all hashes will be verified on-demand during
+disk access. If they cannot be verified up to the root node of the
+tree, the root hash, then the I/O will fail. This should identify
+tampering with any data on the device and the hash data.
+
+Cryptographic hashes are used to assert the integrity of the device on a
+per-block basis. This allows for a lightweight hash computation on first read
+into the page cache. Block hashes are stored linearly aligned to the nearest
+block the size of a page.
+
+For more information on the hashing process, see dm-bht.txt.
+
+
+Example
+=======
+
+Setup a device;
+[[
+ dmsetup create vroot --table
+ "0 204800 verity payload=/dev/sda1 hashtree=/dev/sda2 alg=sha1 "
+ "root_hexdigest=9f74809a2ee7607b16fcc70d9399a4de97 25a727"
+]]
+
+A command line tool is available to compute the hash tree and return the
+root hash value.
+ http://git.chromium.org/cgi-bin/gitweb.cgi?p=dm-verity.git;a=tree
diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index faa4741..3cdf95c 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -370,4 +370,34 @@ config DM_FLAKEY
---help---
A target that intermittently fails I/O for debugging purposes.

+config DM_BHT
+ tristate "Block hash tree support"
+ select CRYPTO
+ select CRYPTO_HASH
+ ---help---
+ Include support for device-mapper devices to use a block hash
+ tree for managing data integrity checks in a scalable way.
+
+ Targets that use this functionality should include it
+ automatically.
+
+ If unsure, say N.
+
+config DM_VERITY
+ tristate "Verity target support"
+ depends on BLK_DEV_DM
+ select DM_BHT
+ select CRYPTO
+ select CRYPTO_HASH
+ ---help---
+ This device-mapper target allows you to create a device that
+ transparently integrity checks the data on it. You'll need to
+ activate the digests you're going to use in the cryptoapi
+ configuration.
+
+ To compile this code as a module, choose M here: the module will
+ be called dm-verity.
+
+ If unsure, say N.
+
endif # MD
diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index 046860c..c069953 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -39,6 +39,8 @@ obj-$(CONFIG_DM_SNAPSHOT) += dm-snapshot.o
obj-$(CONFIG_DM_PERSISTENT_DATA) += persistent-data/
obj-$(CONFIG_DM_MIRROR) += dm-mirror.o dm-log.o dm-region-hash.o
obj-$(CONFIG_DM_LOG_USERSPACE) += dm-log-userspace.o
+obj-$(CONFIG_DM_BHT) += dm-bht.o
+obj-$(CONFIG_DM_VERITY) += dm-verity.o
obj-$(CONFIG_DM_ZERO) += dm-zero.o
obj-$(CONFIG_DM_RAID) += dm-raid.o
obj-$(CONFIG_DM_THIN_PROVISIONING) += dm-thin-pool.o
diff --git a/drivers/md/dm-bht.c b/drivers/md/dm-bht.c
new file mode 100644
index 0000000..6eb2be3
--- /dev/null
+++ b/drivers/md/dm-bht.c
@@ -0,0 +1,559 @@
+ /*
+ * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ *
+ * Device-Mapper block hash tree interface.
+ * See Documentation/device-mapper/dm-bht.txt for details.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <crypto/hash.h>
+#include <linux/atomic.h>
+#include <linux/bitops.h>
+#include <linux/bug.h>
+#include <linux/cpumask.h>
+#include <linux/device-mapper.h>
+#include <linux/dm-bht.h>
+#include <linux/err.h>
+#include <linux/errno.h>
+#include <linux/gfp.h>
+#include <linux/highmem.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/mm_types.h>
+#include <linux/scatterlist.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+
+#define DM_MSG_PREFIX "dm bht"
+
+
+/*
+ * Utilities
+ */
+
+static u8 from_hex(u8 ch)
+{
+ if ((ch >= '0') && (ch <= '9'))
+ return ch - '0';
+ if ((ch >= 'a') && (ch <= 'f'))
+ return ch - 'a' + 10;
+ if ((ch >= 'A') && (ch <= 'F'))
+ return ch - 'A' + 10;
+ return -1;
+}
+
+/**
+ * dm_bht_bin_to_hex - converts a binary stream to human-readable hex
+ * @binary: a byte array of length @binary_len
+ * @hex: a byte array of length @binary_len * 2 + 1
+ */
+static void dm_bht_bin_to_hex(u8 *binary, u8 *hex, unsigned int binary_len)
+{
+ while (binary_len-- > 0) {
+ sprintf((char *)hex, "%02hhx", (int)*binary);
+ hex += 2;
+ binary++;
+ }
+}
+
+/**
+ * dm_bht_hex_to_bin - converts a hex stream to binary
+ * @binary: a byte array of length @binary_len
+ * @hex: a byte array of length @binary_len * 2 + 1
+ */
+static void dm_bht_hex_to_bin(u8 *binary, const u8 *hex,
+ unsigned int binary_len)
+{
+ while (binary_len-- > 0) {
+ *binary = from_hex(*(hex++));
+ *binary *= 16;
+ *binary += from_hex(*(hex++));
+ binary++;
+ }
+}
+
+static void dm_bht_log_mismatch(struct dm_bht *bht, u8 *given, u8 *computed)
+{
+ u8 given_hex[DM_BHT_MAX_DIGEST_SIZE * 2 + 1];
+ u8 computed_hex[DM_BHT_MAX_DIGEST_SIZE * 2 + 1];
+
+ dm_bht_bin_to_hex(given, given_hex, bht->digest_size);
+ dm_bht_bin_to_hex(computed, computed_hex, bht->digest_size);
+ DMERR_LIMIT("%s != %s", given_hex, computed_hex);
+}
+
+/**
+ * dm_bht_compute_hash: hashes a page of data
+ */
+static int dm_bht_compute_hash(struct dm_bht *bht, struct page *pg,
+ unsigned int offset, u8 *digest)
+{
+ struct shash_desc *hash_desc = bht->hash_desc[smp_processor_id()];
+ void *data;
+ int err;
+
+ /* Note, this is synchronous. */
+ if (crypto_shash_init(hash_desc)) {
+ DMCRIT("failed to reinitialize crypto hash (proc:%d)",
+ smp_processor_id());
+ return -EINVAL;
+ }
+ data = kmap_atomic(pg);
+ err = crypto_shash_update(hash_desc, data + offset, PAGE_SIZE);
+ kunmap_atomic(data);
+ if (err) {
+ DMCRIT("crypto_hash_update failed");
+ return -EINVAL;
+ }
+ if (crypto_shash_update(hash_desc, bht->salt, sizeof(bht->salt))) {
+ DMCRIT("crypto_hash_update failed");
+ return -EINVAL;
+ }
+ if (crypto_shash_final(hash_desc, digest)) {
+ DMCRIT("crypto_hash_final failed");
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+/*
+ * Implementation functions
+ */
+
+static int dm_bht_initialize_entries(struct dm_bht *bht)
+{
+ /* last represents the index of the last digest store in the tree.
+ * By walking the tree with that index, it is possible to compute the
+ * total number of entries at each level.
+ *
+ * Since each entry will contain up to |node_count| nodes of the tree,
+ * it is possible that the last index may not be at the end of a given
+ * entry->nodes. In that case, it is assumed the value is padded.
+ *
+ * Note, we treat both the tree root (1 hash) and the tree leaves
+ * independently from the bht data structures. Logically, the root is
+ * depth=-1 and the block layer level is depth=bht->depth
+ */
+ unsigned int last = bht->block_count;
+ int depth;
+
+ /* check that the largest level->count can't result in an int overflow
+ * on allocation or sector calculation.
+ */
+ if (((last >> bht->node_count_shift) + 1) >
+ UINT_MAX / max((unsigned int)sizeof(struct dm_bht_entry),
+ (unsigned int)to_sector(bht->block_size))) {
+ DMCRIT("required entries %u is too large", last + 1);
+ return -EINVAL;
+ }
+
+ /* Track the current sector location for each level so we don't have to
+ * compute it during traversals.
+ */
+ bht->sectors = 0;
+ for (depth = 0; depth < bht->depth; ++depth) {
+ struct dm_bht_level *level = &bht->levels[depth];
+
+ level->count = dm_bht_index_at_level(bht, depth, last) + 1;
+ level->entries = (struct dm_bht_entry *)
+ kcalloc(level->count,
+ sizeof(struct dm_bht_entry),
+ GFP_KERNEL);
+ if (!level->entries) {
+ DMERR("failed to allocate entries for depth %d", depth);
+ return -ENOMEM;
+ }
+ level->sector = bht->sectors;
+ bht->sectors += level->count * to_sector(bht->block_size);
+ }
+
+ return 0;
+}
+
+/**
+ * dm_bht_create - prepares @bht for us
+ * @bht: pointer to a dm_bht_create()d bht
+ * @depth: tree depth without the root; including block hashes
+ * @block_count:the number of block hashes / tree leaves
+ * @alg_name: crypto hash algorithm name
+ *
+ * Returns 0 on success.
+ *
+ * Callers can offset into devices by storing the data in the io callbacks.
+ */
+int dm_bht_create(struct dm_bht *bht, unsigned int block_count,
+ unsigned int block_size, const char *alg_name)
+{
+ struct crypto_shash *tfm;
+ int size, cpu, status = 0;
+
+ bht->block_size = block_size;
+ /* Verify that PAGE_SIZE >= block_size >= SECTOR_SIZE. */
+ if ((block_size > PAGE_SIZE) ||
+ (PAGE_SIZE % block_size) ||
+ (to_sector(block_size) == 0))
+ return -EINVAL;
+
+ tfm = crypto_alloc_shash(alg_name, 0, 0);
+ if (IS_ERR(tfm)) {
+ DMERR("failed to allocate crypto hash '%s'", alg_name);
+ return -ENOMEM;
+ }
+ size = sizeof(struct shash_desc) + crypto_shash_descsize(tfm);
+
+ /* Pre-allocate per-cpu crypto contexts to avoid having to
+ * kmalloc/kfree a context for every hash operation.
+ */
+ for (cpu = 0; cpu < nr_cpu_ids; ++cpu) {
+ struct shash_desc *hash_desc = kmalloc(size, GFP_KERNEL);
+
+ bht->hash_desc[cpu] = hash_desc;
+ if (!hash_desc) {
+ DMERR("failed to allocate crypto hash contexts");
+ status = -ENOMEM;
+ goto bad_hash_alloc;
+ }
+ hash_desc->tfm = tfm;
+ hash_desc->flags = 0x0;
+ }
+ bht->digest_size = crypto_shash_digestsize(tfm);
+ /* We expect to be able to pack >=2 hashes into a block */
+ if (block_size / bht->digest_size < 2) {
+ DMERR("too few hashes fit in a block");
+ status = -EINVAL;
+ goto bad_arg;
+ }
+
+ if (bht->digest_size > DM_BHT_MAX_DIGEST_SIZE) {
+ DMERR("DM_BHT_MAX_DIGEST_SIZE too small for chosen digest");
+ status = -EINVAL;
+ goto bad_arg;
+ }
+
+ /* Configure the tree */
+ bht->block_count = block_count;
+ if (block_count == 0) {
+ DMERR("block_count must be non-zero");
+ status = -EINVAL;
+ goto bad_arg;
+ }
+
+ /* Each dm_bht_entry->nodes is one block. The node code tracks
+ * how many nodes fit into one entry where a node is a single
+ * hash (message digest).
+ */
+ bht->node_count_shift = fls(block_size / bht->digest_size) - 1;
+ /* Round down to the nearest power of two. This makes indexing
+ * into the tree much less painful.
+ */
+ bht->node_count = 1 << bht->node_count_shift;
+
+ /* This is unlikely to happen, but with 64k pages, who knows. */
+ if (bht->node_count > UINT_MAX / bht->digest_size) {
+ DMERR("node_count * hash_len exceeds UINT_MAX!");
+ status = -EINVAL;
+ goto bad_arg;
+ }
+
+ bht->depth = DIV_ROUND_UP(fls(block_count - 1), bht->node_count_shift);
+
+ /* Ensure that we can safely shift by this value. */
+ if (bht->depth * bht->node_count_shift >= sizeof(unsigned int) * 8) {
+ DMERR("specified depth and node_count_shift is too large");
+ status = -EINVAL;
+ goto bad_arg;
+ }
+
+ /* Allocate levels. Each level of the tree may have an arbitrary number
+ * of dm_bht_entry structs. Each entry contains node_count nodes.
+ * Each node in the tree is a cryptographic digest of either node_count
+ * nodes on the subsequent level or of a specific block on disk.
+ */
+ bht->levels = (struct dm_bht_level *)
+ kcalloc(bht->depth,
+ sizeof(struct dm_bht_level), GFP_KERNEL);
+ if (!bht->levels) {
+ DMERR("failed to allocate tree levels");
+ status = -ENOMEM;
+ goto bad_level_alloc;
+ }
+
+ bht->read_cb = NULL;
+
+ status = dm_bht_initialize_entries(bht);
+ if (status)
+ goto bad_entries_alloc;
+
+ /* We compute depth such that there is only be 1 block at level 0. */
+ BUG_ON(bht->levels[0].count != 1);
+
+ return 0;
+
+bad_entries_alloc:
+ while (bht->depth-- > 0)
+ kfree(bht->levels[bht->depth].entries);
+ kfree(bht->levels);
+bad_level_alloc:
+bad_arg:
+bad_hash_alloc:
+ for (cpu = 0; cpu < nr_cpu_ids && bht->hash_desc[cpu]; ++cpu)
+ kfree(bht->hash_desc[cpu]);
+ crypto_free_shash(tfm);
+ return status;
+}
+EXPORT_SYMBOL(dm_bht_create);
+
+/**
+ * dm_bht_read_completed
+ * @entry: pointer to the entry that's been loaded
+ * @status: I/O status. Non-zero is failure.
+ * MUST always be called after a read_cb completes.
+ */
+void dm_bht_read_completed(struct dm_bht_entry *entry, int status)
+{
+ if (status) {
+ /* TODO(wad) add retry support */
+ DMCRIT("an I/O error occurred while reading entry");
+ atomic_set(&entry->state, DM_BHT_ENTRY_ERROR_IO);
+ /* entry->nodes will be freed later */
+ return;
+ }
+ BUG_ON(atomic_read(&entry->state) != DM_BHT_ENTRY_PENDING);
+ atomic_set(&entry->state, DM_BHT_ENTRY_READY);
+}
+EXPORT_SYMBOL(dm_bht_read_completed);
+
+/**
+ * dm_bht_verify_block - checks that all nodes in the path for @block are valid
+ * @bht: pointer to a dm_bht_create()d bht
+ * @block: specific block data is expected from
+ * @pg: page holding the block data
+ * @offset: offset into the page
+ *
+ * Returns 0 on success, DM_BHT_ENTRY_ERROR_MISMATCH on error.
+ */
+int dm_bht_verify_block(struct dm_bht *bht, unsigned int block,
+ struct page *pg, unsigned int offset)
+{
+ int state, depth = bht->depth;
+ u8 digest[DM_BHT_MAX_DIGEST_SIZE];
+ struct dm_bht_entry *entry;
+ void *node;
+
+ do {
+ /* Need to check that the hash of the current block is accurate
+ * in its parent.
+ */
+ entry = dm_bht_get_entry(bht, depth - 1, block);
+ state = atomic_read(&entry->state);
+ /* This call is only safe if all nodes along the path
+ * are already populated (i.e. READY) via dm_bht_populate.
+ */
+ BUG_ON(state < DM_BHT_ENTRY_READY);
+ node = dm_bht_get_node(bht, entry, depth, block);
+
+ if (dm_bht_compute_hash(bht, pg, offset, digest) ||
+ memcmp(digest, node, bht->digest_size))
+ goto mismatch;
+
+ /* Keep the containing block of hashes to be verified in the
+ * next pass.
+ */
+ pg = virt_to_page(entry->nodes);
+ offset = offset_in_page(entry->nodes);
+ } while (--depth > 0 && state != DM_BHT_ENTRY_VERIFIED);
+
+ if (depth == 0 && state != DM_BHT_ENTRY_VERIFIED) {
+ if (dm_bht_compute_hash(bht, pg, offset, digest) ||
+ memcmp(digest, bht->root_digest, bht->digest_size))
+ goto mismatch;
+ atomic_set(&entry->state, DM_BHT_ENTRY_VERIFIED);
+ }
+
+ /* Mark path to leaf as verified. */
+ for (depth++; depth < bht->depth; depth++) {
+ entry = dm_bht_get_entry(bht, depth, block);
+ /* At this point, entry can only be in VERIFIED or READY state.
+ * So it is safe to use atomic_set instead of atomic_cmpxchg.
+ */
+ atomic_set(&entry->state, DM_BHT_ENTRY_VERIFIED);
+ }
+
+ return 0;
+
+mismatch:
+ DMERR_LIMIT("verify_path: failed to verify hash (d=%d,bi=%u)",
+ depth, block);
+ dm_bht_log_mismatch(bht, node, digest);
+ return DM_BHT_ENTRY_ERROR_MISMATCH;
+}
+EXPORT_SYMBOL(dm_bht_verify_block);
+
+/**
+ * dm_bht_is_populated - check that entries from disk needed to verify a given
+ * block are all ready
+ * @bht: pointer to a dm_bht_create()d bht
+ * @block: specific block data is expected from
+ *
+ * Callers may wish to call dm_bht_is_populated() when checking an io
+ * for which entries were already pending.
+ */
+bool dm_bht_is_populated(struct dm_bht *bht, unsigned int block)
+{
+ int depth;
+
+ for (depth = bht->depth - 1; depth >= 0; depth--) {
+ struct dm_bht_entry *entry = dm_bht_get_entry(bht, depth,
+ block);
+ if (atomic_read(&entry->state) < DM_BHT_ENTRY_READY)
+ return false;
+ }
+
+ return true;
+}
+EXPORT_SYMBOL(dm_bht_is_populated);
+
+/**
+ * dm_bht_populate - reads entries from disk needed to verify a given block
+ * @bht: pointer to a dm_bht_create()d bht
+ * @ctx: context used for all read_cb calls on this request
+ * @block: specific block data is expected from
+ *
+ * Returns negative value on error. Returns 0 on success.
+ */
+int dm_bht_populate(struct dm_bht *bht, void *ctx, unsigned int block)
+{
+ int depth, state;
+
+ BUG_ON(block >= bht->block_count);
+
+ for (depth = bht->depth - 1; depth >= 0; --depth) {
+ unsigned int index = dm_bht_index_at_level(bht, depth, block);
+ struct dm_bht_level *level = &bht->levels[depth];
+ struct dm_bht_entry *entry = dm_bht_get_entry(bht, depth,
+ block);
+ state = atomic_cmpxchg(&entry->state,
+ DM_BHT_ENTRY_UNALLOCATED,
+ DM_BHT_ENTRY_PENDING);
+ if (state == DM_BHT_ENTRY_VERIFIED)
+ break;
+ if (state <= DM_BHT_ENTRY_ERROR)
+ goto error_state;
+ if (state != DM_BHT_ENTRY_UNALLOCATED)
+ continue;
+
+ /* Current entry is claimed for allocation and loading */
+ entry->nodes = kmalloc(bht->block_size, GFP_NOIO);
+ if (!entry->nodes)
+ goto nomem;
+
+ bht->read_cb(ctx,
+ level->sector + to_sector(index * bht->block_size),
+ entry->nodes, to_sector(bht->block_size), entry);
+ }
+
+ return 0;
+
+error_state:
+ DMCRIT("block %u at depth %d is in an error state", block, depth);
+ return -EPERM;
+
+nomem:
+ DMCRIT("failed to allocate memory for entry->nodes");
+ return -ENOMEM;
+}
+EXPORT_SYMBOL(dm_bht_populate);
+
+/**
+ * dm_bht_destroy - cleans up all memory used by @bht
+ * @bht: pointer to a dm_bht_create()d bht
+ */
+void dm_bht_destroy(struct dm_bht *bht)
+{
+ int depth, cpu;
+
+ for (depth = 0; depth < bht->depth; depth++) {
+ struct dm_bht_entry *entry = bht->levels[depth].entries;
+ struct dm_bht_entry *entry_end = entry +
+ bht->levels[depth].count;
+ for (; entry < entry_end; ++entry)
+ kfree(entry->nodes);
+ kfree(bht->levels[depth].entries);
+ }
+ kfree(bht->levels);
+ crypto_free_shash((bht->hash_desc[0])->tfm);
+ for (cpu = 0; cpu < nr_cpu_ids; ++cpu)
+ kfree(bht->hash_desc[cpu]);
+}
+EXPORT_SYMBOL(dm_bht_destroy);
+
+/*
+ * Accessors
+ */
+
+/**
+ * dm_bht_set_root_hexdigest - sets an unverified root digest hash from hex
+ * @bht: pointer to a dm_bht_create()d bht
+ * @hexdigest: array of u8s containing the new digest in binary
+ * Returns non-zero on error. hexdigest should be NUL terminated.
+ */
+int dm_bht_set_root_hexdigest(struct dm_bht *bht, const u8 *hexdigest)
+{
+ /* Make sure we have at least the bytes expected */
+ if (strnlen((char *)hexdigest, bht->digest_size * 2) !=
+ bht->digest_size * 2) {
+ DMERR("root digest length does not match hash algorithm");
+ return -1;
+ }
+ dm_bht_hex_to_bin(bht->root_digest, hexdigest, bht->digest_size);
+ return 0;
+}
+EXPORT_SYMBOL(dm_bht_set_root_hexdigest);
+
+/**
+ * dm_bht_root_hexdigest - returns root digest in hex
+ * @bht: pointer to a dm_bht_create()d bht
+ * @hexdigest: u8 array of size @available
+ * @available: must be bht->digest_size * 2 + 1
+ */
+int dm_bht_root_hexdigest(struct dm_bht *bht, u8 *hexdigest, int available)
+{
+ if (available < 0 ||
+ ((unsigned int) available) < bht->digest_size * 2 + 1) {
+ DMERR("hexdigest has too few bytes available");
+ return -EINVAL;
+ }
+ dm_bht_bin_to_hex(bht->root_digest, hexdigest, bht->digest_size);
+ return 0;
+}
+EXPORT_SYMBOL(dm_bht_root_hexdigest);
+
+/**
+ * dm_bht_set_salt - sets the salt used, in hex
+ * @bht: pointer to a dm_bht_create()d bht
+ * @hexsalt: salt string, as hex; will be zero-padded or truncated to
+ * DM_BHT_SALT_SIZE * 2 hex digits.
+ */
+void dm_bht_set_salt(struct dm_bht *bht, const char *hexsalt)
+{
+ size_t saltlen = min(strlen(hexsalt) / 2, sizeof(bht->salt));
+
+ memset(bht->salt, 0, sizeof(bht->salt));
+ dm_bht_hex_to_bin(bht->salt, (const u8 *)hexsalt, saltlen);
+}
+EXPORT_SYMBOL(dm_bht_set_salt);
+
+/**
+ * dm_bht_salt - returns the salt used, in hex
+ * @bht: pointer to a dm_bht_create()d bht
+ * @hexsalt: buffer to put salt into, of length DM_BHT_SALT_SIZE * 2 + 1.
+ */
+int dm_bht_salt(struct dm_bht *bht, char *hexsalt)
+{
+ dm_bht_bin_to_hex(bht->salt, (u8 *)hexsalt, sizeof(bht->salt));
+ return 0;
+}
+EXPORT_SYMBOL(dm_bht_salt);
+
diff --git a/drivers/md/dm-verity.c b/drivers/md/dm-verity.c
new file mode 100644
index 0000000..a9bd0e8
--- /dev/null
+++ b/drivers/md/dm-verity.c
@@ -0,0 +1,1043 @@
+/*
+ * Originally based on dm-crypt.c,
+ * Copyright (C) 2003 Christophe Saout <christophe@saout.de>
+ * Copyright (C) 2004 Clemens Fruhwirth <clemens@endorphin.org>
+ * Copyright (C) 2006-2008 Red Hat, Inc. All rights reserved.
+ * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ * All Rights Reserved.
+ *
+ * This file is released under the GPLv2.
+ *
+ * Implements a verifying transparent block device.
+ * See Documentation/device-mapper/dm-verity.txt
+ */
+#include <linux/async.h>
+#include <linux/atomic.h>
+#include <linux/bio.h>
+#include <linux/blkdev.h>
+#include <linux/delay.h>
+#include <linux/device.h>
+#include <linux/err.h>
+#include <linux/genhd.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/mempool.h>
+#include <linux/mm_types.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/workqueue.h>
+#include <linux/device-mapper.h>
+#include <linux/dm-bht.h>
+
+#include "dm-verity.h"
+
+#define DM_MSG_PREFIX "verity"
+
+/* Supports up to 512-bit digests */
+#define VERITY_MAX_DIGEST_SIZE 64
+
+/* TODO(wad) make both of these report the error line/file to a
+ * verity_bug function.
+ */
+#define VERITY_BUG(msg...) BUG()
+#define VERITY_BUG_ON(cond, msg...) BUG_ON(cond)
+
+/* Helper for printing sector_t */
+#define ULL(x) ((unsigned long long)(x))
+
+#define MIN_IOS 32
+#define MIN_BIOS (MIN_IOS * 2)
+#define VERITY_DEFAULT_BLOCK_SIZE 4096
+
+/* Provide a lightweight means of specifying the global default for
+ * error behavior: eio, reboot, or none
+ * Legacy support for 0 = eio, 1 = reboot/panic, 2 = none, 3 = notify.
+ * This is matched to the enum in dm-verity.h.
+ */
+static const char * const allowed_error_behaviors[] = { "eio", "panic", "none",
+ "notify", NULL };
+static char *error_behavior = "eio";
+module_param(error_behavior, charp, 0644);
+MODULE_PARM_DESC(error_behavior, "Behavior on error "
+ "(eio, panic, none, notify)");
+
+/* Controls whether verity_get_device will wait forever for a device. */
+static int dev_wait;
+module_param(dev_wait, bool, 0444);
+MODULE_PARM_DESC(dev_wait, "Wait forever for a backing device");
+
+/* per-requested-bio private data */
+enum verity_io_flags {
+ VERITY_IOFLAGS_CLONED = 0x1, /* original bio has been cloned */
+};
+
+struct dm_verity_io {
+ struct dm_target *target;
+ struct bio *bio;
+ struct delayed_work work;
+ unsigned int flags;
+
+ int error;
+ atomic_t pending;
+
+ u64 block; /* aligned block index */
+ u64 count; /* aligned count in blocks */
+};
+
+struct verity_config {
+ struct dm_dev *dev;
+ sector_t start;
+ sector_t size;
+
+ struct dm_dev *hash_dev;
+ sector_t hash_start;
+
+ struct dm_bht bht;
+
+ /* Pool required for io contexts */
+ mempool_t *io_pool;
+ /* Pool and bios required for making sure that backing device reads are
+ * in PAGE_SIZE increments.
+ */
+ struct bio_set *bs;
+
+ char hash_alg[CRYPTO_MAX_ALG_NAME];
+
+ int error_behavior;
+};
+
+static struct kmem_cache *_verity_io_pool;
+static struct workqueue_struct *kveritydq, *kverityd_ioq;
+
+static void kverityd_verify(struct work_struct *work);
+static void kverityd_io(struct work_struct *work);
+static void kverityd_io_bht_populate(struct dm_verity_io *io);
+static void kverityd_io_bht_populate_end(struct bio *, int error);
+
+static BLOCKING_NOTIFIER_HEAD(verity_error_notifier);
+
+/*
+ * Exported interfaces
+ */
+
+int dm_verity_register_error_notifier(struct notifier_block *nb)
+{
+ return blocking_notifier_chain_register(&verity_error_not ifier, nb);
+}
+EXPORT_SYMBOL_GPL(dm_verity_register_error_notifi er);
+
+int dm_verity_unregister_error_notifier(struct notifier_block *nb)
+{
+ return blocking_notifier_chain_unregister(&verity_error_n otifier, nb);
+}
+EXPORT_SYMBOL_GPL(dm_verity_unregister_error_noti fier);
+
+/*
+ * Allocation and utility functions
+ */
+
+static void kverityd_src_io_read_end(struct bio *clone, int error);
+
+/* Shared destructor for all internal bios */
+static void dm_verity_bio_destructor(struct bio *bio)
+{
+ struct dm_verity_io *io = bio->bi_private;
+ struct verity_config *vc = io->target->private;
+ bio_free(bio, vc->bs);
+}
+
+static struct bio *verity_alloc_bioset(struct verity_config *vc, gfp_t gfp_mask,
+ int nr_iovecs)
+{
+ return bio_alloc_bioset(gfp_mask, nr_iovecs, vc->bs);
+}
+
+static struct dm_verity_io *verity_io_alloc(struct dm_target *ti,
+ struct bio *bio)
+{
+ struct verity_config *vc = ti->private;
+ sector_t sector = bio->bi_sector - ti->begin;
+ struct dm_verity_io *io;
+
+ io = mempool_alloc(vc->io_pool, GFP_NOIO);
+ if (unlikely(!io))
+ return NULL;
+ io->flags = 0;
+ io->target = ti;
+ io->bio = bio;
+ io->error = 0;
+
+ /* Adjust the sector by the virtual starting sector */
+ io->block = to_bytes(sector) / vc->bht.block_size;
+ io->count = bio->bi_size / vc->bht.block_size;
+
+ atomic_set(&io->pending, 0);
+
+ return io;
+}
+
+static struct bio *verity_bio_clone(struct dm_verity_io *io)
+{
+ struct verity_config *vc = io->target->private;
+ struct bio *bio = io->bio;
+ struct bio *clone = verity_alloc_bioset(vc, GFP_NOIO, bio->bi_max_vecs);
+
+ if (!clone)
+ return NULL;
+
+ __bio_clone(clone, bio);
+ clone->bi_private = io;
+ clone->bi_end_io = kverityd_src_io_read_end;
+ clone->bi_bdev = vc->dev->bdev;
+ clone->bi_sector += vc->start - io->target->begin;
+ clone->bi_destructor = dm_verity_bio_destructor;
+
+ return clone;
+}
+
+/* If the request is not successful, this handler takes action.
+ * TODO make this call a registered handler.
+ */
+static void verity_error(struct verity_config *vc, struct dm_verity_io *io,
+ int error)
+{
+ const char *message;
+ int error_mode = DM_VERITY_ERROR_BEHAVIOR_PANIC;
+ dev_t devt = 0;
+ u64 block = ~0;
+ int transient = 1;
+ struct dm_verity_error_state error_state;
+
+ if (vc) {
+ devt = vc->dev->bdev->bd_dev;
+ error_mode = vc->error_behavior;
+ }
+
+ if (io) {
+ io->error = -EIO;
+ block = io->block;
+ }
+
+ switch (error) {
+ case -ENOMEM:
+ message = "out of memory";
+ break;
+ case -EBUSY:
+ message = "pending data seen during verify";
+ break;
+ case -EFAULT:
+ message = "crypto operation failure";
+ break;
+ case -EACCES:
+ message = "integrity failure";
+ /* Image is bad. */
+ transient = 0;
+ break;
+ case -EPERM:
+ message = "hash tree population failure";
+ /* Should be dm-bht specific errors */
+ transient = 0;
+ break;
+ case -EINVAL:
+ message = "unexpected missing/invalid data";
+ /* The device was configured incorrectly - fallback. */
+ transient = 0;
+ break;
+ default:
+ /* Other errors can be passed through as IO errors */
+ message = "unknown or I/O error";
+ return;
+ }
+
+ DMERR_LIMIT("verification failure occurred: %s", message);
+
+ if (error_mode == DM_VERITY_ERROR_BEHAVIOR_NOTIFY) {
+ error_state.code = error;
+ error_state.transient = transient;
+ error_state.block = block;
+ error_state.message = message;
+ error_state.dev_start = vc->start;
+ error_state.dev_len = vc->size;
+ error_state.dev = vc->dev->bdev;
+ error_state.hash_dev_start = vc->hash_start;
+ error_state.hash_dev_len = vc->bht.sectors;
+ error_state.hash_dev = vc->hash_dev->bdev;
+
+ /* Set default fallthrough behavior. */
+ error_state.behavior = DM_VERITY_ERROR_BEHAVIOR_PANIC;
+ error_mode = DM_VERITY_ERROR_BEHAVIOR_PANIC;
+
+ if (!blocking_notifier_call_chain(
+ &verity_error_notifier, transient, &error_state)) {
+ error_mode = error_state.behavior;
+ }
+ }
+
+ switch (error_mode) {
+ case DM_VERITY_ERROR_BEHAVIOR_EIO:
+ break;
+ case DM_VERITY_ERROR_BEHAVIOR_NONE:
+ if (error != -EIO && io)
+ io->error = 0;
+ break;
+ default:
+ goto do_panic;
+ }
+ return;
+
+do_panic:
+ panic("dm-verity failure: "
+ "device:%u:%u error:%d block:%llu message:%s",
+ MAJOR(devt), MINOR(devt), error, ULL(block), message);
+}
+
+/**
+ * verity_parse_error_behavior - parse a behavior charp to the enum
+ * @behavior: NUL-terminated char array
+ *
+ * Checks if the behavior is valid either as text or as an index digit
+ * and returns the proper enum value or -1 on error.
+ */
+static int verity_parse_error_behavior(const char *behavior)
+{
+ const char * const *allowed = allowed_error_behaviors;
+ char index = '0';
+
+ for (; *allowed; allowed++, index++)
+ if (!strcmp(*allowed, behavior) || behavior[0] == index)
+ break;
+
+ if (!*allowed)
+ return -1;
+
+ /* Convert to the integer index matching the enum. */
+ return allowed - allowed_error_behaviors;
+}
+
+/*
+ * Reverse flow of requests into the device.
+ *
+ * (Start at the bottom with verity_map and work your way upward).
+ */
+
+static void verity_inc_pending(struct dm_verity_io *io);
+
+static void verity_return_bio_to_caller(struct dm_verity_io *io)
+{
+ struct verity_config *vc = io->target->private;
+
+ if (io->error)
+ verity_error(vc, io, io->error);
+
+ bio_endio(io->bio, io->error);
+ mempool_free(io, vc->io_pool);
+}
+
+/* Check for any missing bht hashes. */
+static bool verity_is_bht_populated(struct dm_verity_io *io)
+{
+ struct verity_config *vc = io->target->private;
+ u64 block;
+
+ for (block = io->block; block < io->block + io->count; ++block)
+ if (!dm_bht_is_populated(&vc->bht, block))
+ return false;
+
+ return true;
+}
+
+/* verity_dec_pending manages the lifetime of all dm_verity_io structs.
+ * Non-bug error handling is centralized through this interface and
+ * all passage from workqueue to workqueue.
+ */
+static void verity_dec_pending(struct dm_verity_io *io)
+{
+ if (!atomic_dec_and_test(&io->pending))
+ goto done;
+
+ if (unlikely(io->error))
+ goto io_error;
+
+ /* I/Os that were pending may now be ready */
+ if (verity_is_bht_populated(io)) {
+ INIT_DELAYED_WORK(&io->work, kverityd_verify);
+ queue_delayed_work(kveritydq, &io->work, 0);
+ } else {
+ INIT_DELAYED_WORK(&io->work, kverityd_io);
+ queue_delayed_work(kverityd_ioq, &io->work, HZ/10);
+ }
+
+done:
+ return;
+
+io_error:
+ verity_return_bio_to_caller(io);
+}
+
+/* Walks the data set and computes the hash of the data read from the
+ * untrusted source device. The computed hash is then passed to dm-bht
+ * for verification.
+ */
+static int verity_verify(struct verity_config *vc,
+ struct dm_verity_io *io)
+{
+ unsigned int block_size = vc->bht.block_size;
+ struct bio *bio = io->bio;
+ u64 block = io->block;
+ unsigned int idx;
+ int r;
+
+ for (idx = bio->bi_idx; idx < bio->bi_vcnt; idx++) {
+ struct bio_vec *bv = bio_iovec_idx(bio, idx);
+ unsigned int offset = bv->bv_offset;
+ unsigned int len = bv->bv_len;
+
+ VERITY_BUG_ON(offset % block_size);
+ VERITY_BUG_ON(len % block_size);
+
+ while (len) {
+ r = dm_bht_verify_block(&vc->bht, block,
+ bv->bv_page, offset);
+ if (r)
+ goto bad_return;
+
+ offset += block_size;
+ len -= block_size;
+ block++;
+ cond_resched();
+ }
+ }
+
+ return 0;
+
+bad_return:
+ /* dm_bht functions aren't expected to return errno friendly
+ * values. They are converted here for uniformity.
+ */
+ if (r > 0) {
+ DMERR("Pending data for block %llu seen at verify", ULL(block));
+ r = -EBUSY;
+ } else {
+ DMERR_LIMIT("Block hash does not match!");
+ r = -EACCES;
+ }
+ return r;
+}
+
+/* Services the verify workqueue */
+static void kverityd_verify(struct work_struct *work)
+{
+ struct delayed_work *dwork = container_of(work, struct delayed_work,
+ work);
+ struct dm_verity_io *io = container_of(dwork, struct dm_verity_io,
+ work);
+ struct verity_config *vc = io->target->private;
+
+ io->error = verity_verify(vc, io);
+
+ /* Free up the bio and tag with the return value */
+ verity_return_bio_to_caller(io);
+}
+
+/* Asynchronously called upon the completion of dm-bht I/O. The status
+ * of the operation is passed back to dm-bht and the next steps are
+ * decided by verity_dec_pending.
+ */
+static void kverityd_io_bht_populate_end(struct bio *bio, int error)
+{
+ struct dm_bht_entry *entry = (struct dm_bht_entry *) bio->bi_private;
+ struct dm_verity_io *io = (struct dm_verity_io *) entry->io_context;
+
+ /* Tell the tree to atomically update now that we've populated
+ * the given entry.
+ */
+ dm_bht_read_completed(entry, error);
+
+ /* Clean up for reuse when reading data to be checked */
+ bio->bi_vcnt = 0;
+ bio->bi_io_vec->bv_offset = 0;
+ bio->bi_io_vec->bv_len = 0;
+ bio->bi_io_vec->bv_page = NULL;
+ /* Restore the private data to I/O so the destructor can be shared. */
+ bio->bi_private = (void *) io;
+ bio_put(bio);
+
+ /* We bail but assume the tree has been marked bad. */
+ if (unlikely(error)) {
+ DMERR("Failed to read for sector %llu (%u)",
+ ULL(io->bio->bi_sector), io->bio->bi_size);
+ io->error = error;
+ /* Pass through the error to verity_dec_pending below */
+ }
+ /* When pending = 0, it will transition to reading real data */
+ verity_dec_pending(io);
+}
+
+/* Called by dm-bht (via dm_bht_populate), this function provides
+ * the message digests to dm-bht that are stored on disk.
+ */
+static int kverityd_bht_read_callback(void *ctx, sector_t start, u8 *dst,
+ sector_t count,
+ struct dm_bht_entry *entry)
+{
+ struct dm_verity_io *io = ctx; /* I/O for this batch */
+ struct verity_config *vc;
+ struct bio *bio;
+
+ vc = io->target->private;
+
+ /* The I/O context is nested inside the entry so that we don't need one
+ * io context per page read.
+ */
+ entry->io_context = ctx;
+
+ /* We should only get page size requests at present. */
+ verity_inc_pending(io);
+ bio = verity_alloc_bioset(vc, GFP_NOIO, 1);
+ if (unlikely(!bio)) {
+ DMCRIT("Out of memory at bio_alloc_bioset");
+ dm_bht_read_completed(entry, -ENOMEM);
+ return -ENOMEM;
+ }
+ bio->bi_private = (void *) entry;
+ bio->bi_idx = 0;
+ bio->bi_size = vc->bht.block_size;
+ bio->bi_sector = vc->hash_start + start;
+ bio->bi_bdev = vc->hash_dev->bdev;
+ bio->bi_end_io = kverityd_io_bht_populate_end;
+ bio->bi_rw = REQ_META;
+ /* Only need to free the bio since the page is managed by bht */
+ bio->bi_destructor = dm_verity_bio_destructor;
+ bio->bi_vcnt = 1;
+ bio->bi_io_vec->bv_offset = offset_in_page(dst);
+ bio->bi_io_vec->bv_len = to_bytes(count);
+ /* dst is guaranteed to be a page_pool allocation */
+ bio->bi_io_vec->bv_page = virt_to_page(dst);
+ /* Track that this I/O is in use. There should be no risk of the io
+ * being removed prior since this is called synchronously.
+ */
+ generic_make_request(bio);
+ return 0;
+}
+
+/* Submits an io request for each missing block of block hashes.
+ * The last one to return will then enqueue this on the io workqueue.
+ */
+static void kverityd_io_bht_populate(struct dm_verity_io *io)
+{
+ struct verity_config *vc = io->target->private;
+ u64 block;
+
+ for (block = io->block; block < io->block + io->count; ++block) {
+ int ret = dm_bht_populate(&vc->bht, io, block);
+
+ if (ret < 0) {
+ /* verity_dec_pending will handle the error case. */
+ io->error = ret;
+ break;
+ }
+ }
+}
+
+/* Asynchronously called upon the completion of I/O issued
+ * from kverityd_src_io_read. verity_dec_pending() acts as
+ * the scheduler/flow manager.
+ */
+static void kverityd_src_io_read_end(struct bio *clone, int error)
+{
+ struct dm_verity_io *io = clone->bi_private;
+
+ if (unlikely(!bio_flagged(clone, BIO_UPTODATE) && !error))
+ error = -EIO;
+
+ if (unlikely(error)) {
+ DMERR("Error occurred: %d (%llu, %u)",
+ error, ULL(clone->bi_sector), clone->bi_size);
+ io->error = error;
+ }
+
+ /* Release the clone which just avoids the block layer from
+ * leaving offsets, etc in unexpected states.
+ */
+ bio_put(clone);
+
+ verity_dec_pending(io);
+}
+
+/* If not yet underway, an I/O request will be issued to the vc->dev
+ * device for the data needed. It is cloned to avoid unexpected changes
+ * to the original bio struct.
+ */
+static void kverityd_src_io_read(struct dm_verity_io *io)
+{
+ struct bio *clone;
+
+ /* Check if the read is already issued. */
+ if (io->flags & VERITY_IOFLAGS_CLONED)
+ return;
+
+ io->flags |= VERITY_IOFLAGS_CLONED;
+
+ /* Clone the bio. The block layer may modify the bvec array. */
+ clone = verity_bio_clone(io);
+ if (unlikely(!clone)) {
+ io->error = -ENOMEM;
+ return;
+ }
+
+ verity_inc_pending(io);
+
+ generic_make_request(clone);
+}
+
+/* kverityd_io services the I/O workqueue. For each pass through
+ * the I/O workqueue, a call to populate both the origin drive
+ * data and the hash tree data is made.
+ */
+static void kverityd_io(struct work_struct *work)
+{
+ struct delayed_work *dwork = container_of(work, struct delayed_work,
+ work);
+ struct dm_verity_io *io = container_of(dwork, struct dm_verity_io,
+ work);
+
+ /* Issue requests asynchronously. */
+ verity_inc_pending(io);
+ kverityd_src_io_read(io);
+ kverityd_io_bht_populate(io);
+ verity_dec_pending(io);
+}
+
+/* Paired with verity_dec_pending, the pending value in the io dictate the
+ * lifetime of a request and when it is ready to be processed on the
+ * workqueues.
+ */
+static void verity_inc_pending(struct dm_verity_io *io)
+{
+ atomic_inc(&io->pending);
+}
+
+/* Block-level requests start here. */
+static int verity_map(struct dm_target *ti, struct bio *bio,
+ union map_info *map_context)
+{
+ struct dm_verity_io *io;
+ struct verity_config *vc;
+ struct request_queue *r_queue;
+
+ if (unlikely(!ti)) {
+ DMERR("dm_target was NULL");
+ return -EIO;
+ }
+
+ vc = ti->private;
+ r_queue = bdev_get_queue(vc->dev->bdev);
+
+ if (bio_data_dir(bio) == WRITE) {
+ /* If we silently drop writes, then the VFS layer will cache
+ * the write and persist it in memory. While it doesn't change
+ * the underlying storage, it still may be contrary to the
+ * behavior expected by a verified, read-only device.
+ */
+ DMWARN_LIMIT("write request received. rejecting with -EIO.");
+ verity_error(vc, NULL, -EIO);
+ return -EIO;
+ } else {
+ /* Queue up the request to be verified */
+ io = verity_io_alloc(ti, bio);
+ if (!io) {
+ DMERR_LIMIT("Failed to allocate and init IO data");
+ return DM_MAPIO_REQUEUE;
+ }
+ INIT_DELAYED_WORK(&io->work, kverityd_io);
+ queue_delayed_work(kverityd_ioq, &io->work, 0);
+ }
+
+ return DM_MAPIO_SUBMITTED;
+}
+
+static void splitarg(char *arg, char **key, char **val)
+{
+ *key = strsep(&arg, "=");
+ *val = strsep(&arg, "");
+}
+
+/*
+ * Non-block interfaces and device-mapper specific code
+ */
+
+/**
+ * verity_ctr - Construct a verified mapping
+ * @ti: Target being created
+ * @argc: Number of elements in argv
+ * @argv: Vector of key-value pairs (see below).
+ *
+ * Accepts the following keys:
+ * @payload: hashed device
+ * @hashtree: device hashtree is stored on
+ * @hashstart: start address of hashes (default 0)
+ * @block_size: size of a hash block
+ * @alg: hash algorithm
+ * @root_hexdigest: toplevel hash of the tree
+ * @error_behavior: what to do when verification fails [optional]
+ * @salt: salt, in hex [optional]
+ *
+ * E.g.,
+ * payload=/dev/sda2 hashtree=/dev/sda3 alg=sha256
+ * root_hexdigest=f08aa4a3695290c569eb1b0ac032ae10401 50afb527abbeb0a3da33d82fb2c6e
+ *
+ * TODO(wad):
+ * - Boot time addition
+ * - Track block verification to free block_hashes if memory use is a concern
+ * Testing needed:
+ * - Regular slub_debug tracing (on checkins)
+ * - Improper block hash padding
+ * - Improper bundle padding
+ * - Improper hash layout
+ * - Missing padding at end of device
+ * - Improperly sized underlying devices
+ * - Out of memory conditions (make sure this isn't too flaky under high load!)
+ * - Incorrect superhash
+ * - Incorrect block hashes
+ * - Incorrect bundle hashes
+ * - Boot-up read speed; sustained read speeds
+ */
+static int verity_ctr(struct dm_target *ti, unsigned int argc, char **argv)
+{
+ struct verity_config *vc = NULL;
+ int ret = 0;
+ sector_t blocks;
+ unsigned int block_size = VERITY_DEFAULT_BLOCK_SIZE;
+ const char *payload = NULL;
+ const char *hashtree = NULL;
+ unsigned long hashstart = 0;
+ const char *alg = NULL;
+ const char *root_hexdigest = NULL;
+ const char *dev_error_behavior = error_behavior;
+ const char *hexsalt = "";
+ int i;
+
+ for (i = 0; i < argc; ++i) {
+ char *key, *val;
+ DMWARN("Argument %d: '%s'", i, argv[i]);
+ splitarg(argv[i], &key, &val);
+ if (!key) {
+ DMWARN("Bad argument %d: missing key?", i);
+ break;
+ }
+ if (!val) {
+ DMWARN("Bad argument %d='%s': missing value", i, key);
+ break;
+ }
+
+ if (!strcmp(key, "alg")) {
+ alg = val;
+ } else if (!strcmp(key, "payload")) {
+ payload = val;
+ } else if (!strcmp(key, "hashtree")) {
+ hashtree = val;
+ } else if (!strcmp(key, "root_hexdigest")) {
+ root_hexdigest = val;
+ } else if (!strcmp(key, "hashstart")) {
+ if (strict_strtoul(val, 10, &hashstart)) {
+ ti->error = "Invalid hashstart";
+ return -EINVAL;
+ }
+ } else if (!strcmp(key, "block_size")) {
+ unsigned long tmp;
+ if (strict_strtoul(val, 10, &tmp) ||
+ (tmp > UINT_MAX)) {
+ ti->error = "Invalid block_size";
+ return -EINVAL;
+ }
+ block_size = (unsigned int)tmp;
+ } else if (!strcmp(key, "error_behavior")) {
+ dev_error_behavior = val;
+ } else if (!strcmp(key, "salt")) {
+ hexsalt = val;
+ } else if (!strcmp(key, "error_behavior")) {
+ dev_error_behavior = val;
+ }
+ }
+
+#define NEEDARG(n)
+ if (!(n)) {
+ ti->error = "Missing argument: " #n;
+ return -EINVAL;
+ }
+
+ NEEDARG(alg);
+ NEEDARG(payload);
+ NEEDARG(hashtree);
+ NEEDARG(root_hexdigest);
+
+#undef NEEDARG
+
+ /* The device mapper device should be setup read-only */
+ if ((dm_table_get_mode(ti->table) & ~FMODE_READ) != 0) {
+ ti->error = "Must be created readonly.";
+ return -EINVAL;
+ }
+
+ vc = kzalloc(sizeof(*vc), GFP_KERNEL);
+ if (!vc) {
+ /* TODO(wad) if this is called from the setup helper, then we
+ * catch these errors and do a CrOS specific thing. if not, we
+ * need to have this call the error handler.
+ */
+ return -EINVAL;
+ }
+
+ /* Calculate the blocks from the given device size */
+ vc->size = ti->len;
+ blocks = to_bytes(vc->size) / block_size;
+ if (dm_bht_create(&vc->bht, blocks, block_size, alg)) {
+ DMERR("failed to create required bht");
+ goto bad_bht;
+ }
+ if (dm_bht_set_root_hexdigest(&vc->bht, root_hexdigest)) {
+ DMERR("root hexdigest error");
+ goto bad_root_hexdigest;
+ }
+ dm_bht_set_salt(&vc->bht, hexsalt);
+ vc->bht.read_cb = kverityd_bht_read_callback;
+
+ /* payload: device to verify */
+ vc->start = 0; /* TODO: should this support a starting offset? */
+ /* We only ever grab the device in read-only mode. */
+ ret = dm_get_device(ti, payload,
+ dm_table_get_mode(ti->table), &vc->dev);
+ if (ret) {
+ DMERR("Failed to acquire device '%s': %d", payload, ret);
+ ti->error = "Device lookup failed";
+ goto bad_verity_dev;
+ }
+
+ if ((to_bytes(vc->start) % block_size) ||
+ (to_bytes(vc->size) % block_size)) {
+ ti->error = "Device must be block_size divisble/aligned";
+ goto bad_hash_start;
+ }
+
+ vc->hash_start = (sector_t)hashstart;
+
+ /* hashtree: device with hashes.
+ * Note, payload == hashtree is okay as long as the size of
+ * ti->len passed to device mapper does not include
+ * the hashes.
+ */
+ if (dm_get_device(ti, hashtree,
+ dm_table_get_mode(ti->table), &vc->hash_dev)) {
+ ti->error = "Hash device lookup failed";
+ goto bad_hash_dev;
+ }
+
+ /* arg4: cryptographic digest algorithm */
+ if (snprintf(vc->hash_alg, CRYPTO_MAX_ALG_NAME, "%s", alg) >=
+ CRYPTO_MAX_ALG_NAME) {
+ ti->error = "Hash algorithm name is too long";
+ goto bad_hash;
+ }
+
+ /* override with optional device-specific error behavior */
+ vc->error_behavior = verity_parse_error_behavior(dev_error_behavior);
+ if (vc->error_behavior == -1) {
+ ti->error = "Bad error_behavior supplied";
+ goto bad_err_behavior;
+ }
+
+ /* TODO: Maybe issues a request on the io queue for block 0? */
+
+ /* Argument processing is done, setup operational data */
+ /* Pool for dm_verity_io objects */
+ vc->io_pool = mempool_create_slab_pool(MIN_IOS, _verity_io_pool);
+ if (!vc->io_pool) {
+ ti->error = "Cannot allocate verity io mempool";
+ goto bad_slab_pool;
+ }
+
+ /* Allocate the bioset used for request padding */
+ /* TODO(wad) allocate a separate bioset for the first verify maybe */
+ vc->bs = bioset_create(MIN_BIOS, 0);
+ if (!vc->bs) {
+ ti->error = "Cannot allocate verity bioset";
+ goto bad_bs;
+ }
+
+ ti->num_flush_requests = 1;
+ ti->private = vc;
+
+ /* TODO(wad) add device and hash device names */
+ {
+ char hashdev[BDEVNAME_SIZE], vdev[BDEVNAME_SIZE];
+ bdevname(vc->hash_dev->bdev, hashdev);
+ bdevname(vc->dev->bdev, vdev);
+ DMINFO("dev:%s hash:%s [sectors:%llu blocks:%llu]", vdev,
+ hashdev, ULL(vc->bht.sectors), ULL(blocks));
+ }
+ return 0;
+
+bad_bs:
+ mempool_destroy(vc->io_pool);
+bad_slab_pool:
+bad_err_behavior:
+bad_hash:
+ dm_put_device(ti, vc->hash_dev);
+bad_hash_dev:
+bad_hash_start:
+ dm_put_device(ti, vc->dev);
+bad_bht:
+bad_root_hexdigest:
+bad_verity_dev:
+ kfree(vc); /* hash is not secret so no need to zero */
+ return -EINVAL;
+}
+
+static void verity_dtr(struct dm_target *ti)
+{
+ struct verity_config *vc = (struct verity_config *) ti->private;
+
+ bioset_free(vc->bs);
+ mempool_destroy(vc->io_pool);
+ dm_bht_destroy(&vc->bht);
+ dm_put_device(ti, vc->hash_dev);
+ dm_put_device(ti, vc->dev);
+ kfree(vc);
+}
+
+static int verity_status(struct dm_target *ti, status_type_t type,
+ char *result, unsigned int maxlen)
+{
+ struct verity_config *vc = (struct verity_config *) ti->private;
+ unsigned int sz = 0;
+ char hashdev[BDEVNAME_SIZE], vdev[BDEVNAME_SIZE];
+ u8 hexdigest[VERITY_MAX_DIGEST_SIZE * 2 + 1] = { 0 };
+
+ dm_bht_root_hexdigest(&vc->bht, hexdigest, sizeof(hexdigest));
+
+ switch (type) {
+ case STATUSTYPE_INFO:
+ break;
+ case STATUSTYPE_TABLE:
+ bdevname(vc->hash_dev->bdev, hashdev);
+ bdevname(vc->dev->bdev, vdev);
+ DMEMIT("/dev/%s /dev/%s %llu %u %s %s",
+ vdev,
+ hashdev,
+ ULL(vc->hash_start),
+ vc->bht.depth,
+ vc->hash_alg,
+ hexdigest);
+ break;
+ }
+ return 0;
+}
+
+static int verity_merge(struct dm_target *ti, struct bvec_merge_data *bvm,
+ struct bio_vec *biovec, int max_size)
+{
+ struct verity_config *vc = ti->private;
+ struct request_queue *q = bdev_get_queue(vc->dev->bdev);
+
+ if (!q->merge_bvec_fn)
+ return max_size;
+
+ bvm->bi_bdev = vc->dev->bdev;
+ bvm->bi_sector = vc->start + bvm->bi_sector - ti->begin;
+
+ /* Optionally, this could just return 0 to stick to single pages. */
+ return min(max_size, q->merge_bvec_fn(q, bvm, biovec));
+}
+
+static int verity_iterate_devices(struct dm_target *ti,
+ iterate_devices_callout_fn fn, void *data)
+{
+ struct verity_config *vc = ti->private;
+
+ return fn(ti, vc->dev, vc->start, ti->len, data);
+}
+
+static void verity_io_hints(struct dm_target *ti,
+ struct queue_limits *limits)
+{
+ struct verity_config *vc = ti->private;
+ unsigned int block_size = vc->bht.block_size;
+
+ limits->logical_block_size = block_size;
+ limits->physical_block_size = block_size;
+ blk_limits_io_min(limits, block_size);
+}
+
+static struct target_type verity_target = {
+ .name = "verity",
+ .version = {0, 1, 0},
+ .module = THIS_MODULE,
+ .ctr = verity_ctr,
+ .dtr = verity_dtr,
+ .map = verity_map,
+ .merge = verity_merge,
+ .status = verity_status,
+ .iterate_devices = verity_iterate_devices,
+ .io_hints = verity_io_hints,
+};
+
+#define VERITY_WQ_FLAGS (WQ_CPU_INTENSIVE|WQ_HIGHPRI)
+
+static int __init dm_verity_init(void)
+{
+ int r = -ENOMEM;
+
+ _verity_io_pool = KMEM_CACHE(dm_verity_io, 0);
+ if (!_verity_io_pool) {
+ DMERR("failed to allocate pool dm_verity_io");
+ goto bad_io_pool;
+ }
+
+ kverityd_ioq = alloc_workqueue("kverityd_io", VERITY_WQ_FLAGS, 1);
+ if (!kverityd_ioq) {
+ DMERR("failed to create workqueue kverityd_ioq");
+ goto bad_io_queue;
+ }
+
+ kveritydq = alloc_workqueue("kverityd", VERITY_WQ_FLAGS, 1);
+ if (!kveritydq) {
+ DMERR("failed to create workqueue kveritydq");
+ goto bad_verify_queue;
+ }
+
+ r = dm_register_target(&verity_target);
+ if (r < 0) {
+ DMERR("register failed %d", r);
+ goto register_failed;
+ }
+
+ DMINFO("version %u.%u.%u loaded", verity_target.version[0],
+ verity_target.version[1], verity_target.version[2]);
+
+ return r;
+
+register_failed:
+ destroy_workqueue(kveritydq);
+bad_verify_queue:
+ destroy_workqueue(kverityd_ioq);
+bad_io_queue:
+ kmem_cache_destroy(_verity_io_pool);
+bad_io_pool:
+ return r;
+}
+
+static void __exit dm_verity_exit(void)
+{
+ destroy_workqueue(kveritydq);
+ destroy_workqueue(kverityd_ioq);
+
+ dm_unregister_target(&verity_target);
+ kmem_cache_destroy(_verity_io_pool);
+}
+
+module_init(dm_verity_init);
+module_exit(dm_verity_exit);
+
+MODULE_AUTHOR("The Chromium OS Authors <chromium-os-dev@chromium.org>");
+MODULE_DESCRIPTION(DM_NAME " target for transparent disk integrity checking");
+MODULE_LICENSE("GPL");
diff --git a/drivers/md/dm-verity.h b/drivers/md/dm-verity.h
new file mode 100644
index 0000000..e0664c9
--- /dev/null
+++ b/drivers/md/dm-verity.h
@@ -0,0 +1,45 @@
+/*
+ * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ * All Rights Reserved.
+ *
+ * This file is released under the GPLv2.
+ *
+ * Provide error types for use when creating a custom error handler.
+ * See Documentation/device-mapper/dm-verity.txt
+ */
+#ifndef DM_VERITY_H
+#define DM_VERITY_H
+
+#include <linux/notifier.h>
+
+struct dm_verity_error_state {
+ int code;
+ int transient; /* Likely to not happen after a reboot */
+ u64 block;
+ const char *message;
+
+ sector_t dev_start;
+ sector_t dev_len;
+ struct block_device *dev;
+
+ sector_t hash_dev_start;
+ sector_t hash_dev_len;
+ struct block_device *hash_dev;
+
+ /* Final behavior after all notifications are completed. */
+ int behavior;
+};
+
+/* This enum must be matched to allowed_error_behaviors in dm-verity.c */
+enum dm_verity_error_behavior {
+ DM_VERITY_ERROR_BEHAVIOR_EIO = 0,
+ DM_VERITY_ERROR_BEHAVIOR_PANIC,
+ DM_VERITY_ERROR_BEHAVIOR_NONE,
+ DM_VERITY_ERROR_BEHAVIOR_NOTIFY
+};
+
+
+int dm_verity_register_error_notifier(struct notifier_block *nb);
+int dm_verity_unregister_error_notifier(struct notifier_block *nb);
+
+#endif /* DM_VERITY_H */
diff --git a/include/linux/dm-bht.h b/include/linux/dm-bht.h
new file mode 100644
index 0000000..3a4b432
--- /dev/null
+++ b/include/linux/dm-bht.h
@@ -0,0 +1,166 @@
+/*
+ * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ *
+ * Device-Mapper block hash tree interface.
+ * See Documentation/device-mapper/dm-bht.txt for details.
+ *
+ * This file is released under the GPLv2.
+ */
+#ifndef __LINUX_DM_BHT_H
+#define __LINUX_DM_BHT_H
+
+#include <crypto/hash.h>
+#include <linux/compiler.h>
+#include <linux/types.h>
+
+/* To avoid allocating memory for digest tests, we just setup a
+ * max to use for now.
+ */
+#define DM_BHT_MAX_DIGEST_SIZE 128 /* 1k hashes are unlikely for now */
+#define DM_BHT_SALT_SIZE 32 /* 256 bits of salt is a lot */
+
+/* UNALLOCATED, PENDING, READY, and VERIFIED are valid states. All other
+ * values are entry-related return codes.
+ */
+#define DM_BHT_ENTRY_VERIFIED 8 /* 'nodes' has been checked against parent */
+#define DM_BHT_ENTRY_READY 4 /* 'nodes' is loaded and available */
+#define DM_BHT_ENTRY_PENDING 2 /* 'nodes' is being loaded */
+#define DM_BHT_ENTRY_UNALLOCATED 0 /* untouched */
+#define DM_BHT_ENTRY_ERROR -1 /* entry is unsuitable for use */
+#define DM_BHT_ENTRY_ERROR_IO -2 /* I/O error on load */
+
+/* Additional possible return codes */
+#define DM_BHT_ENTRY_ERROR_MISMATCH -3 /* Digest mismatch */
+
+/* dm_bht_entry
+ * Contains dm_bht->node_count tree nodes at a given tree depth.
+ * state is used to transactionally assure that data is paged in
+ * from disk. Unless dm_bht kept running crypto contexts for each
+ * level, we need to load in the data for on-demand verification.
+ */
+struct dm_bht_entry {
+ atomic_t state; /* see defines */
+ /* Keeping an extra pointer per entry wastes up to ~33k of
+ * memory if a 1m blocks are used (or 66 on 64-bit arch)
+ */
+ void *io_context; /* Reserve a pointer for use during io */
+ /* data should only be non-NULL if fully populated. */
+ void *nodes; /* The hash data used to verify the children.
+ * Guaranteed to be page-aligned.
+ */
+};
+
+/* dm_bht_level
+ * Contains an array of entries which represent a page of hashes where
+ * each hash is a node in the tree at the given tree depth/level.
+ */
+struct dm_bht_level {
+ struct dm_bht_entry *entries; /* array of entries of tree nodes */
+ unsigned int count; /* number of entries at this level */
+ sector_t sector; /* starting sector for this level */
+};
+
+/* opaque context, start, databuf, sector_count */
+typedef int(*dm_bht_callback)(void *, /* external context */
+ sector_t, /* start sector */
+ u8 *, /* destination page */
+ sector_t, /* num sectors */
+ struct dm_bht_entry *);
+/* dm_bht - Device mapper block hash tree
+ * dm_bht provides a fixed interface for comparing data blocks
+ * against a cryptographic hashes stored in a hash tree. It
+ * optimizes the tree structure for storage on disk.
+ *
+ * The tree is built from the bottom up. A collection of data,
+ * external to the tree, is hashed and these hashes are stored
+ * as the blocks in the tree. For some number of these hashes,
+ * a parent node is created by hashing them. These steps are
+ * repeated.
+ *
+ * TODO(wad): All hash storage memory is pre-allocated and freed once an
+ * entire branch has been verified.
+ */
+struct dm_bht {
+ /* Configured values */
+ int depth; /* Depth of the tree including the root */
+ unsigned int block_count; /* Number of blocks hashed */
+ unsigned int block_size; /* Size of a hash block */
+ char hash_alg[CRYPTO_MAX_ALG_NAME];
+ unsigned char salt[DM_BHT_SALT_SIZE];
+
+ /* Computed values */
+ unsigned int node_count; /* Data size (in hashes) for each entry */
+ unsigned int node_count_shift; /* first bit set - 1 */
+ /* There is one per CPU so that verified can be simultaneous. */
+ struct shash_desc *hash_desc[NR_CPUS]; /* Container for the hash alg */
+ unsigned int digest_size;
+ sector_t sectors; /* Number of disk sectors used */
+
+ /* bool verified; Full tree is verified */
+ u8 root_digest[DM_BHT_MAX_DIGEST_SIZE];
+ struct dm_bht_level *levels; /* in reverse order */
+ /* Callback for reading from the hash device */
+ dm_bht_callback read_cb;
+};
+
+/* Constructor for struct dm_bht instances. */
+int dm_bht_create(struct dm_bht *bht,
+ unsigned int block_count,
+ unsigned int block_size,
+ const char *alg_name);
+/* Destructor for struct dm_bht instances. Does not free @bht */
+void dm_bht_destroy(struct dm_bht *bht);
+
+/* Basic accessors for struct dm_bht */
+int dm_bht_set_root_hexdigest(struct dm_bht *bht, const u8 *hexdigest);
+int dm_bht_root_hexdigest(struct dm_bht *bht, u8 *hexdigest, int available);
+void dm_bht_set_salt(struct dm_bht *bht, const char *hexsalt);
+int dm_bht_salt(struct dm_bht *bht, char *hexsalt);
+
+/* Functions for loading in data from disk for verification */
+bool dm_bht_is_populated(struct dm_bht *bht, unsigned int block);
+int dm_bht_populate(struct dm_bht *bht, void *read_cb_ctx,
+ unsigned int block);
+int dm_bht_verify_block(struct dm_bht *bht, unsigned int block,
+ struct page *pg, unsigned int offset);
+void dm_bht_read_completed(struct dm_bht_entry *entry, int status);
+
+/* Functions for converting indices to nodes. */
+
+static inline unsigned int dm_bht_get_level_shift(struct dm_bht *bht,
+ int depth)
+{
+ return (bht->depth - depth) * bht->node_count_shift;
+}
+
+/* For the given depth, this is the entry index. At depth+1 it is the node
+ * index for depth.
+ */
+static inline unsigned int dm_bht_index_at_level(struct dm_bht *bht,
+ int depth,
+ unsigned int leaf)
+{
+ return leaf >> dm_bht_get_level_shift(bht, depth);
+}
+
+static inline struct dm_bht_entry *dm_bht_get_entry(struct dm_bht *bht,
+ int depth,
+ unsigned int block)
+{
+ unsigned int index = dm_bht_index_at_level(bht, depth, block);
+ struct dm_bht_level *level = &bht->levels[depth];
+
+ return &level->entries[index];
+}
+
+static inline void *dm_bht_get_node(struct dm_bht *bht,
+ struct dm_bht_entry *entry,
+ int depth,
+ unsigned int block)
+{
+ unsigned int index = dm_bht_index_at_level(bht, depth, block);
+ unsigned int node_index = index % bht->node_count;
+
+ return entry->nodes + (node_index * bht->digest_size);
+}
+#endif /* __LINUX_DM_BHT_H */
--
1.7.3.1

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 01-04-2012, 08:49 PM
Mandeep Singh Baines
 
Default dm: verity target

The verity target provides transparent integrity checking of block devices
using a cryptographic digest.

dm-verity is meant to be setup as part of a verified boot path. This
may be anything ranging from a boot using tboot or trustedgrub to just
booting from a known-good device (like a USB drive or CD).

dm-verity is part of ChromeOS's verified boot path. It is used to verify
the integrity of the root filesystem on boot. The root filesystem is
mounted on a dm-verity partition which transparently verifies each block
with a bootloader verified hash passed into the kernel at boot.

Changes in V3:
* Discussion over irc (Alasdair G Kergon)
* Implement ioctl hook
Changes in V2:
* https://lkml.org/lkml/2011/11/10/85 (Steffen Klassert)
* Use shash API instead of older hash API

Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: Elly Jones <ellyjones@chromium.org>
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Milan Broz <mbroz@redhat.com>
Cc: Olof Johansson <olofj@chromium.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: dm-devel@redhat.com
---
Documentation/device-mapper/dm-bht.txt | 59 ++
Documentation/device-mapper/dm-verity.txt | 76 +++
drivers/md/Kconfig | 30 +
drivers/md/Makefile | 2 +
drivers/md/dm-bht.c | 559 +++++++++++++++
drivers/md/dm-verity.c | 1043 +++++++++++++++++++++++++++++
drivers/md/dm-verity.h | 45 ++
include/linux/dm-bht.h | 166 +++++
8 files changed, 1980 insertions(+), 0 deletions(-)
create mode 100644 Documentation/device-mapper/dm-bht.txt
create mode 100644 Documentation/device-mapper/dm-verity.txt
create mode 100644 drivers/md/dm-bht.c
create mode 100644 drivers/md/dm-verity.c
create mode 100644 drivers/md/dm-verity.h
create mode 100644 include/linux/dm-bht.h

diff --git a/Documentation/device-mapper/dm-bht.txt b/Documentation/device-mapper/dm-bht.txt
new file mode 100644
index 0000000..21d929f
--- /dev/null
+++ b/Documentation/device-mapper/dm-bht.txt
@@ -0,0 +1,59 @@
+dm-bht
+======
+
+dm-bht provides a block hash tree implementation. The use of dm-bht allows
+for integrity checking of a given block device without reading the entire
+set of blocks into memory before use.
+
+In particular, dm-bht supplies an interface for creating and verifying a tree
+of cryptographic digests with any algorithm supported by the kernel crypto API.
+
+The `verity' target is the motivating example.
+
+
+Theory of operation
+===================
+
+dm-bht is logically comprised of multiple nodes organized in a tree-like
+structure. Each node in the tree is a cryptographic hash. If it is a leaf
+node, the hash is of some block data on disk. If it is an intermediary node,
+then the hash is of a number of child nodes.
+
+dm-bht has a given depth starting at 1 (ignoring the root node). Each level in
+the tree is concretely made up of dm_bht_entry structs. Each entry in the tree
+is a collection of neighboring nodes that fit in one page-sized block. The
+number is determined based on PAGE_SIZE and the size of the selected
+cryptographic digest algorithm. The hashes are linearly ordered in this entry
+and any unaligned trailing space is ignored but included when calculating the
+parent node.
+
+The tree looks something like:
+
+alg= sha256, num_blocks = 32767
+ [ root ]
+ / . . .
+ [entry_0] [entry_1]
+ / . . . . . .
+ [entry_0_0] . . . [entry_0_127] . . . . [entry_1_127]
+ / ... / . . . /
+ blk_0 ... blk_127 blk_16256 blk_16383 blk_32640 . . . blk_32767
+
+root is treated independently from the depth and the blocks are expected to
+be hashed and supplied to the dm-bht. hash blocks that make up the entry
+contents are expected to be read from disk.
+
+dm-bht does not handle I/O directly but instead expects the consumer to
+supply callbacks. The read callback will always receive a page-align value
+to pass to the block device layer to read in a hash value.
+
+Usage
+=====
+
+The API provides mechanisms for reading and verifying a tree. When reading, all
+required data for the hash tree should be populated for a block before
+attempting a verify. This can be done by calling dm_bht_populate(). When all
+data is ready, a call to dm_bht_verify_block() with the expected hash value will
+perform both the direct block hash check and the hashes of the parent and
+neighboring nodes where needed to ensure validity up to the root hash. Note,
+dm_bht_set_root_hexdigest() should be called before any verification attempts
+occur.
diff --git a/Documentation/device-mapper/dm-verity.txt b/Documentation/device-mapper/dm-verity.txt
new file mode 100644
index 0000000..f33b984
--- /dev/null
+++ b/Documentation/device-mapper/dm-verity.txt
@@ -0,0 +1,76 @@
+dm-verity
+==========
+
+Device-Mapper's "verity" target provides transparent integrity checking of
+block devices using a cryptographic digest provided by the kernel crypto API.
+This target is read-only.
+
+Parameters: payload=<device path> hashtree=<hash device path> alg=<alg>
+ salt=<salt> root_hexagiest=<root hash>
+ [ hashstart=<hash start> error_behavior=<error behavior> ]
+
+<device path>
+ This is the device that is going to be integrity checked. It may be
+ a subset of the full device as specified to dmsetup (start sector and count)
+ It may be specified as a path, like /dev/sdaX, or a device number,
+ <major>:<minor>.
+
+<hash device path>
+ This is the device that that supplies the dm-bht hash data. It may be
+ specified similarly to the device path and may be the same device. If the
+ same device is used, the hash offset should be outside of the dm-verity
+ configured device size.
+
+<alg>
+ The cryptographic hash algorithm used for this device. This should
+ be the name of the algorithm, like "sha1".
+
+<salt>
+ Salt value (in hex).
+
+<root hash>
+ The hexadecimal encoding of the cryptographic hash of all of the
+ neighboring nodes at the first level of the tree. This hash should be
+ trusted as there is no other authenticity beyond this point.
+
+<hash start>
+ Start address of hashes (default 0).
+
+<error behavior>
+ 0 = return -EIO. 1 = panic. 2 = none. 3 = call notifier.
+
+Theory of operation
+===================
+
+dm-verity is meant to be setup as part of a verified boot path. This
+may be anything ranging from a boot using tboot or trustedgrub to just
+booting from a known-good device (like a USB drive or CD).
+
+When a dm-verity device is configured, it is expected that the caller
+has been authenticated in some way (cryptographic signatures, etc).
+After instantiation, all hashes will be verified on-demand during
+disk access. If they cannot be verified up to the root node of the
+tree, the root hash, then the I/O will fail. This should identify
+tampering with any data on the device and the hash data.
+
+Cryptographic hashes are used to assert the integrity of the device on a
+per-block basis. This allows for a lightweight hash computation on first read
+into the page cache. Block hashes are stored linearly aligned to the nearest
+block the size of a page.
+
+For more information on the hashing process, see dm-bht.txt.
+
+
+Example
+=======
+
+Setup a device;
+[[
+ dmsetup create vroot --table
+ "0 204800 verity payload=/dev/sda1 hashtree=/dev/sda2 alg=sha1 "
+ "root_hexdigest=9f74809a2ee7607b16fcc70d9399a4de97 25a727"
+]]
+
+A command line tool is available to compute the hash tree and return the
+root hash value.
+ http://git.chromium.org/cgi-bin/gitweb.cgi?p=dm-verity.git;a=tree
diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index faa4741..3cdf95c 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -370,4 +370,34 @@ config DM_FLAKEY
---help---
A target that intermittently fails I/O for debugging purposes.

+config DM_BHT
+ tristate "Block hash tree support"
+ select CRYPTO
+ select CRYPTO_HASH
+ ---help---
+ Include support for device-mapper devices to use a block hash
+ tree for managing data integrity checks in a scalable way.
+
+ Targets that use this functionality should include it
+ automatically.
+
+ If unsure, say N.
+
+config DM_VERITY
+ tristate "Verity target support"
+ depends on BLK_DEV_DM
+ select DM_BHT
+ select CRYPTO
+ select CRYPTO_HASH
+ ---help---
+ This device-mapper target allows you to create a device that
+ transparently integrity checks the data on it. You'll need to
+ activate the digests you're going to use in the cryptoapi
+ configuration.
+
+ To compile this code as a module, choose M here: the module will
+ be called dm-verity.
+
+ If unsure, say N.
+
endif # MD
diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index 046860c..c069953 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -39,6 +39,8 @@ obj-$(CONFIG_DM_SNAPSHOT) += dm-snapshot.o
obj-$(CONFIG_DM_PERSISTENT_DATA) += persistent-data/
obj-$(CONFIG_DM_MIRROR) += dm-mirror.o dm-log.o dm-region-hash.o
obj-$(CONFIG_DM_LOG_USERSPACE) += dm-log-userspace.o
+obj-$(CONFIG_DM_BHT) += dm-bht.o
+obj-$(CONFIG_DM_VERITY) += dm-verity.o
obj-$(CONFIG_DM_ZERO) += dm-zero.o
obj-$(CONFIG_DM_RAID) += dm-raid.o
obj-$(CONFIG_DM_THIN_PROVISIONING) += dm-thin-pool.o
diff --git a/drivers/md/dm-bht.c b/drivers/md/dm-bht.c
new file mode 100644
index 0000000..6eb2be3
--- /dev/null
+++ b/drivers/md/dm-bht.c
@@ -0,0 +1,559 @@
+ /*
+ * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ *
+ * Device-Mapper block hash tree interface.
+ * See Documentation/device-mapper/dm-bht.txt for details.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <crypto/hash.h>
+#include <linux/atomic.h>
+#include <linux/bitops.h>
+#include <linux/bug.h>
+#include <linux/cpumask.h>
+#include <linux/device-mapper.h>
+#include <linux/dm-bht.h>
+#include <linux/err.h>
+#include <linux/errno.h>
+#include <linux/gfp.h>
+#include <linux/highmem.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/mm_types.h>
+#include <linux/scatterlist.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+
+#define DM_MSG_PREFIX "dm bht"
+
+
+/*
+ * Utilities
+ */
+
+static u8 from_hex(u8 ch)
+{
+ if ((ch >= '0') && (ch <= '9'))
+ return ch - '0';
+ if ((ch >= 'a') && (ch <= 'f'))
+ return ch - 'a' + 10;
+ if ((ch >= 'A') && (ch <= 'F'))
+ return ch - 'A' + 10;
+ return -1;
+}
+
+/**
+ * dm_bht_bin_to_hex - converts a binary stream to human-readable hex
+ * @binary: a byte array of length @binary_len
+ * @hex: a byte array of length @binary_len * 2 + 1
+ */
+static void dm_bht_bin_to_hex(u8 *binary, u8 *hex, unsigned int binary_len)
+{
+ while (binary_len-- > 0) {
+ sprintf((char *)hex, "%02hhx", (int)*binary);
+ hex += 2;
+ binary++;
+ }
+}
+
+/**
+ * dm_bht_hex_to_bin - converts a hex stream to binary
+ * @binary: a byte array of length @binary_len
+ * @hex: a byte array of length @binary_len * 2 + 1
+ */
+static void dm_bht_hex_to_bin(u8 *binary, const u8 *hex,
+ unsigned int binary_len)
+{
+ while (binary_len-- > 0) {
+ *binary = from_hex(*(hex++));
+ *binary *= 16;
+ *binary += from_hex(*(hex++));
+ binary++;
+ }
+}
+
+static void dm_bht_log_mismatch(struct dm_bht *bht, u8 *given, u8 *computed)
+{
+ u8 given_hex[DM_BHT_MAX_DIGEST_SIZE * 2 + 1];
+ u8 computed_hex[DM_BHT_MAX_DIGEST_SIZE * 2 + 1];
+
+ dm_bht_bin_to_hex(given, given_hex, bht->digest_size);
+ dm_bht_bin_to_hex(computed, computed_hex, bht->digest_size);
+ DMERR_LIMIT("%s != %s", given_hex, computed_hex);
+}
+
+/**
+ * dm_bht_compute_hash: hashes a page of data
+ */
+static int dm_bht_compute_hash(struct dm_bht *bht, struct page *pg,
+ unsigned int offset, u8 *digest)
+{
+ struct shash_desc *hash_desc = bht->hash_desc[smp_processor_id()];
+ void *data;
+ int err;
+
+ /* Note, this is synchronous. */
+ if (crypto_shash_init(hash_desc)) {
+ DMCRIT("failed to reinitialize crypto hash (proc:%d)",
+ smp_processor_id());
+ return -EINVAL;
+ }
+ data = kmap_atomic(pg);
+ err = crypto_shash_update(hash_desc, data + offset, PAGE_SIZE);
+ kunmap_atomic(data);
+ if (err) {
+ DMCRIT("crypto_hash_update failed");
+ return -EINVAL;
+ }
+ if (crypto_shash_update(hash_desc, bht->salt, sizeof(bht->salt))) {
+ DMCRIT("crypto_hash_update failed");
+ return -EINVAL;
+ }
+ if (crypto_shash_final(hash_desc, digest)) {
+ DMCRIT("crypto_hash_final failed");
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+/*
+ * Implementation functions
+ */
+
+static int dm_bht_initialize_entries(struct dm_bht *bht)
+{
+ /* last represents the index of the last digest store in the tree.
+ * By walking the tree with that index, it is possible to compute the
+ * total number of entries at each level.
+ *
+ * Since each entry will contain up to |node_count| nodes of the tree,
+ * it is possible that the last index may not be at the end of a given
+ * entry->nodes. In that case, it is assumed the value is padded.
+ *
+ * Note, we treat both the tree root (1 hash) and the tree leaves
+ * independently from the bht data structures. Logically, the root is
+ * depth=-1 and the block layer level is depth=bht->depth
+ */
+ unsigned int last = bht->block_count;
+ int depth;
+
+ /* check that the largest level->count can't result in an int overflow
+ * on allocation or sector calculation.
+ */
+ if (((last >> bht->node_count_shift) + 1) >
+ UINT_MAX / max((unsigned int)sizeof(struct dm_bht_entry),
+ (unsigned int)to_sector(bht->block_size))) {
+ DMCRIT("required entries %u is too large", last + 1);
+ return -EINVAL;
+ }
+
+ /* Track the current sector location for each level so we don't have to
+ * compute it during traversals.
+ */
+ bht->sectors = 0;
+ for (depth = 0; depth < bht->depth; ++depth) {
+ struct dm_bht_level *level = &bht->levels[depth];
+
+ level->count = dm_bht_index_at_level(bht, depth, last) + 1;
+ level->entries = (struct dm_bht_entry *)
+ kcalloc(level->count,
+ sizeof(struct dm_bht_entry),
+ GFP_KERNEL);
+ if (!level->entries) {
+ DMERR("failed to allocate entries for depth %d", depth);
+ return -ENOMEM;
+ }
+ level->sector = bht->sectors;
+ bht->sectors += level->count * to_sector(bht->block_size);
+ }
+
+ return 0;
+}
+
+/**
+ * dm_bht_create - prepares @bht for us
+ * @bht: pointer to a dm_bht_create()d bht
+ * @depth: tree depth without the root; including block hashes
+ * @block_count:the number of block hashes / tree leaves
+ * @alg_name: crypto hash algorithm name
+ *
+ * Returns 0 on success.
+ *
+ * Callers can offset into devices by storing the data in the io callbacks.
+ */
+int dm_bht_create(struct dm_bht *bht, unsigned int block_count,
+ unsigned int block_size, const char *alg_name)
+{
+ struct crypto_shash *tfm;
+ int size, cpu, status = 0;
+
+ bht->block_size = block_size;
+ /* Verify that PAGE_SIZE >= block_size >= SECTOR_SIZE. */
+ if ((block_size > PAGE_SIZE) ||
+ (PAGE_SIZE % block_size) ||
+ (to_sector(block_size) == 0))
+ return -EINVAL;
+
+ tfm = crypto_alloc_shash(alg_name, 0, 0);
+ if (IS_ERR(tfm)) {
+ DMERR("failed to allocate crypto hash '%s'", alg_name);
+ return -ENOMEM;
+ }
+ size = sizeof(struct shash_desc) + crypto_shash_descsize(tfm);
+
+ /* Pre-allocate per-cpu crypto contexts to avoid having to
+ * kmalloc/kfree a context for every hash operation.
+ */
+ for (cpu = 0; cpu < nr_cpu_ids; ++cpu) {
+ struct shash_desc *hash_desc = kmalloc(size, GFP_KERNEL);
+
+ bht->hash_desc[cpu] = hash_desc;
+ if (!hash_desc) {
+ DMERR("failed to allocate crypto hash contexts");
+ status = -ENOMEM;
+ goto bad_hash_alloc;
+ }
+ hash_desc->tfm = tfm;
+ hash_desc->flags = 0x0;
+ }
+ bht->digest_size = crypto_shash_digestsize(tfm);
+ /* We expect to be able to pack >=2 hashes into a block */
+ if (block_size / bht->digest_size < 2) {
+ DMERR("too few hashes fit in a block");
+ status = -EINVAL;
+ goto bad_arg;
+ }
+
+ if (bht->digest_size > DM_BHT_MAX_DIGEST_SIZE) {
+ DMERR("DM_BHT_MAX_DIGEST_SIZE too small for chosen digest");
+ status = -EINVAL;
+ goto bad_arg;
+ }
+
+ /* Configure the tree */
+ bht->block_count = block_count;
+ if (block_count == 0) {
+ DMERR("block_count must be non-zero");
+ status = -EINVAL;
+ goto bad_arg;
+ }
+
+ /* Each dm_bht_entry->nodes is one block. The node code tracks
+ * how many nodes fit into one entry where a node is a single
+ * hash (message digest).
+ */
+ bht->node_count_shift = fls(block_size / bht->digest_size) - 1;
+ /* Round down to the nearest power of two. This makes indexing
+ * into the tree much less painful.
+ */
+ bht->node_count = 1 << bht->node_count_shift;
+
+ /* This is unlikely to happen, but with 64k pages, who knows. */
+ if (bht->node_count > UINT_MAX / bht->digest_size) {
+ DMERR("node_count * hash_len exceeds UINT_MAX!");
+ status = -EINVAL;
+ goto bad_arg;
+ }
+
+ bht->depth = DIV_ROUND_UP(fls(block_count - 1), bht->node_count_shift);
+
+ /* Ensure that we can safely shift by this value. */
+ if (bht->depth * bht->node_count_shift >= sizeof(unsigned int) * 8) {
+ DMERR("specified depth and node_count_shift is too large");
+ status = -EINVAL;
+ goto bad_arg;
+ }
+
+ /* Allocate levels. Each level of the tree may have an arbitrary number
+ * of dm_bht_entry structs. Each entry contains node_count nodes.
+ * Each node in the tree is a cryptographic digest of either node_count
+ * nodes on the subsequent level or of a specific block on disk.
+ */
+ bht->levels = (struct dm_bht_level *)
+ kcalloc(bht->depth,
+ sizeof(struct dm_bht_level), GFP_KERNEL);
+ if (!bht->levels) {
+ DMERR("failed to allocate tree levels");
+ status = -ENOMEM;
+ goto bad_level_alloc;
+ }
+
+ bht->read_cb = NULL;
+
+ status = dm_bht_initialize_entries(bht);
+ if (status)
+ goto bad_entries_alloc;
+
+ /* We compute depth such that there is only be 1 block at level 0. */
+ BUG_ON(bht->levels[0].count != 1);
+
+ return 0;
+
+bad_entries_alloc:
+ while (bht->depth-- > 0)
+ kfree(bht->levels[bht->depth].entries);
+ kfree(bht->levels);
+bad_level_alloc:
+bad_arg:
+bad_hash_alloc:
+ for (cpu = 0; cpu < nr_cpu_ids && bht->hash_desc[cpu]; ++cpu)
+ kfree(bht->hash_desc[cpu]);
+ crypto_free_shash(tfm);
+ return status;
+}
+EXPORT_SYMBOL(dm_bht_create);
+
+/**
+ * dm_bht_read_completed
+ * @entry: pointer to the entry that's been loaded
+ * @status: I/O status. Non-zero is failure.
+ * MUST always be called after a read_cb completes.
+ */
+void dm_bht_read_completed(struct dm_bht_entry *entry, int status)
+{
+ if (status) {
+ /* TODO(wad) add retry support */
+ DMCRIT("an I/O error occurred while reading entry");
+ atomic_set(&entry->state, DM_BHT_ENTRY_ERROR_IO);
+ /* entry->nodes will be freed later */
+ return;
+ }
+ BUG_ON(atomic_read(&entry->state) != DM_BHT_ENTRY_PENDING);
+ atomic_set(&entry->state, DM_BHT_ENTRY_READY);
+}
+EXPORT_SYMBOL(dm_bht_read_completed);
+
+/**
+ * dm_bht_verify_block - checks that all nodes in the path for @block are valid
+ * @bht: pointer to a dm_bht_create()d bht
+ * @block: specific block data is expected from
+ * @pg: page holding the block data
+ * @offset: offset into the page
+ *
+ * Returns 0 on success, DM_BHT_ENTRY_ERROR_MISMATCH on error.
+ */
+int dm_bht_verify_block(struct dm_bht *bht, unsigned int block,
+ struct page *pg, unsigned int offset)
+{
+ int state, depth = bht->depth;
+ u8 digest[DM_BHT_MAX_DIGEST_SIZE];
+ struct dm_bht_entry *entry;
+ void *node;
+
+ do {
+ /* Need to check that the hash of the current block is accurate
+ * in its parent.
+ */
+ entry = dm_bht_get_entry(bht, depth - 1, block);
+ state = atomic_read(&entry->state);
+ /* This call is only safe if all nodes along the path
+ * are already populated (i.e. READY) via dm_bht_populate.
+ */
+ BUG_ON(state < DM_BHT_ENTRY_READY);
+ node = dm_bht_get_node(bht, entry, depth, block);
+
+ if (dm_bht_compute_hash(bht, pg, offset, digest) ||
+ memcmp(digest, node, bht->digest_size))
+ goto mismatch;
+
+ /* Keep the containing block of hashes to be verified in the
+ * next pass.
+ */
+ pg = virt_to_page(entry->nodes);
+ offset = offset_in_page(entry->nodes);
+ } while (--depth > 0 && state != DM_BHT_ENTRY_VERIFIED);
+
+ if (depth == 0 && state != DM_BHT_ENTRY_VERIFIED) {
+ if (dm_bht_compute_hash(bht, pg, offset, digest) ||
+ memcmp(digest, bht->root_digest, bht->digest_size))
+ goto mismatch;
+ atomic_set(&entry->state, DM_BHT_ENTRY_VERIFIED);
+ }
+
+ /* Mark path to leaf as verified. */
+ for (depth++; depth < bht->depth; depth++) {
+ entry = dm_bht_get_entry(bht, depth, block);
+ /* At this point, entry can only be in VERIFIED or READY state.
+ * So it is safe to use atomic_set instead of atomic_cmpxchg.
+ */
+ atomic_set(&entry->state, DM_BHT_ENTRY_VERIFIED);
+ }
+
+ return 0;
+
+mismatch:
+ DMERR_LIMIT("verify_path: failed to verify hash (d=%d,bi=%u)",
+ depth, block);
+ dm_bht_log_mismatch(bht, node, digest);
+ return DM_BHT_ENTRY_ERROR_MISMATCH;
+}
+EXPORT_SYMBOL(dm_bht_verify_block);
+
+/**
+ * dm_bht_is_populated - check that entries from disk needed to verify a given
+ * block are all ready
+ * @bht: pointer to a dm_bht_create()d bht
+ * @block: specific block data is expected from
+ *
+ * Callers may wish to call dm_bht_is_populated() when checking an io
+ * for which entries were already pending.
+ */
+bool dm_bht_is_populated(struct dm_bht *bht, unsigned int block)
+{
+ int depth;
+
+ for (depth = bht->depth - 1; depth >= 0; depth--) {
+ struct dm_bht_entry *entry = dm_bht_get_entry(bht, depth,
+ block);
+ if (atomic_read(&entry->state) < DM_BHT_ENTRY_READY)
+ return false;
+ }
+
+ return true;
+}
+EXPORT_SYMBOL(dm_bht_is_populated);
+
+/**
+ * dm_bht_populate - reads entries from disk needed to verify a given block
+ * @bht: pointer to a dm_bht_create()d bht
+ * @ctx: context used for all read_cb calls on this request
+ * @block: specific block data is expected from
+ *
+ * Returns negative value on error. Returns 0 on success.
+ */
+int dm_bht_populate(struct dm_bht *bht, void *ctx, unsigned int block)
+{
+ int depth, state;
+
+ BUG_ON(block >= bht->block_count);
+
+ for (depth = bht->depth - 1; depth >= 0; --depth) {
+ unsigned int index = dm_bht_index_at_level(bht, depth, block);
+ struct dm_bht_level *level = &bht->levels[depth];
+ struct dm_bht_entry *entry = dm_bht_get_entry(bht, depth,
+ block);
+ state = atomic_cmpxchg(&entry->state,
+ DM_BHT_ENTRY_UNALLOCATED,
+ DM_BHT_ENTRY_PENDING);
+ if (state == DM_BHT_ENTRY_VERIFIED)
+ break;
+ if (state <= DM_BHT_ENTRY_ERROR)
+ goto error_state;
+ if (state != DM_BHT_ENTRY_UNALLOCATED)
+ continue;
+
+ /* Current entry is claimed for allocation and loading */
+ entry->nodes = kmalloc(bht->block_size, GFP_NOIO);
+ if (!entry->nodes)
+ goto nomem;
+
+ bht->read_cb(ctx,
+ level->sector + to_sector(index * bht->block_size),
+ entry->nodes, to_sector(bht->block_size), entry);
+ }
+
+ return 0;
+
+error_state:
+ DMCRIT("block %u at depth %d is in an error state", block, depth);
+ return -EPERM;
+
+nomem:
+ DMCRIT("failed to allocate memory for entry->nodes");
+ return -ENOMEM;
+}
+EXPORT_SYMBOL(dm_bht_populate);
+
+/**
+ * dm_bht_destroy - cleans up all memory used by @bht
+ * @bht: pointer to a dm_bht_create()d bht
+ */
+void dm_bht_destroy(struct dm_bht *bht)
+{
+ int depth, cpu;
+
+ for (depth = 0; depth < bht->depth; depth++) {
+ struct dm_bht_entry *entry = bht->levels[depth].entries;
+ struct dm_bht_entry *entry_end = entry +
+ bht->levels[depth].count;
+ for (; entry < entry_end; ++entry)
+ kfree(entry->nodes);
+ kfree(bht->levels[depth].entries);
+ }
+ kfree(bht->levels);
+ crypto_free_shash((bht->hash_desc[0])->tfm);
+ for (cpu = 0; cpu < nr_cpu_ids; ++cpu)
+ kfree(bht->hash_desc[cpu]);
+}
+EXPORT_SYMBOL(dm_bht_destroy);
+
+/*
+ * Accessors
+ */
+
+/**
+ * dm_bht_set_root_hexdigest - sets an unverified root digest hash from hex
+ * @bht: pointer to a dm_bht_create()d bht
+ * @hexdigest: array of u8s containing the new digest in binary
+ * Returns non-zero on error. hexdigest should be NUL terminated.
+ */
+int dm_bht_set_root_hexdigest(struct dm_bht *bht, const u8 *hexdigest)
+{
+ /* Make sure we have at least the bytes expected */
+ if (strnlen((char *)hexdigest, bht->digest_size * 2) !=
+ bht->digest_size * 2) {
+ DMERR("root digest length does not match hash algorithm");
+ return -1;
+ }
+ dm_bht_hex_to_bin(bht->root_digest, hexdigest, bht->digest_size);
+ return 0;
+}
+EXPORT_SYMBOL(dm_bht_set_root_hexdigest);
+
+/**
+ * dm_bht_root_hexdigest - returns root digest in hex
+ * @bht: pointer to a dm_bht_create()d bht
+ * @hexdigest: u8 array of size @available
+ * @available: must be bht->digest_size * 2 + 1
+ */
+int dm_bht_root_hexdigest(struct dm_bht *bht, u8 *hexdigest, int available)
+{
+ if (available < 0 ||
+ ((unsigned int) available) < bht->digest_size * 2 + 1) {
+ DMERR("hexdigest has too few bytes available");
+ return -EINVAL;
+ }
+ dm_bht_bin_to_hex(bht->root_digest, hexdigest, bht->digest_size);
+ return 0;
+}
+EXPORT_SYMBOL(dm_bht_root_hexdigest);
+
+/**
+ * dm_bht_set_salt - sets the salt used, in hex
+ * @bht: pointer to a dm_bht_create()d bht
+ * @hexsalt: salt string, as hex; will be zero-padded or truncated to
+ * DM_BHT_SALT_SIZE * 2 hex digits.
+ */
+void dm_bht_set_salt(struct dm_bht *bht, const char *hexsalt)
+{
+ size_t saltlen = min(strlen(hexsalt) / 2, sizeof(bht->salt));
+
+ memset(bht->salt, 0, sizeof(bht->salt));
+ dm_bht_hex_to_bin(bht->salt, (const u8 *)hexsalt, saltlen);
+}
+EXPORT_SYMBOL(dm_bht_set_salt);
+
+/**
+ * dm_bht_salt - returns the salt used, in hex
+ * @bht: pointer to a dm_bht_create()d bht
+ * @hexsalt: buffer to put salt into, of length DM_BHT_SALT_SIZE * 2 + 1.
+ */
+int dm_bht_salt(struct dm_bht *bht, char *hexsalt)
+{
+ dm_bht_bin_to_hex(bht->salt, (u8 *)hexsalt, sizeof(bht->salt));
+ return 0;
+}
+EXPORT_SYMBOL(dm_bht_salt);
+
diff --git a/drivers/md/dm-verity.c b/drivers/md/dm-verity.c
new file mode 100644
index 0000000..a9bd0e8
--- /dev/null
+++ b/drivers/md/dm-verity.c
@@ -0,0 +1,1043 @@
+/*
+ * Originally based on dm-crypt.c,
+ * Copyright (C) 2003 Christophe Saout <christophe@saout.de>
+ * Copyright (C) 2004 Clemens Fruhwirth <clemens@endorphin.org>
+ * Copyright (C) 2006-2008 Red Hat, Inc. All rights reserved.
+ * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ * All Rights Reserved.
+ *
+ * This file is released under the GPLv2.
+ *
+ * Implements a verifying transparent block device.
+ * See Documentation/device-mapper/dm-verity.txt
+ */
+#include <linux/async.h>
+#include <linux/atomic.h>
+#include <linux/bio.h>
+#include <linux/blkdev.h>
+#include <linux/delay.h>
+#include <linux/device.h>
+#include <linux/err.h>
+#include <linux/genhd.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/mempool.h>
+#include <linux/mm_types.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/workqueue.h>
+#include <linux/device-mapper.h>
+#include <linux/dm-bht.h>
+
+#include "dm-verity.h"
+
+#define DM_MSG_PREFIX "verity"
+
+/* Supports up to 512-bit digests */
+#define VERITY_MAX_DIGEST_SIZE 64
+
+/* TODO(wad) make both of these report the error line/file to a
+ * verity_bug function.
+ */
+#define VERITY_BUG(msg...) BUG()
+#define VERITY_BUG_ON(cond, msg...) BUG_ON(cond)
+
+/* Helper for printing sector_t */
+#define ULL(x) ((unsigned long long)(x))
+
+#define MIN_IOS 32
+#define MIN_BIOS (MIN_IOS * 2)
+#define VERITY_DEFAULT_BLOCK_SIZE 4096
+
+/* Provide a lightweight means of specifying the global default for
+ * error behavior: eio, reboot, or none
+ * Legacy support for 0 = eio, 1 = reboot/panic, 2 = none, 3 = notify.
+ * This is matched to the enum in dm-verity.h.
+ */
+static const char * const allowed_error_behaviors[] = { "eio", "panic", "none",
+ "notify", NULL };
+static char *error_behavior = "eio";
+module_param(error_behavior, charp, 0644);
+MODULE_PARM_DESC(error_behavior, "Behavior on error "
+ "(eio, panic, none, notify)");
+
+/* Controls whether verity_get_device will wait forever for a device. */
+static int dev_wait;
+module_param(dev_wait, bool, 0444);
+MODULE_PARM_DESC(dev_wait, "Wait forever for a backing device");
+
+/* per-requested-bio private data */
+enum verity_io_flags {
+ VERITY_IOFLAGS_CLONED = 0x1, /* original bio has been cloned */
+};
+
+struct dm_verity_io {
+ struct dm_target *target;
+ struct bio *bio;
+ struct delayed_work work;
+ unsigned int flags;
+
+ int error;
+ atomic_t pending;
+
+ u64 block; /* aligned block index */
+ u64 count; /* aligned count in blocks */
+};
+
+struct verity_config {
+ struct dm_dev *dev;
+ sector_t start;
+ sector_t size;
+
+ struct dm_dev *hash_dev;
+ sector_t hash_start;
+
+ struct dm_bht bht;
+
+ /* Pool required for io contexts */
+ mempool_t *io_pool;
+ /* Pool and bios required for making sure that backing device reads are
+ * in PAGE_SIZE increments.
+ */
+ struct bio_set *bs;
+
+ char hash_alg[CRYPTO_MAX_ALG_NAME];
+
+ int error_behavior;
+};
+
+static struct kmem_cache *_verity_io_pool;
+static struct workqueue_struct *kveritydq, *kverityd_ioq;
+
+static void kverityd_verify(struct work_struct *work);
+static void kverityd_io(struct work_struct *work);
+static void kverityd_io_bht_populate(struct dm_verity_io *io);
+static void kverityd_io_bht_populate_end(struct bio *, int error);
+
+static BLOCKING_NOTIFIER_HEAD(verity_error_notifier);
+
+/*
+ * Exported interfaces
+ */
+
+int dm_verity_register_error_notifier(struct notifier_block *nb)
+{
+ return blocking_notifier_chain_register(&verity_error_not ifier, nb);
+}
+EXPORT_SYMBOL_GPL(dm_verity_register_error_notifi er);
+
+int dm_verity_unregister_error_notifier(struct notifier_block *nb)
+{
+ return blocking_notifier_chain_unregister(&verity_error_n otifier, nb);
+}
+EXPORT_SYMBOL_GPL(dm_verity_unregister_error_noti fier);
+
+/*
+ * Allocation and utility functions
+ */
+
+static void kverityd_src_io_read_end(struct bio *clone, int error);
+
+/* Shared destructor for all internal bios */
+static void dm_verity_bio_destructor(struct bio *bio)
+{
+ struct dm_verity_io *io = bio->bi_private;
+ struct verity_config *vc = io->target->private;
+ bio_free(bio, vc->bs);
+}
+
+static struct bio *verity_alloc_bioset(struct verity_config *vc, gfp_t gfp_mask,
+ int nr_iovecs)
+{
+ return bio_alloc_bioset(gfp_mask, nr_iovecs, vc->bs);
+}
+
+static struct dm_verity_io *verity_io_alloc(struct dm_target *ti,
+ struct bio *bio)
+{
+ struct verity_config *vc = ti->private;
+ sector_t sector = bio->bi_sector - ti->begin;
+ struct dm_verity_io *io;
+
+ io = mempool_alloc(vc->io_pool, GFP_NOIO);
+ if (unlikely(!io))
+ return NULL;
+ io->flags = 0;
+ io->target = ti;
+ io->bio = bio;
+ io->error = 0;
+
+ /* Adjust the sector by the virtual starting sector */
+ io->block = to_bytes(sector) / vc->bht.block_size;
+ io->count = bio->bi_size / vc->bht.block_size;
+
+ atomic_set(&io->pending, 0);
+
+ return io;
+}
+
+static struct bio *verity_bio_clone(struct dm_verity_io *io)
+{
+ struct verity_config *vc = io->target->private;
+ struct bio *bio = io->bio;
+ struct bio *clone = verity_alloc_bioset(vc, GFP_NOIO, bio->bi_max_vecs);
+
+ if (!clone)
+ return NULL;
+
+ __bio_clone(clone, bio);
+ clone->bi_private = io;
+ clone->bi_end_io = kverityd_src_io_read_end;
+ clone->bi_bdev = vc->dev->bdev;
+ clone->bi_sector += vc->start - io->target->begin;
+ clone->bi_destructor = dm_verity_bio_destructor;
+
+ return clone;
+}
+
+/* If the request is not successful, this handler takes action.
+ * TODO make this call a registered handler.
+ */
+static void verity_error(struct verity_config *vc, struct dm_verity_io *io,
+ int error)
+{
+ const char *message;
+ int error_mode = DM_VERITY_ERROR_BEHAVIOR_PANIC;
+ dev_t devt = 0;
+ u64 block = ~0;
+ int transient = 1;
+ struct dm_verity_error_state error_state;
+
+ if (vc) {
+ devt = vc->dev->bdev->bd_dev;
+ error_mode = vc->error_behavior;
+ }
+
+ if (io) {
+ io->error = -EIO;
+ block = io->block;
+ }
+
+ switch (error) {
+ case -ENOMEM:
+ message = "out of memory";
+ break;
+ case -EBUSY:
+ message = "pending data seen during verify";
+ break;
+ case -EFAULT:
+ message = "crypto operation failure";
+ break;
+ case -EACCES:
+ message = "integrity failure";
+ /* Image is bad. */
+ transient = 0;
+ break;
+ case -EPERM:
+ message = "hash tree population failure";
+ /* Should be dm-bht specific errors */
+ transient = 0;
+ break;
+ case -EINVAL:
+ message = "unexpected missing/invalid data";
+ /* The device was configured incorrectly - fallback. */
+ transient = 0;
+ break;
+ default:
+ /* Other errors can be passed through as IO errors */
+ message = "unknown or I/O error";
+ return;
+ }
+
+ DMERR_LIMIT("verification failure occurred: %s", message);
+
+ if (error_mode == DM_VERITY_ERROR_BEHAVIOR_NOTIFY) {
+ error_state.code = error;
+ error_state.transient = transient;
+ error_state.block = block;
+ error_state.message = message;
+ error_state.dev_start = vc->start;
+ error_state.dev_len = vc->size;
+ error_state.dev = vc->dev->bdev;
+ error_state.hash_dev_start = vc->hash_start;
+ error_state.hash_dev_len = vc->bht.sectors;
+ error_state.hash_dev = vc->hash_dev->bdev;
+
+ /* Set default fallthrough behavior. */
+ error_state.behavior = DM_VERITY_ERROR_BEHAVIOR_PANIC;
+ error_mode = DM_VERITY_ERROR_BEHAVIOR_PANIC;
+
+ if (!blocking_notifier_call_chain(
+ &verity_error_notifier, transient, &error_state)) {
+ error_mode = error_state.behavior;
+ }
+ }
+
+ switch (error_mode) {
+ case DM_VERITY_ERROR_BEHAVIOR_EIO:
+ break;
+ case DM_VERITY_ERROR_BEHAVIOR_NONE:
+ if (error != -EIO && io)
+ io->error = 0;
+ break;
+ default:
+ goto do_panic;
+ }
+ return;
+
+do_panic:
+ panic("dm-verity failure: "
+ "device:%u:%u error:%d block:%llu message:%s",
+ MAJOR(devt), MINOR(devt), error, ULL(block), message);
+}
+
+/**
+ * verity_parse_error_behavior - parse a behavior charp to the enum
+ * @behavior: NUL-terminated char array
+ *
+ * Checks if the behavior is valid either as text or as an index digit
+ * and returns the proper enum value or -1 on error.
+ */
+static int verity_parse_error_behavior(const char *behavior)
+{
+ const char * const *allowed = allowed_error_behaviors;
+ char index = '0';
+
+ for (; *allowed; allowed++, index++)
+ if (!strcmp(*allowed, behavior) || behavior[0] == index)
+ break;
+
+ if (!*allowed)
+ return -1;
+
+ /* Convert to the integer index matching the enum. */
+ return allowed - allowed_error_behaviors;
+}
+
+/*
+ * Reverse flow of requests into the device.
+ *
+ * (Start at the bottom with verity_map and work your way upward).
+ */
+
+static void verity_inc_pending(struct dm_verity_io *io);
+
+static void verity_return_bio_to_caller(struct dm_verity_io *io)
+{
+ struct verity_config *vc = io->target->private;
+
+ if (io->error)
+ verity_error(vc, io, io->error);
+
+ bio_endio(io->bio, io->error);
+ mempool_free(io, vc->io_pool);
+}
+
+/* Check for any missing bht hashes. */
+static bool verity_is_bht_populated(struct dm_verity_io *io)
+{
+ struct verity_config *vc = io->target->private;
+ u64 block;
+
+ for (block = io->block; block < io->block + io->count; ++block)
+ if (!dm_bht_is_populated(&vc->bht, block))
+ return false;
+
+ return true;
+}
+
+/* verity_dec_pending manages the lifetime of all dm_verity_io structs.
+ * Non-bug error handling is centralized through this interface and
+ * all passage from workqueue to workqueue.
+ */
+static void verity_dec_pending(struct dm_verity_io *io)
+{
+ if (!atomic_dec_and_test(&io->pending))
+ goto done;
+
+ if (unlikely(io->error))
+ goto io_error;
+
+ /* I/Os that were pending may now be ready */
+ if (verity_is_bht_populated(io)) {
+ INIT_DELAYED_WORK(&io->work, kverityd_verify);
+ queue_delayed_work(kveritydq, &io->work, 0);
+ } else {
+ INIT_DELAYED_WORK(&io->work, kverityd_io);
+ queue_delayed_work(kverityd_ioq, &io->work, HZ/10);
+ }
+
+done:
+ return;
+
+io_error:
+ verity_return_bio_to_caller(io);
+}
+
+/* Walks the data set and computes the hash of the data read from the
+ * untrusted source device. The computed hash is then passed to dm-bht
+ * for verification.
+ */
+static int verity_verify(struct verity_config *vc,
+ struct dm_verity_io *io)
+{
+ unsigned int block_size = vc->bht.block_size;
+ struct bio *bio = io->bio;
+ u64 block = io->block;
+ unsigned int idx;
+ int r;
+
+ for (idx = bio->bi_idx; idx < bio->bi_vcnt; idx++) {
+ struct bio_vec *bv = bio_iovec_idx(bio, idx);
+ unsigned int offset = bv->bv_offset;
+ unsigned int len = bv->bv_len;
+
+ VERITY_BUG_ON(offset % block_size);
+ VERITY_BUG_ON(len % block_size);
+
+ while (len) {
+ r = dm_bht_verify_block(&vc->bht, block,
+ bv->bv_page, offset);
+ if (r)
+ goto bad_return;
+
+ offset += block_size;
+ len -= block_size;
+ block++;
+ cond_resched();
+ }
+ }
+
+ return 0;
+
+bad_return:
+ /* dm_bht functions aren't expected to return errno friendly
+ * values. They are converted here for uniformity.
+ */
+ if (r > 0) {
+ DMERR("Pending data for block %llu seen at verify", ULL(block));
+ r = -EBUSY;
+ } else {
+ DMERR_LIMIT("Block hash does not match!");
+ r = -EACCES;
+ }
+ return r;
+}
+
+/* Services the verify workqueue */
+static void kverityd_verify(struct work_struct *work)
+{
+ struct delayed_work *dwork = container_of(work, struct delayed_work,
+ work);
+ struct dm_verity_io *io = container_of(dwork, struct dm_verity_io,
+ work);
+ struct verity_config *vc = io->target->private;
+
+ io->error = verity_verify(vc, io);
+
+ /* Free up the bio and tag with the return value */
+ verity_return_bio_to_caller(io);
+}
+
+/* Asynchronously called upon the completion of dm-bht I/O. The status
+ * of the operation is passed back to dm-bht and the next steps are
+ * decided by verity_dec_pending.
+ */
+static void kverityd_io_bht_populate_end(struct bio *bio, int error)
+{
+ struct dm_bht_entry *entry = (struct dm_bht_entry *) bio->bi_private;
+ struct dm_verity_io *io = (struct dm_verity_io *) entry->io_context;
+
+ /* Tell the tree to atomically update now that we've populated
+ * the given entry.
+ */
+ dm_bht_read_completed(entry, error);
+
+ /* Clean up for reuse when reading data to be checked */
+ bio->bi_vcnt = 0;
+ bio->bi_io_vec->bv_offset = 0;
+ bio->bi_io_vec->bv_len = 0;
+ bio->bi_io_vec->bv_page = NULL;
+ /* Restore the private data to I/O so the destructor can be shared. */
+ bio->bi_private = (void *) io;
+ bio_put(bio);
+
+ /* We bail but assume the tree has been marked bad. */
+ if (unlikely(error)) {
+ DMERR("Failed to read for sector %llu (%u)",
+ ULL(io->bio->bi_sector), io->bio->bi_size);
+ io->error = error;
+ /* Pass through the error to verity_dec_pending below */
+ }
+ /* When pending = 0, it will transition to reading real data */
+ verity_dec_pending(io);
+}
+
+/* Called by dm-bht (via dm_bht_populate), this function provides
+ * the message digests to dm-bht that are stored on disk.
+ */
+static int kverityd_bht_read_callback(void *ctx, sector_t start, u8 *dst,
+ sector_t count,
+ struct dm_bht_entry *entry)
+{
+ struct dm_verity_io *io = ctx; /* I/O for this batch */
+ struct verity_config *vc;
+ struct bio *bio;
+
+ vc = io->target->private;
+
+ /* The I/O context is nested inside the entry so that we don't need one
+ * io context per page read.
+ */
+ entry->io_context = ctx;
+
+ /* We should only get page size requests at present. */
+ verity_inc_pending(io);
+ bio = verity_alloc_bioset(vc, GFP_NOIO, 1);
+ if (unlikely(!bio)) {
+ DMCRIT("Out of memory at bio_alloc_bioset");
+ dm_bht_read_completed(entry, -ENOMEM);
+ return -ENOMEM;
+ }
+ bio->bi_private = (void *) entry;
+ bio->bi_idx = 0;
+ bio->bi_size = vc->bht.block_size;
+ bio->bi_sector = vc->hash_start + start;
+ bio->bi_bdev = vc->hash_dev->bdev;
+ bio->bi_end_io = kverityd_io_bht_populate_end;
+ bio->bi_rw = REQ_META;
+ /* Only need to free the bio since the page is managed by bht */
+ bio->bi_destructor = dm_verity_bio_destructor;
+ bio->bi_vcnt = 1;
+ bio->bi_io_vec->bv_offset = offset_in_page(dst);
+ bio->bi_io_vec->bv_len = to_bytes(count);
+ /* dst is guaranteed to be a page_pool allocation */
+ bio->bi_io_vec->bv_page = virt_to_page(dst);
+ /* Track that this I/O is in use. There should be no risk of the io
+ * being removed prior since this is called synchronously.
+ */
+ generic_make_request(bio);
+ return 0;
+}
+
+/* Submits an io request for each missing block of block hashes.
+ * The last one to return will then enqueue this on the io workqueue.
+ */
+static void kverityd_io_bht_populate(struct dm_verity_io *io)
+{
+ struct verity_config *vc = io->target->private;
+ u64 block;
+
+ for (block = io->block; block < io->block + io->count; ++block) {
+ int ret = dm_bht_populate(&vc->bht, io, block);
+
+ if (ret < 0) {
+ /* verity_dec_pending will handle the error case. */
+ io->error = ret;
+ break;
+ }
+ }
+}
+
+/* Asynchronously called upon the completion of I/O issued
+ * from kverityd_src_io_read. verity_dec_pending() acts as
+ * the scheduler/flow manager.
+ */
+static void kverityd_src_io_read_end(struct bio *clone, int error)
+{
+ struct dm_verity_io *io = clone->bi_private;
+
+ if (unlikely(!bio_flagged(clone, BIO_UPTODATE) && !error))
+ error = -EIO;
+
+ if (unlikely(error)) {
+ DMERR("Error occurred: %d (%llu, %u)",
+ error, ULL(clone->bi_sector), clone->bi_size);
+ io->error = error;
+ }
+
+ /* Release the clone which just avoids the block layer from
+ * leaving offsets, etc in unexpected states.
+ */
+ bio_put(clone);
+
+ verity_dec_pending(io);
+}
+
+/* If not yet underway, an I/O request will be issued to the vc->dev
+ * device for the data needed. It is cloned to avoid unexpected changes
+ * to the original bio struct.
+ */
+static void kverityd_src_io_read(struct dm_verity_io *io)
+{
+ struct bio *clone;
+
+ /* Check if the read is already issued. */
+ if (io->flags & VERITY_IOFLAGS_CLONED)
+ return;
+
+ io->flags |= VERITY_IOFLAGS_CLONED;
+
+ /* Clone the bio. The block layer may modify the bvec array. */
+ clone = verity_bio_clone(io);
+ if (unlikely(!clone)) {
+ io->error = -ENOMEM;
+ return;
+ }
+
+ verity_inc_pending(io);
+
+ generic_make_request(clone);
+}
+
+/* kverityd_io services the I/O workqueue. For each pass through
+ * the I/O workqueue, a call to populate both the origin drive
+ * data and the hash tree data is made.
+ */
+static void kverityd_io(struct work_struct *work)
+{
+ struct delayed_work *dwork = container_of(work, struct delayed_work,
+ work);
+ struct dm_verity_io *io = container_of(dwork, struct dm_verity_io,
+ work);
+
+ /* Issue requests asynchronously. */
+ verity_inc_pending(io);
+ kverityd_src_io_read(io);
+ kverityd_io_bht_populate(io);
+ verity_dec_pending(io);
+}
+
+/* Paired with verity_dec_pending, the pending value in the io dictate the
+ * lifetime of a request and when it is ready to be processed on the
+ * workqueues.
+ */
+static void verity_inc_pending(struct dm_verity_io *io)
+{
+ atomic_inc(&io->pending);
+}
+
+/* Block-level requests start here. */
+static int verity_map(struct dm_target *ti, struct bio *bio,
+ union map_info *map_context)
+{
+ struct dm_verity_io *io;
+ struct verity_config *vc;
+ struct request_queue *r_queue;
+
+ if (unlikely(!ti)) {
+ DMERR("dm_target was NULL");
+ return -EIO;
+ }
+
+ vc = ti->private;
+ r_queue = bdev_get_queue(vc->dev->bdev);
+
+ if (bio_data_dir(bio) == WRITE) {
+ /* If we silently drop writes, then the VFS layer will cache
+ * the write and persist it in memory. While it doesn't change
+ * the underlying storage, it still may be contrary to the
+ * behavior expected by a verified, read-only device.
+ */
+ DMWARN_LIMIT("write request received. rejecting with -EIO.");
+ verity_error(vc, NULL, -EIO);
+ return -EIO;
+ } else {
+ /* Queue up the request to be verified */
+ io = verity_io_alloc(ti, bio);
+ if (!io) {
+ DMERR_LIMIT("Failed to allocate and init IO data");
+ return DM_MAPIO_REQUEUE;
+ }
+ INIT_DELAYED_WORK(&io->work, kverityd_io);
+ queue_delayed_work(kverityd_ioq, &io->work, 0);
+ }
+
+ return DM_MAPIO_SUBMITTED;
+}
+
+static void splitarg(char *arg, char **key, char **val)
+{
+ *key = strsep(&arg, "=");
+ *val = strsep(&arg, "");
+}
+
+/*
+ * Non-block interfaces and device-mapper specific code
+ */
+
+/**
+ * verity_ctr - Construct a verified mapping
+ * @ti: Target being created
+ * @argc: Number of elements in argv
+ * @argv: Vector of key-value pairs (see below).
+ *
+ * Accepts the following keys:
+ * @payload: hashed device
+ * @hashtree: device hashtree is stored on
+ * @hashstart: start address of hashes (default 0)
+ * @block_size: size of a hash block
+ * @alg: hash algorithm
+ * @root_hexdigest: toplevel hash of the tree
+ * @error_behavior: what to do when verification fails [optional]
+ * @salt: salt, in hex [optional]
+ *
+ * E.g.,
+ * payload=/dev/sda2 hashtree=/dev/sda3 alg=sha256
+ * root_hexdigest=f08aa4a3695290c569eb1b0ac032ae10401 50afb527abbeb0a3da33d82fb2c6e
+ *
+ * TODO(wad):
+ * - Boot time addition
+ * - Track block verification to free block_hashes if memory use is a concern
+ * Testing needed:
+ * - Regular slub_debug tracing (on checkins)
+ * - Improper block hash padding
+ * - Improper bundle padding
+ * - Improper hash layout
+ * - Missing padding at end of device
+ * - Improperly sized underlying devices
+ * - Out of memory conditions (make sure this isn't too flaky under high load!)
+ * - Incorrect superhash
+ * - Incorrect block hashes
+ * - Incorrect bundle hashes
+ * - Boot-up read speed; sustained read speeds
+ */
+static int verity_ctr(struct dm_target *ti, unsigned int argc, char **argv)
+{
+ struct verity_config *vc = NULL;
+ int ret = 0;
+ sector_t blocks;
+ unsigned int block_size = VERITY_DEFAULT_BLOCK_SIZE;
+ const char *payload = NULL;
+ const char *hashtree = NULL;
+ unsigned long hashstart = 0;
+ const char *alg = NULL;
+ const char *root_hexdigest = NULL;
+ const char *dev_error_behavior = error_behavior;
+ const char *hexsalt = "";
+ int i;
+
+ for (i = 0; i < argc; ++i) {
+ char *key, *val;
+ DMWARN("Argument %d: '%s'", i, argv[i]);
+ splitarg(argv[i], &key, &val);
+ if (!key) {
+ DMWARN("Bad argument %d: missing key?", i);
+ break;
+ }
+ if (!val) {
+ DMWARN("Bad argument %d='%s': missing value", i, key);
+ break;
+ }
+
+ if (!strcmp(key, "alg")) {
+ alg = val;
+ } else if (!strcmp(key, "payload")) {
+ payload = val;
+ } else if (!strcmp(key, "hashtree")) {
+ hashtree = val;
+ } else if (!strcmp(key, "root_hexdigest")) {
+ root_hexdigest = val;
+ } else if (!strcmp(key, "hashstart")) {
+ if (strict_strtoul(val, 10, &hashstart)) {
+ ti->error = "Invalid hashstart";
+ return -EINVAL;
+ }
+ } else if (!strcmp(key, "block_size")) {
+ unsigned long tmp;
+ if (strict_strtoul(val, 10, &tmp) ||
+ (tmp > UINT_MAX)) {
+ ti->error = "Invalid block_size";
+ return -EINVAL;
+ }
+ block_size = (unsigned int)tmp;
+ } else if (!strcmp(key, "error_behavior")) {
+ dev_error_behavior = val;
+ } else if (!strcmp(key, "salt")) {
+ hexsalt = val;
+ } else if (!strcmp(key, "error_behavior")) {
+ dev_error_behavior = val;
+ }
+ }
+
+#define NEEDARG(n)
+ if (!(n)) {
+ ti->error = "Missing argument: " #n;
+ return -EINVAL;
+ }
+
+ NEEDARG(alg);
+ NEEDARG(payload);
+ NEEDARG(hashtree);
+ NEEDARG(root_hexdigest);
+
+#undef NEEDARG
+
+ /* The device mapper device should be setup read-only */
+ if ((dm_table_get_mode(ti->table) & ~FMODE_READ) != 0) {
+ ti->error = "Must be created readonly.";
+ return -EINVAL;
+ }
+
+ vc = kzalloc(sizeof(*vc), GFP_KERNEL);
+ if (!vc) {
+ /* TODO(wad) if this is called from the setup helper, then we
+ * catch these errors and do a CrOS specific thing. if not, we
+ * need to have this call the error handler.
+ */
+ return -EINVAL;
+ }
+
+ /* Calculate the blocks from the given device size */
+ vc->size = ti->len;
+ blocks = to_bytes(vc->size) / block_size;
+ if (dm_bht_create(&vc->bht, blocks, block_size, alg)) {
+ DMERR("failed to create required bht");
+ goto bad_bht;
+ }
+ if (dm_bht_set_root_hexdigest(&vc->bht, root_hexdigest)) {
+ DMERR("root hexdigest error");
+ goto bad_root_hexdigest;
+ }
+ dm_bht_set_salt(&vc->bht, hexsalt);
+ vc->bht.read_cb = kverityd_bht_read_callback;
+
+ /* payload: device to verify */
+ vc->start = 0; /* TODO: should this support a starting offset? */
+ /* We only ever grab the device in read-only mode. */
+ ret = dm_get_device(ti, payload,
+ dm_table_get_mode(ti->table), &vc->dev);
+ if (ret) {
+ DMERR("Failed to acquire device '%s': %d", payload, ret);
+ ti->error = "Device lookup failed";
+ goto bad_verity_dev;
+ }
+
+ if ((to_bytes(vc->start) % block_size) ||
+ (to_bytes(vc->size) % block_size)) {
+ ti->error = "Device must be block_size divisble/aligned";
+ goto bad_hash_start;
+ }
+
+ vc->hash_start = (sector_t)hashstart;
+
+ /* hashtree: device with hashes.
+ * Note, payload == hashtree is okay as long as the size of
+ * ti->len passed to device mapper does not include
+ * the hashes.
+ */
+ if (dm_get_device(ti, hashtree,
+ dm_table_get_mode(ti->table), &vc->hash_dev)) {
+ ti->error = "Hash device lookup failed";
+ goto bad_hash_dev;
+ }
+
+ /* arg4: cryptographic digest algorithm */
+ if (snprintf(vc->hash_alg, CRYPTO_MAX_ALG_NAME, "%s", alg) >=
+ CRYPTO_MAX_ALG_NAME) {
+ ti->error = "Hash algorithm name is too long";
+ goto bad_hash;
+ }
+
+ /* override with optional device-specific error behavior */
+ vc->error_behavior = verity_parse_error_behavior(dev_error_behavior);
+ if (vc->error_behavior == -1) {
+ ti->error = "Bad error_behavior supplied";
+ goto bad_err_behavior;
+ }
+
+ /* TODO: Maybe issues a request on the io queue for block 0? */
+
+ /* Argument processing is done, setup operational data */
+ /* Pool for dm_verity_io objects */
+ vc->io_pool = mempool_create_slab_pool(MIN_IOS, _verity_io_pool);
+ if (!vc->io_pool) {
+ ti->error = "Cannot allocate verity io mempool";
+ goto bad_slab_pool;
+ }
+
+ /* Allocate the bioset used for request padding */
+ /* TODO(wad) allocate a separate bioset for the first verify maybe */
+ vc->bs = bioset_create(MIN_BIOS, 0);
+ if (!vc->bs) {
+ ti->error = "Cannot allocate verity bioset";
+ goto bad_bs;
+ }
+
+ ti->num_flush_requests = 1;
+ ti->private = vc;
+
+ /* TODO(wad) add device and hash device names */
+ {
+ char hashdev[BDEVNAME_SIZE], vdev[BDEVNAME_SIZE];
+ bdevname(vc->hash_dev->bdev, hashdev);
+ bdevname(vc->dev->bdev, vdev);
+ DMINFO("dev:%s hash:%s [sectors:%llu blocks:%llu]", vdev,
+ hashdev, ULL(vc->bht.sectors), ULL(blocks));
+ }
+ return 0;
+
+bad_bs:
+ mempool_destroy(vc->io_pool);
+bad_slab_pool:
+bad_err_behavior:
+bad_hash:
+ dm_put_device(ti, vc->hash_dev);
+bad_hash_dev:
+bad_hash_start:
+ dm_put_device(ti, vc->dev);
+bad_bht:
+bad_root_hexdigest:
+bad_verity_dev:
+ kfree(vc); /* hash is not secret so no need to zero */
+ return -EINVAL;
+}
+
+static void verity_dtr(struct dm_target *ti)
+{
+ struct verity_config *vc = (struct verity_config *) ti->private;
+
+ bioset_free(vc->bs);
+ mempool_destroy(vc->io_pool);
+ dm_bht_destroy(&vc->bht);
+ dm_put_device(ti, vc->hash_dev);
+ dm_put_device(ti, vc->dev);
+ kfree(vc);
+}
+
+static int verity_status(struct dm_target *ti, status_type_t type,
+ char *result, unsigned int maxlen)
+{
+ struct verity_config *vc = (struct verity_config *) ti->private;
+ unsigned int sz = 0;
+ char hashdev[BDEVNAME_SIZE], vdev[BDEVNAME_SIZE];
+ u8 hexdigest[VERITY_MAX_DIGEST_SIZE * 2 + 1] = { 0 };
+
+ dm_bht_root_hexdigest(&vc->bht, hexdigest, sizeof(hexdigest));
+
+ switch (type) {
+ case STATUSTYPE_INFO:
+ break;
+ case STATUSTYPE_TABLE:
+ bdevname(vc->hash_dev->bdev, hashdev);
+ bdevname(vc->dev->bdev, vdev);
+ DMEMIT("/dev/%s /dev/%s %llu %u %s %s",
+ vdev,
+ hashdev,
+ ULL(vc->hash_start),
+ vc->bht.depth,
+ vc->hash_alg,
+ hexdigest);
+ break;
+ }
+ return 0;
+}
+
+static int verity_merge(struct dm_target *ti, struct bvec_merge_data *bvm,
+ struct bio_vec *biovec, int max_size)
+{
+ struct verity_config *vc = ti->private;
+ struct request_queue *q = bdev_get_queue(vc->dev->bdev);
+
+ if (!q->merge_bvec_fn)
+ return max_size;
+
+ bvm->bi_bdev = vc->dev->bdev;
+ bvm->bi_sector = vc->start + bvm->bi_sector - ti->begin;
+
+ /* Optionally, this could just return 0 to stick to single pages. */
+ return min(max_size, q->merge_bvec_fn(q, bvm, biovec));
+}
+
+static int verity_iterate_devices(struct dm_target *ti,
+ iterate_devices_callout_fn fn, void *data)
+{
+ struct verity_config *vc = ti->private;
+
+ return fn(ti, vc->dev, vc->start, ti->len, data);
+}
+
+static void verity_io_hints(struct dm_target *ti,
+ struct queue_limits *limits)
+{
+ struct verity_config *vc = ti->private;
+ unsigned int block_size = vc->bht.block_size;
+
+ limits->logical_block_size = block_size;
+ limits->physical_block_size = block_size;
+ blk_limits_io_min(limits, block_size);
+}
+
+static struct target_type verity_target = {
+ .name = "verity",
+ .version = {0, 1, 0},
+ .module = THIS_MODULE,
+ .ctr = verity_ctr,
+ .dtr = verity_dtr,
+ .map = verity_map,
+ .merge = verity_merge,
+ .status = verity_status,
+ .iterate_devices = verity_iterate_devices,
+ .io_hints = verity_io_hints,
+};
+
+#define VERITY_WQ_FLAGS (WQ_CPU_INTENSIVE|WQ_HIGHPRI)
+
+static int __init dm_verity_init(void)
+{
+ int r = -ENOMEM;
+
+ _verity_io_pool = KMEM_CACHE(dm_verity_io, 0);
+ if (!_verity_io_pool) {
+ DMERR("failed to allocate pool dm_verity_io");
+ goto bad_io_pool;
+ }
+
+ kverityd_ioq = alloc_workqueue("kverityd_io", VERITY_WQ_FLAGS, 1);
+ if (!kverityd_ioq) {
+ DMERR("failed to create workqueue kverityd_ioq");
+ goto bad_io_queue;
+ }
+
+ kveritydq = alloc_workqueue("kverityd", VERITY_WQ_FLAGS, 1);
+ if (!kveritydq) {
+ DMERR("failed to create workqueue kveritydq");
+ goto bad_verify_queue;
+ }
+
+ r = dm_register_target(&verity_target);
+ if (r < 0) {
+ DMERR("register failed %d", r);
+ goto register_failed;
+ }
+
+ DMINFO("version %u.%u.%u loaded", verity_target.version[0],
+ verity_target.version[1], verity_target.version[2]);
+
+ return r;
+
+register_failed:
+ destroy_workqueue(kveritydq);
+bad_verify_queue:
+ destroy_workqueue(kverityd_ioq);
+bad_io_queue:
+ kmem_cache_destroy(_verity_io_pool);
+bad_io_pool:
+ return r;
+}
+
+static void __exit dm_verity_exit(void)
+{
+ destroy_workqueue(kveritydq);
+ destroy_workqueue(kverityd_ioq);
+
+ dm_unregister_target(&verity_target);
+ kmem_cache_destroy(_verity_io_pool);
+}
+
+module_init(dm_verity_init);
+module_exit(dm_verity_exit);
+
+MODULE_AUTHOR("The Chromium OS Authors <chromium-os-dev@chromium.org>");
+MODULE_DESCRIPTION(DM_NAME " target for transparent disk integrity checking");
+MODULE_LICENSE("GPL");
diff --git a/drivers/md/dm-verity.h b/drivers/md/dm-verity.h
new file mode 100644
index 0000000..e0664c9
--- /dev/null
+++ b/drivers/md/dm-verity.h
@@ -0,0 +1,45 @@
+/*
+ * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ * All Rights Reserved.
+ *
+ * This file is released under the GPLv2.
+ *
+ * Provide error types for use when creating a custom error handler.
+ * See Documentation/device-mapper/dm-verity.txt
+ */
+#ifndef DM_VERITY_H
+#define DM_VERITY_H
+
+#include <linux/notifier.h>
+
+struct dm_verity_error_state {
+ int code;
+ int transient; /* Likely to not happen after a reboot */
+ u64 block;
+ const char *message;
+
+ sector_t dev_start;
+ sector_t dev_len;
+ struct block_device *dev;
+
+ sector_t hash_dev_start;
+ sector_t hash_dev_len;
+ struct block_device *hash_dev;
+
+ /* Final behavior after all notifications are completed. */
+ int behavior;
+};
+
+/* This enum must be matched to allowed_error_behaviors in dm-verity.c */
+enum dm_verity_error_behavior {
+ DM_VERITY_ERROR_BEHAVIOR_EIO = 0,
+ DM_VERITY_ERROR_BEHAVIOR_PANIC,
+ DM_VERITY_ERROR_BEHAVIOR_NONE,
+ DM_VERITY_ERROR_BEHAVIOR_NOTIFY
+};
+
+
+int dm_verity_register_error_notifier(struct notifier_block *nb);
+int dm_verity_unregister_error_notifier(struct notifier_block *nb);
+
+#endif /* DM_VERITY_H */
diff --git a/include/linux/dm-bht.h b/include/linux/dm-bht.h
new file mode 100644
index 0000000..3a4b432
--- /dev/null
+++ b/include/linux/dm-bht.h
@@ -0,0 +1,166 @@
+/*
+ * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ *
+ * Device-Mapper block hash tree interface.
+ * See Documentation/device-mapper/dm-bht.txt for details.
+ *
+ * This file is released under the GPLv2.
+ */
+#ifndef __LINUX_DM_BHT_H
+#define __LINUX_DM_BHT_H
+
+#include <crypto/hash.h>
+#include <linux/compiler.h>
+#include <linux/types.h>
+
+/* To avoid allocating memory for digest tests, we just setup a
+ * max to use for now.
+ */
+#define DM_BHT_MAX_DIGEST_SIZE 128 /* 1k hashes are unlikely for now */
+#define DM_BHT_SALT_SIZE 32 /* 256 bits of salt is a lot */
+
+/* UNALLOCATED, PENDING, READY, and VERIFIED are valid states. All other
+ * values are entry-related return codes.
+ */
+#define DM_BHT_ENTRY_VERIFIED 8 /* 'nodes' has been checked against parent */
+#define DM_BHT_ENTRY_READY 4 /* 'nodes' is loaded and available */
+#define DM_BHT_ENTRY_PENDING 2 /* 'nodes' is being loaded */
+#define DM_BHT_ENTRY_UNALLOCATED 0 /* untouched */
+#define DM_BHT_ENTRY_ERROR -1 /* entry is unsuitable for use */
+#define DM_BHT_ENTRY_ERROR_IO -2 /* I/O error on load */
+
+/* Additional possible return codes */
+#define DM_BHT_ENTRY_ERROR_MISMATCH -3 /* Digest mismatch */
+
+/* dm_bht_entry
+ * Contains dm_bht->node_count tree nodes at a given tree depth.
+ * state is used to transactionally assure that data is paged in
+ * from disk. Unless dm_bht kept running crypto contexts for each
+ * level, we need to load in the data for on-demand verification.
+ */
+struct dm_bht_entry {
+ atomic_t state; /* see defines */
+ /* Keeping an extra pointer per entry wastes up to ~33k of
+ * memory if a 1m blocks are used (or 66 on 64-bit arch)
+ */
+ void *io_context; /* Reserve a pointer for use during io */
+ /* data should only be non-NULL if fully populated. */
+ void *nodes; /* The hash data used to verify the children.
+ * Guaranteed to be page-aligned.
+ */
+};
+
+/* dm_bht_level
+ * Contains an array of entries which represent a page of hashes where
+ * each hash is a node in the tree at the given tree depth/level.
+ */
+struct dm_bht_level {
+ struct dm_bht_entry *entries; /* array of entries of tree nodes */
+ unsigned int count; /* number of entries at this level */
+ sector_t sector; /* starting sector for this level */
+};
+
+/* opaque context, start, databuf, sector_count */
+typedef int(*dm_bht_callback)(void *, /* external context */
+ sector_t, /* start sector */
+ u8 *, /* destination page */
+ sector_t, /* num sectors */
+ struct dm_bht_entry *);
+/* dm_bht - Device mapper block hash tree
+ * dm_bht provides a fixed interface for comparing data blocks
+ * against a cryptographic hashes stored in a hash tree. It
+ * optimizes the tree structure for storage on disk.
+ *
+ * The tree is built from the bottom up. A collection of data,
+ * external to the tree, is hashed and these hashes are stored
+ * as the blocks in the tree. For some number of these hashes,
+ * a parent node is created by hashing them. These steps are
+ * repeated.
+ *
+ * TODO(wad): All hash storage memory is pre-allocated and freed once an
+ * entire branch has been verified.
+ */
+struct dm_bht {
+ /* Configured values */
+ int depth; /* Depth of the tree including the root */
+ unsigned int block_count; /* Number of blocks hashed */
+ unsigned int block_size; /* Size of a hash block */
+ char hash_alg[CRYPTO_MAX_ALG_NAME];
+ unsigned char salt[DM_BHT_SALT_SIZE];
+
+ /* Computed values */
+ unsigned int node_count; /* Data size (in hashes) for each entry */
+ unsigned int node_count_shift; /* first bit set - 1 */
+ /* There is one per CPU so that verified can be simultaneous. */
+ struct shash_desc *hash_desc[NR_CPUS]; /* Container for the hash alg */
+ unsigned int digest_size;
+ sector_t sectors; /* Number of disk sectors used */
+
+ /* bool verified; Full tree is verified */
+ u8 root_digest[DM_BHT_MAX_DIGEST_SIZE];
+ struct dm_bht_level *levels; /* in reverse order */
+ /* Callback for reading from the hash device */
+ dm_bht_callback read_cb;
+};
+
+/* Constructor for struct dm_bht instances. */
+int dm_bht_create(struct dm_bht *bht,
+ unsigned int block_count,
+ unsigned int block_size,
+ const char *alg_name);
+/* Destructor for struct dm_bht instances. Does not free @bht */
+void dm_bht_destroy(struct dm_bht *bht);
+
+/* Basic accessors for struct dm_bht */
+int dm_bht_set_root_hexdigest(struct dm_bht *bht, const u8 *hexdigest);
+int dm_bht_root_hexdigest(struct dm_bht *bht, u8 *hexdigest, int available);
+void dm_bht_set_salt(struct dm_bht *bht, const char *hexsalt);
+int dm_bht_salt(struct dm_bht *bht, char *hexsalt);
+
+/* Functions for loading in data from disk for verification */
+bool dm_bht_is_populated(struct dm_bht *bht, unsigned int block);
+int dm_bht_populate(struct dm_bht *bht, void *read_cb_ctx,
+ unsigned int block);
+int dm_bht_verify_block(struct dm_bht *bht, unsigned int block,
+ struct page *pg, unsigned int offset);
+void dm_bht_read_completed(struct dm_bht_entry *entry, int status);
+
+/* Functions for converting indices to nodes. */
+
+static inline unsigned int dm_bht_get_level_shift(struct dm_bht *bht,
+ int depth)
+{
+ return (bht->depth - depth) * bht->node_count_shift;
+}
+
+/* For the given depth, this is the entry index. At depth+1 it is the node
+ * index for depth.
+ */
+static inline unsigned int dm_bht_index_at_level(struct dm_bht *bht,
+ int depth,
+ unsigned int leaf)
+{
+ return leaf >> dm_bht_get_level_shift(bht, depth);
+}
+
+static inline struct dm_bht_entry *dm_bht_get_entry(struct dm_bht *bht,
+ int depth,
+ unsigned int block)
+{
+ unsigned int index = dm_bht_index_at_level(bht, depth, block);
+ struct dm_bht_level *level = &bht->levels[depth];
+
+ return &level->entries[index];
+}
+
+static inline void *dm_bht_get_node(struct dm_bht *bht,
+ struct dm_bht_entry *entry,
+ int depth,
+ unsigned int block)
+{
+ unsigned int index = dm_bht_index_at_level(bht, depth, block);
+ unsigned int node_index = index % bht->node_count;
+
+ return entry->nodes + (node_index * bht->digest_size);
+}
+#endif /* __LINUX_DM_BHT_H */
--
1.7.3.1

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 01-04-2012, 09:42 PM
Mandeep Singh Baines
 
Default dm: verity target

D'oh. Forget to -a when running git commit earlier. Ignore earlier
post and use this instead.

---

The verity target provides transparent integrity checking of block devices
using a cryptographic digest.

dm-verity is meant to be setup as part of a verified boot path. This
may be anything ranging from a boot using tboot or trustedgrub to just
booting from a known-good device (like a USB drive or CD).

dm-verity is part of ChromeOS's verified boot path. It is used to verify
the integrity of the root filesystem on boot. The root filesystem is
mounted on a dm-verity partition which transparently verifies each block
with a bootloader verified hash passed into the kernel at boot.

Changes in V3:
* Discussion over irc (Alasdair G Kergon)
* Implement ioctl hook
Changes in V2:
* https://lkml.org/lkml/2011/11/10/85 (Steffen Klassert)
* Use shash API instead of older hash API

Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: Elly Jones <ellyjones@chromium.org>
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Milan Broz <mbroz@redhat.com>
Cc: Olof Johansson <olofj@chromium.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: dm-devel@redhat.com
---
Documentation/device-mapper/dm-bht.txt | 59 ++
Documentation/device-mapper/dm-verity.txt | 76 +++
drivers/md/Kconfig | 30 +
drivers/md/Makefile | 2 +
drivers/md/dm-bht.c | 559 +++++++++++++++
drivers/md/dm-verity.c | 1051 +++++++++++++++++++++++++++++
drivers/md/dm-verity.h | 45 ++
include/linux/dm-bht.h | 166 +++++
8 files changed, 1988 insertions(+), 0 deletions(-)
create mode 100644 Documentation/device-mapper/dm-bht.txt
create mode 100644 Documentation/device-mapper/dm-verity.txt
create mode 100644 drivers/md/dm-bht.c
create mode 100644 drivers/md/dm-verity.c
create mode 100644 drivers/md/dm-verity.h
create mode 100644 include/linux/dm-bht.h

diff --git a/Documentation/device-mapper/dm-bht.txt b/Documentation/device-mapper/dm-bht.txt
new file mode 100644
index 0000000..21d929f
--- /dev/null
+++ b/Documentation/device-mapper/dm-bht.txt
@@ -0,0 +1,59 @@
+dm-bht
+======
+
+dm-bht provides a block hash tree implementation. The use of dm-bht allows
+for integrity checking of a given block device without reading the entire
+set of blocks into memory before use.
+
+In particular, dm-bht supplies an interface for creating and verifying a tree
+of cryptographic digests with any algorithm supported by the kernel crypto API.
+
+The `verity' target is the motivating example.
+
+
+Theory of operation
+===================
+
+dm-bht is logically comprised of multiple nodes organized in a tree-like
+structure. Each node in the tree is a cryptographic hash. If it is a leaf
+node, the hash is of some block data on disk. If it is an intermediary node,
+then the hash is of a number of child nodes.
+
+dm-bht has a given depth starting at 1 (ignoring the root node). Each level in
+the tree is concretely made up of dm_bht_entry structs. Each entry in the tree
+is a collection of neighboring nodes that fit in one page-sized block. The
+number is determined based on PAGE_SIZE and the size of the selected
+cryptographic digest algorithm. The hashes are linearly ordered in this entry
+and any unaligned trailing space is ignored but included when calculating the
+parent node.
+
+The tree looks something like:
+
+alg= sha256, num_blocks = 32767
+ [ root ]
+ / . . .
+ [entry_0] [entry_1]
+ / . . . . . .
+ [entry_0_0] . . . [entry_0_127] . . . . [entry_1_127]
+ / ... / . . . /
+ blk_0 ... blk_127 blk_16256 blk_16383 blk_32640 . . . blk_32767
+
+root is treated independently from the depth and the blocks are expected to
+be hashed and supplied to the dm-bht. hash blocks that make up the entry
+contents are expected to be read from disk.
+
+dm-bht does not handle I/O directly but instead expects the consumer to
+supply callbacks. The read callback will always receive a page-align value
+to pass to the block device layer to read in a hash value.
+
+Usage
+=====
+
+The API provides mechanisms for reading and verifying a tree. When reading, all
+required data for the hash tree should be populated for a block before
+attempting a verify. This can be done by calling dm_bht_populate(). When all
+data is ready, a call to dm_bht_verify_block() with the expected hash value will
+perform both the direct block hash check and the hashes of the parent and
+neighboring nodes where needed to ensure validity up to the root hash. Note,
+dm_bht_set_root_hexdigest() should be called before any verification attempts
+occur.
diff --git a/Documentation/device-mapper/dm-verity.txt b/Documentation/device-mapper/dm-verity.txt
new file mode 100644
index 0000000..f33b984
--- /dev/null
+++ b/Documentation/device-mapper/dm-verity.txt
@@ -0,0 +1,76 @@
+dm-verity
+==========
+
+Device-Mapper's "verity" target provides transparent integrity checking of
+block devices using a cryptographic digest provided by the kernel crypto API.
+This target is read-only.
+
+Parameters: payload=<device path> hashtree=<hash device path> alg=<alg>
+ salt=<salt> root_hexagiest=<root hash>
+ [ hashstart=<hash start> error_behavior=<error behavior> ]
+
+<device path>
+ This is the device that is going to be integrity checked. It may be
+ a subset of the full device as specified to dmsetup (start sector and count)
+ It may be specified as a path, like /dev/sdaX, or a device number,
+ <major>:<minor>.
+
+<hash device path>
+ This is the device that that supplies the dm-bht hash data. It may be
+ specified similarly to the device path and may be the same device. If the
+ same device is used, the hash offset should be outside of the dm-verity
+ configured device size.
+
+<alg>
+ The cryptographic hash algorithm used for this device. This should
+ be the name of the algorithm, like "sha1".
+
+<salt>
+ Salt value (in hex).
+
+<root hash>
+ The hexadecimal encoding of the cryptographic hash of all of the
+ neighboring nodes at the first level of the tree. This hash should be
+ trusted as there is no other authenticity beyond this point.
+
+<hash start>
+ Start address of hashes (default 0).
+
+<error behavior>
+ 0 = return -EIO. 1 = panic. 2 = none. 3 = call notifier.
+
+Theory of operation
+===================
+
+dm-verity is meant to be setup as part of a verified boot path. This
+may be anything ranging from a boot using tboot or trustedgrub to just
+booting from a known-good device (like a USB drive or CD).
+
+When a dm-verity device is configured, it is expected that the caller
+has been authenticated in some way (cryptographic signatures, etc).
+After instantiation, all hashes will be verified on-demand during
+disk access. If they cannot be verified up to the root node of the
+tree, the root hash, then the I/O will fail. This should identify
+tampering with any data on the device and the hash data.
+
+Cryptographic hashes are used to assert the integrity of the device on a
+per-block basis. This allows for a lightweight hash computation on first read
+into the page cache. Block hashes are stored linearly aligned to the nearest
+block the size of a page.
+
+For more information on the hashing process, see dm-bht.txt.
+
+
+Example
+=======
+
+Setup a device;
+[[
+ dmsetup create vroot --table
+ "0 204800 verity payload=/dev/sda1 hashtree=/dev/sda2 alg=sha1 "
+ "root_hexdigest=9f74809a2ee7607b16fcc70d9399a4de97 25a727"
+]]
+
+A command line tool is available to compute the hash tree and return the
+root hash value.
+ http://git.chromium.org/cgi-bin/gitweb.cgi?p=dm-verity.git;a=tree
diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index faa4741..3cdf95c 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -370,4 +370,34 @@ config DM_FLAKEY
---help---
A target that intermittently fails I/O for debugging purposes.

+config DM_BHT
+ tristate "Block hash tree support"
+ select CRYPTO
+ select CRYPTO_HASH
+ ---help---
+ Include support for device-mapper devices to use a block hash
+ tree for managing data integrity checks in a scalable way.
+
+ Targets that use this functionality should include it
+ automatically.
+
+ If unsure, say N.
+
+config DM_VERITY
+ tristate "Verity target support"
+ depends on BLK_DEV_DM
+ select DM_BHT
+ select CRYPTO
+ select CRYPTO_HASH
+ ---help---
+ This device-mapper target allows you to create a device that
+ transparently integrity checks the data on it. You'll need to
+ activate the digests you're going to use in the cryptoapi
+ configuration.
+
+ To compile this code as a module, choose M here: the module will
+ be called dm-verity.
+
+ If unsure, say N.
+
endif # MD
diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index 046860c..c069953 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -39,6 +39,8 @@ obj-$(CONFIG_DM_SNAPSHOT) += dm-snapshot.o
obj-$(CONFIG_DM_PERSISTENT_DATA) += persistent-data/
obj-$(CONFIG_DM_MIRROR) += dm-mirror.o dm-log.o dm-region-hash.o
obj-$(CONFIG_DM_LOG_USERSPACE) += dm-log-userspace.o
+obj-$(CONFIG_DM_BHT) += dm-bht.o
+obj-$(CONFIG_DM_VERITY) += dm-verity.o
obj-$(CONFIG_DM_ZERO) += dm-zero.o
obj-$(CONFIG_DM_RAID) += dm-raid.o
obj-$(CONFIG_DM_THIN_PROVISIONING) += dm-thin-pool.o
diff --git a/drivers/md/dm-bht.c b/drivers/md/dm-bht.c
new file mode 100644
index 0000000..6eb2be3
--- /dev/null
+++ b/drivers/md/dm-bht.c
@@ -0,0 +1,559 @@
+ /*
+ * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ *
+ * Device-Mapper block hash tree interface.
+ * See Documentation/device-mapper/dm-bht.txt for details.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <crypto/hash.h>
+#include <linux/atomic.h>
+#include <linux/bitops.h>
+#include <linux/bug.h>
+#include <linux/cpumask.h>
+#include <linux/device-mapper.h>
+#include <linux/dm-bht.h>
+#include <linux/err.h>
+#include <linux/errno.h>
+#include <linux/gfp.h>
+#include <linux/highmem.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/mm_types.h>
+#include <linux/scatterlist.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+
+#define DM_MSG_PREFIX "dm bht"
+
+
+/*
+ * Utilities
+ */
+
+static u8 from_hex(u8 ch)
+{
+ if ((ch >= '0') && (ch <= '9'))
+ return ch - '0';
+ if ((ch >= 'a') && (ch <= 'f'))
+ return ch - 'a' + 10;
+ if ((ch >= 'A') && (ch <= 'F'))
+ return ch - 'A' + 10;
+ return -1;
+}
+
+/**
+ * dm_bht_bin_to_hex - converts a binary stream to human-readable hex
+ * @binary: a byte array of length @binary_len
+ * @hex: a byte array of length @binary_len * 2 + 1
+ */
+static void dm_bht_bin_to_hex(u8 *binary, u8 *hex, unsigned int binary_len)
+{
+ while (binary_len-- > 0) {
+ sprintf((char *)hex, "%02hhx", (int)*binary);
+ hex += 2;
+ binary++;
+ }
+}
+
+/**
+ * dm_bht_hex_to_bin - converts a hex stream to binary
+ * @binary: a byte array of length @binary_len
+ * @hex: a byte array of length @binary_len * 2 + 1
+ */
+static void dm_bht_hex_to_bin(u8 *binary, const u8 *hex,
+ unsigned int binary_len)
+{
+ while (binary_len-- > 0) {
+ *binary = from_hex(*(hex++));
+ *binary *= 16;
+ *binary += from_hex(*(hex++));
+ binary++;
+ }
+}
+
+static void dm_bht_log_mismatch(struct dm_bht *bht, u8 *given, u8 *computed)
+{
+ u8 given_hex[DM_BHT_MAX_DIGEST_SIZE * 2 + 1];
+ u8 computed_hex[DM_BHT_MAX_DIGEST_SIZE * 2 + 1];
+
+ dm_bht_bin_to_hex(given, given_hex, bht->digest_size);
+ dm_bht_bin_to_hex(computed, computed_hex, bht->digest_size);
+ DMERR_LIMIT("%s != %s", given_hex, computed_hex);
+}
+
+/**
+ * dm_bht_compute_hash: hashes a page of data
+ */
+static int dm_bht_compute_hash(struct dm_bht *bht, struct page *pg,
+ unsigned int offset, u8 *digest)
+{
+ struct shash_desc *hash_desc = bht->hash_desc[smp_processor_id()];
+ void *data;
+ int err;
+
+ /* Note, this is synchronous. */
+ if (crypto_shash_init(hash_desc)) {
+ DMCRIT("failed to reinitialize crypto hash (proc:%d)",
+ smp_processor_id());
+ return -EINVAL;
+ }
+ data = kmap_atomic(pg);
+ err = crypto_shash_update(hash_desc, data + offset, PAGE_SIZE);
+ kunmap_atomic(data);
+ if (err) {
+ DMCRIT("crypto_hash_update failed");
+ return -EINVAL;
+ }
+ if (crypto_shash_update(hash_desc, bht->salt, sizeof(bht->salt))) {
+ DMCRIT("crypto_hash_update failed");
+ return -EINVAL;
+ }
+ if (crypto_shash_final(hash_desc, digest)) {
+ DMCRIT("crypto_hash_final failed");
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+/*
+ * Implementation functions
+ */
+
+static int dm_bht_initialize_entries(struct dm_bht *bht)
+{
+ /* last represents the index of the last digest store in the tree.
+ * By walking the tree with that index, it is possible to compute the
+ * total number of entries at each level.
+ *
+ * Since each entry will contain up to |node_count| nodes of the tree,
+ * it is possible that the last index may not be at the end of a given
+ * entry->nodes. In that case, it is assumed the value is padded.
+ *
+ * Note, we treat both the tree root (1 hash) and the tree leaves
+ * independently from the bht data structures. Logically, the root is
+ * depth=-1 and the block layer level is depth=bht->depth
+ */
+ unsigned int last = bht->block_count;
+ int depth;
+
+ /* check that the largest level->count can't result in an int overflow
+ * on allocation or sector calculation.
+ */
+ if (((last >> bht->node_count_shift) + 1) >
+ UINT_MAX / max((unsigned int)sizeof(struct dm_bht_entry),
+ (unsigned int)to_sector(bht->block_size))) {
+ DMCRIT("required entries %u is too large", last + 1);
+ return -EINVAL;
+ }
+
+ /* Track the current sector location for each level so we don't have to
+ * compute it during traversals.
+ */
+ bht->sectors = 0;
+ for (depth = 0; depth < bht->depth; ++depth) {
+ struct dm_bht_level *level = &bht->levels[depth];
+
+ level->count = dm_bht_index_at_level(bht, depth, last) + 1;
+ level->entries = (struct dm_bht_entry *)
+ kcalloc(level->count,
+ sizeof(struct dm_bht_entry),
+ GFP_KERNEL);
+ if (!level->entries) {
+ DMERR("failed to allocate entries for depth %d", depth);
+ return -ENOMEM;
+ }
+ level->sector = bht->sectors;
+ bht->sectors += level->count * to_sector(bht->block_size);
+ }
+
+ return 0;
+}
+
+/**
+ * dm_bht_create - prepares @bht for us
+ * @bht: pointer to a dm_bht_create()d bht
+ * @depth: tree depth without the root; including block hashes
+ * @block_count:the number of block hashes / tree leaves
+ * @alg_name: crypto hash algorithm name
+ *
+ * Returns 0 on success.
+ *
+ * Callers can offset into devices by storing the data in the io callbacks.
+ */
+int dm_bht_create(struct dm_bht *bht, unsigned int block_count,
+ unsigned int block_size, const char *alg_name)
+{
+ struct crypto_shash *tfm;
+ int size, cpu, status = 0;
+
+ bht->block_size = block_size;
+ /* Verify that PAGE_SIZE >= block_size >= SECTOR_SIZE. */
+ if ((block_size > PAGE_SIZE) ||
+ (PAGE_SIZE % block_size) ||
+ (to_sector(block_size) == 0))
+ return -EINVAL;
+
+ tfm = crypto_alloc_shash(alg_name, 0, 0);
+ if (IS_ERR(tfm)) {
+ DMERR("failed to allocate crypto hash '%s'", alg_name);
+ return -ENOMEM;
+ }
+ size = sizeof(struct shash_desc) + crypto_shash_descsize(tfm);
+
+ /* Pre-allocate per-cpu crypto contexts to avoid having to
+ * kmalloc/kfree a context for every hash operation.
+ */
+ for (cpu = 0; cpu < nr_cpu_ids; ++cpu) {
+ struct shash_desc *hash_desc = kmalloc(size, GFP_KERNEL);
+
+ bht->hash_desc[cpu] = hash_desc;
+ if (!hash_desc) {
+ DMERR("failed to allocate crypto hash contexts");
+ status = -ENOMEM;
+ goto bad_hash_alloc;
+ }
+ hash_desc->tfm = tfm;
+ hash_desc->flags = 0x0;
+ }
+ bht->digest_size = crypto_shash_digestsize(tfm);
+ /* We expect to be able to pack >=2 hashes into a block */
+ if (block_size / bht->digest_size < 2) {
+ DMERR("too few hashes fit in a block");
+ status = -EINVAL;
+ goto bad_arg;
+ }
+
+ if (bht->digest_size > DM_BHT_MAX_DIGEST_SIZE) {
+ DMERR("DM_BHT_MAX_DIGEST_SIZE too small for chosen digest");
+ status = -EINVAL;
+ goto bad_arg;
+ }
+
+ /* Configure the tree */
+ bht->block_count = block_count;
+ if (block_count == 0) {
+ DMERR("block_count must be non-zero");
+ status = -EINVAL;
+ goto bad_arg;
+ }
+
+ /* Each dm_bht_entry->nodes is one block. The node code tracks
+ * how many nodes fit into one entry where a node is a single
+ * hash (message digest).
+ */
+ bht->node_count_shift = fls(block_size / bht->digest_size) - 1;
+ /* Round down to the nearest power of two. This makes indexing
+ * into the tree much less painful.
+ */
+ bht->node_count = 1 << bht->node_count_shift;
+
+ /* This is unlikely to happen, but with 64k pages, who knows. */
+ if (bht->node_count > UINT_MAX / bht->digest_size) {
+ DMERR("node_count * hash_len exceeds UINT_MAX!");
+ status = -EINVAL;
+ goto bad_arg;
+ }
+
+ bht->depth = DIV_ROUND_UP(fls(block_count - 1), bht->node_count_shift);
+
+ /* Ensure that we can safely shift by this value. */
+ if (bht->depth * bht->node_count_shift >= sizeof(unsigned int) * 8) {
+ DMERR("specified depth and node_count_shift is too large");
+ status = -EINVAL;
+ goto bad_arg;
+ }
+
+ /* Allocate levels. Each level of the tree may have an arbitrary number
+ * of dm_bht_entry structs. Each entry contains node_count nodes.
+ * Each node in the tree is a cryptographic digest of either node_count
+ * nodes on the subsequent level or of a specific block on disk.
+ */
+ bht->levels = (struct dm_bht_level *)
+ kcalloc(bht->depth,
+ sizeof(struct dm_bht_level), GFP_KERNEL);
+ if (!bht->levels) {
+ DMERR("failed to allocate tree levels");
+ status = -ENOMEM;
+ goto bad_level_alloc;
+ }
+
+ bht->read_cb = NULL;
+
+ status = dm_bht_initialize_entries(bht);
+ if (status)
+ goto bad_entries_alloc;
+
+ /* We compute depth such that there is only be 1 block at level 0. */
+ BUG_ON(bht->levels[0].count != 1);
+
+ return 0;
+
+bad_entries_alloc:
+ while (bht->depth-- > 0)
+ kfree(bht->levels[bht->depth].entries);
+ kfree(bht->levels);
+bad_level_alloc:
+bad_arg:
+bad_hash_alloc:
+ for (cpu = 0; cpu < nr_cpu_ids && bht->hash_desc[cpu]; ++cpu)
+ kfree(bht->hash_desc[cpu]);
+ crypto_free_shash(tfm);
+ return status;
+}
+EXPORT_SYMBOL(dm_bht_create);
+
+/**
+ * dm_bht_read_completed
+ * @entry: pointer to the entry that's been loaded
+ * @status: I/O status. Non-zero is failure.
+ * MUST always be called after a read_cb completes.
+ */
+void dm_bht_read_completed(struct dm_bht_entry *entry, int status)
+{
+ if (status) {
+ /* TODO(wad) add retry support */
+ DMCRIT("an I/O error occurred while reading entry");
+ atomic_set(&entry->state, DM_BHT_ENTRY_ERROR_IO);
+ /* entry->nodes will be freed later */
+ return;
+ }
+ BUG_ON(atomic_read(&entry->state) != DM_BHT_ENTRY_PENDING);
+ atomic_set(&entry->state, DM_BHT_ENTRY_READY);
+}
+EXPORT_SYMBOL(dm_bht_read_completed);
+
+/**
+ * dm_bht_verify_block - checks that all nodes in the path for @block are valid
+ * @bht: pointer to a dm_bht_create()d bht
+ * @block: specific block data is expected from
+ * @pg: page holding the block data
+ * @offset: offset into the page
+ *
+ * Returns 0 on success, DM_BHT_ENTRY_ERROR_MISMATCH on error.
+ */
+int dm_bht_verify_block(struct dm_bht *bht, unsigned int block,
+ struct page *pg, unsigned int offset)
+{
+ int state, depth = bht->depth;
+ u8 digest[DM_BHT_MAX_DIGEST_SIZE];
+ struct dm_bht_entry *entry;
+ void *node;
+
+ do {
+ /* Need to check that the hash of the current block is accurate
+ * in its parent.
+ */
+ entry = dm_bht_get_entry(bht, depth - 1, block);
+ state = atomic_read(&entry->state);
+ /* This call is only safe if all nodes along the path
+ * are already populated (i.e. READY) via dm_bht_populate.
+ */
+ BUG_ON(state < DM_BHT_ENTRY_READY);
+ node = dm_bht_get_node(bht, entry, depth, block);
+
+ if (dm_bht_compute_hash(bht, pg, offset, digest) ||
+ memcmp(digest, node, bht->digest_size))
+ goto mismatch;
+
+ /* Keep the containing block of hashes to be verified in the
+ * next pass.
+ */
+ pg = virt_to_page(entry->nodes);
+ offset = offset_in_page(entry->nodes);
+ } while (--depth > 0 && state != DM_BHT_ENTRY_VERIFIED);
+
+ if (depth == 0 && state != DM_BHT_ENTRY_VERIFIED) {
+ if (dm_bht_compute_hash(bht, pg, offset, digest) ||
+ memcmp(digest, bht->root_digest, bht->digest_size))
+ goto mismatch;
+ atomic_set(&entry->state, DM_BHT_ENTRY_VERIFIED);
+ }
+
+ /* Mark path to leaf as verified. */
+ for (depth++; depth < bht->depth; depth++) {
+ entry = dm_bht_get_entry(bht, depth, block);
+ /* At this point, entry can only be in VERIFIED or READY state.
+ * So it is safe to use atomic_set instead of atomic_cmpxchg.
+ */
+ atomic_set(&entry->state, DM_BHT_ENTRY_VERIFIED);
+ }
+
+ return 0;
+
+mismatch:
+ DMERR_LIMIT("verify_path: failed to verify hash (d=%d,bi=%u)",
+ depth, block);
+ dm_bht_log_mismatch(bht, node, digest);
+ return DM_BHT_ENTRY_ERROR_MISMATCH;
+}
+EXPORT_SYMBOL(dm_bht_verify_block);
+
+/**
+ * dm_bht_is_populated - check that entries from disk needed to verify a given
+ * block are all ready
+ * @bht: pointer to a dm_bht_create()d bht
+ * @block: specific block data is expected from
+ *
+ * Callers may wish to call dm_bht_is_populated() when checking an io
+ * for which entries were already pending.
+ */
+bool dm_bht_is_populated(struct dm_bht *bht, unsigned int block)
+{
+ int depth;
+
+ for (depth = bht->depth - 1; depth >= 0; depth--) {
+ struct dm_bht_entry *entry = dm_bht_get_entry(bht, depth,
+ block);
+ if (atomic_read(&entry->state) < DM_BHT_ENTRY_READY)
+ return false;
+ }
+
+ return true;
+}
+EXPORT_SYMBOL(dm_bht_is_populated);
+
+/**
+ * dm_bht_populate - reads entries from disk needed to verify a given block
+ * @bht: pointer to a dm_bht_create()d bht
+ * @ctx: context used for all read_cb calls on this request
+ * @block: specific block data is expected from
+ *
+ * Returns negative value on error. Returns 0 on success.
+ */
+int dm_bht_populate(struct dm_bht *bht, void *ctx, unsigned int block)
+{
+ int depth, state;
+
+ BUG_ON(block >= bht->block_count);
+
+ for (depth = bht->depth - 1; depth >= 0; --depth) {
+ unsigned int index = dm_bht_index_at_level(bht, depth, block);
+ struct dm_bht_level *level = &bht->levels[depth];
+ struct dm_bht_entry *entry = dm_bht_get_entry(bht, depth,
+ block);
+ state = atomic_cmpxchg(&entry->state,
+ DM_BHT_ENTRY_UNALLOCATED,
+ DM_BHT_ENTRY_PENDING);
+ if (state == DM_BHT_ENTRY_VERIFIED)
+ break;
+ if (state <= DM_BHT_ENTRY_ERROR)
+ goto error_state;
+ if (state != DM_BHT_ENTRY_UNALLOCATED)
+ continue;
+
+ /* Current entry is claimed for allocation and loading */
+ entry->nodes = kmalloc(bht->block_size, GFP_NOIO);
+ if (!entry->nodes)
+ goto nomem;
+
+ bht->read_cb(ctx,
+ level->sector + to_sector(index * bht->block_size),
+ entry->nodes, to_sector(bht->block_size), entry);
+ }
+
+ return 0;
+
+error_state:
+ DMCRIT("block %u at depth %d is in an error state", block, depth);
+ return -EPERM;
+
+nomem:
+ DMCRIT("failed to allocate memory for entry->nodes");
+ return -ENOMEM;
+}
+EXPORT_SYMBOL(dm_bht_populate);
+
+/**
+ * dm_bht_destroy - cleans up all memory used by @bht
+ * @bht: pointer to a dm_bht_create()d bht
+ */
+void dm_bht_destroy(struct dm_bht *bht)
+{
+ int depth, cpu;
+
+ for (depth = 0; depth < bht->depth; depth++) {
+ struct dm_bht_entry *entry = bht->levels[depth].entries;
+ struct dm_bht_entry *entry_end = entry +
+ bht->levels[depth].count;
+ for (; entry < entry_end; ++entry)
+ kfree(entry->nodes);
+ kfree(bht->levels[depth].entries);
+ }
+ kfree(bht->levels);
+ crypto_free_shash((bht->hash_desc[0])->tfm);
+ for (cpu = 0; cpu < nr_cpu_ids; ++cpu)
+ kfree(bht->hash_desc[cpu]);
+}
+EXPORT_SYMBOL(dm_bht_destroy);
+
+/*
+ * Accessors
+ */
+
+/**
+ * dm_bht_set_root_hexdigest - sets an unverified root digest hash from hex
+ * @bht: pointer to a dm_bht_create()d bht
+ * @hexdigest: array of u8s containing the new digest in binary
+ * Returns non-zero on error. hexdigest should be NUL terminated.
+ */
+int dm_bht_set_root_hexdigest(struct dm_bht *bht, const u8 *hexdigest)
+{
+ /* Make sure we have at least the bytes expected */
+ if (strnlen((char *)hexdigest, bht->digest_size * 2) !=
+ bht->digest_size * 2) {
+ DMERR("root digest length does not match hash algorithm");
+ return -1;
+ }
+ dm_bht_hex_to_bin(bht->root_digest, hexdigest, bht->digest_size);
+ return 0;
+}
+EXPORT_SYMBOL(dm_bht_set_root_hexdigest);
+
+/**
+ * dm_bht_root_hexdigest - returns root digest in hex
+ * @bht: pointer to a dm_bht_create()d bht
+ * @hexdigest: u8 array of size @available
+ * @available: must be bht->digest_size * 2 + 1
+ */
+int dm_bht_root_hexdigest(struct dm_bht *bht, u8 *hexdigest, int available)
+{
+ if (available < 0 ||
+ ((unsigned int) available) < bht->digest_size * 2 + 1) {
+ DMERR("hexdigest has too few bytes available");
+ return -EINVAL;
+ }
+ dm_bht_bin_to_hex(bht->root_digest, hexdigest, bht->digest_size);
+ return 0;
+}
+EXPORT_SYMBOL(dm_bht_root_hexdigest);
+
+/**
+ * dm_bht_set_salt - sets the salt used, in hex
+ * @bht: pointer to a dm_bht_create()d bht
+ * @hexsalt: salt string, as hex; will be zero-padded or truncated to
+ * DM_BHT_SALT_SIZE * 2 hex digits.
+ */
+void dm_bht_set_salt(struct dm_bht *bht, const char *hexsalt)
+{
+ size_t saltlen = min(strlen(hexsalt) / 2, sizeof(bht->salt));
+
+ memset(bht->salt, 0, sizeof(bht->salt));
+ dm_bht_hex_to_bin(bht->salt, (const u8 *)hexsalt, saltlen);
+}
+EXPORT_SYMBOL(dm_bht_set_salt);
+
+/**
+ * dm_bht_salt - returns the salt used, in hex
+ * @bht: pointer to a dm_bht_create()d bht
+ * @hexsalt: buffer to put salt into, of length DM_BHT_SALT_SIZE * 2 + 1.
+ */
+int dm_bht_salt(struct dm_bht *bht, char *hexsalt)
+{
+ dm_bht_bin_to_hex(bht->salt, (u8 *)hexsalt, sizeof(bht->salt));
+ return 0;
+}
+EXPORT_SYMBOL(dm_bht_salt);
+
diff --git a/drivers/md/dm-verity.c b/drivers/md/dm-verity.c
new file mode 100644
index 0000000..8f1e8dc
--- /dev/null
+++ b/drivers/md/dm-verity.c
@@ -0,0 +1,1051 @@
+/*
+ * Originally based on dm-crypt.c,
+ * Copyright (C) 2003 Christophe Saout <christophe@saout.de>
+ * Copyright (C) 2004 Clemens Fruhwirth <clemens@endorphin.org>
+ * Copyright (C) 2006-2008 Red Hat, Inc. All rights reserved.
+ * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ * All Rights Reserved.
+ *
+ * This file is released under the GPLv2.
+ *
+ * Implements a verifying transparent block device.
+ * See Documentation/device-mapper/dm-verity.txt
+ */
+#include <linux/async.h>
+#include <linux/atomic.h>
+#include <linux/bio.h>
+#include <linux/blkdev.h>
+#include <linux/delay.h>
+#include <linux/device.h>
+#include <linux/err.h>
+#include <linux/genhd.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/mempool.h>
+#include <linux/mm_types.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/workqueue.h>
+#include <linux/device-mapper.h>
+#include <linux/dm-bht.h>
+
+#include "dm-verity.h"
+
+#define DM_MSG_PREFIX "verity"
+
+/* Supports up to 512-bit digests */
+#define VERITY_MAX_DIGEST_SIZE 64
+
+/* TODO(wad) make both of these report the error line/file to a
+ * verity_bug function.
+ */
+#define VERITY_BUG(msg...) BUG()
+#define VERITY_BUG_ON(cond, msg...) BUG_ON(cond)
+
+/* Helper for printing sector_t */
+#define ULL(x) ((unsigned long long)(x))
+
+#define MIN_IOS 32
+#define MIN_BIOS (MIN_IOS * 2)
+#define VERITY_DEFAULT_BLOCK_SIZE 4096
+
+/* Provide a lightweight means of specifying the global default for
+ * error behavior: eio, reboot, or none
+ * Legacy support for 0 = eio, 1 = reboot/panic, 2 = none, 3 = notify.
+ * This is matched to the enum in dm-verity.h.
+ */
+static const char * const allowed_error_behaviors[] = { "eio", "panic", "none",
+ "notify", NULL };
+static char *error_behavior = "eio";
+module_param(error_behavior, charp, 0644);
+MODULE_PARM_DESC(error_behavior, "Behavior on error "
+ "(eio, panic, none, notify)");
+
+/* Controls whether verity_get_device will wait forever for a device. */
+static int dev_wait;
+module_param(dev_wait, bool, 0444);
+MODULE_PARM_DESC(dev_wait, "Wait forever for a backing device");
+
+/* per-requested-bio private data */
+enum verity_io_flags {
+ VERITY_IOFLAGS_CLONED = 0x1, /* original bio has been cloned */
+};
+
+struct dm_verity_io {
+ struct dm_target *target;
+ struct bio *bio;
+ struct delayed_work work;
+ unsigned int flags;
+
+ int error;
+ atomic_t pending;
+
+ u64 block; /* aligned block index */
+ u64 count; /* aligned count in blocks */
+};
+
+struct verity_config {
+ struct dm_dev *dev;
+ sector_t start;
+ sector_t size;
+
+ struct dm_dev *hash_dev;
+ sector_t hash_start;
+
+ struct dm_bht bht;
+
+ /* Pool required for io contexts */
+ mempool_t *io_pool;
+ /* Pool and bios required for making sure that backing device reads are
+ * in PAGE_SIZE increments.
+ */
+ struct bio_set *bs;
+
+ char hash_alg[CRYPTO_MAX_ALG_NAME];
+
+ int error_behavior;
+};
+
+static struct kmem_cache *_verity_io_pool;
+static struct workqueue_struct *kveritydq, *kverityd_ioq;
+
+static void kverityd_verify(struct work_struct *work);
+static void kverityd_io(struct work_struct *work);
+static void kverityd_io_bht_populate(struct dm_verity_io *io);
+static void kverityd_io_bht_populate_end(struct bio *, int error);
+
+static BLOCKING_NOTIFIER_HEAD(verity_error_notifier);
+
+/*
+ * Exported interfaces
+ */
+
+int dm_verity_register_error_notifier(struct notifier_block *nb)
+{
+ return blocking_notifier_chain_register(&verity_error_not ifier, nb);
+}
+EXPORT_SYMBOL_GPL(dm_verity_register_error_notifi er);
+
+int dm_verity_unregister_error_notifier(struct notifier_block *nb)
+{
+ return blocking_notifier_chain_unregister(&verity_error_n otifier, nb);
+}
+EXPORT_SYMBOL_GPL(dm_verity_unregister_error_noti fier);
+
+/*
+ * Allocation and utility functions
+ */
+
+static void kverityd_src_io_read_end(struct bio *clone, int error);
+
+/* Shared destructor for all internal bios */
+static void dm_verity_bio_destructor(struct bio *bio)
+{
+ struct dm_verity_io *io = bio->bi_private;
+ struct verity_config *vc = io->target->private;
+ bio_free(bio, vc->bs);
+}
+
+static struct bio *verity_alloc_bioset(struct verity_config *vc, gfp_t gfp_mask,
+ int nr_iovecs)
+{
+ return bio_alloc_bioset(gfp_mask, nr_iovecs, vc->bs);
+}
+
+static struct dm_verity_io *verity_io_alloc(struct dm_target *ti,
+ struct bio *bio)
+{
+ struct verity_config *vc = ti->private;
+ sector_t sector = bio->bi_sector - ti->begin;
+ struct dm_verity_io *io;
+
+ io = mempool_alloc(vc->io_pool, GFP_NOIO);
+ if (unlikely(!io))
+ return NULL;
+ io->flags = 0;
+ io->target = ti;
+ io->bio = bio;
+ io->error = 0;
+
+ /* Adjust the sector by the virtual starting sector */
+ io->block = to_bytes(sector) / vc->bht.block_size;
+ io->count = bio->bi_size / vc->bht.block_size;
+
+ atomic_set(&io->pending, 0);
+
+ return io;
+}
+
+static struct bio *verity_bio_clone(struct dm_verity_io *io)
+{
+ struct verity_config *vc = io->target->private;
+ struct bio *bio = io->bio;
+ struct bio *clone = verity_alloc_bioset(vc, GFP_NOIO, bio->bi_max_vecs);
+
+ if (!clone)
+ return NULL;
+
+ __bio_clone(clone, bio);
+ clone->bi_private = io;
+ clone->bi_end_io = kverityd_src_io_read_end;
+ clone->bi_bdev = vc->dev->bdev;
+ clone->bi_sector += vc->start - io->target->begin;
+ clone->bi_destructor = dm_verity_bio_destructor;
+
+ return clone;
+}
+
+/* If the request is not successful, this handler takes action.
+ * TODO make this call a registered handler.
+ */
+static void verity_error(struct verity_config *vc, struct dm_verity_io *io,
+ int error)
+{
+ const char *message;
+ int error_mode = DM_VERITY_ERROR_BEHAVIOR_PANIC;
+ dev_t devt = 0;
+ u64 block = ~0;
+ int transient = 1;
+ struct dm_verity_error_state error_state;
+
+ if (vc) {
+ devt = vc->dev->bdev->bd_dev;
+ error_mode = vc->error_behavior;
+ }
+
+ if (io) {
+ io->error = -EIO;
+ block = io->block;
+ }
+
+ switch (error) {
+ case -ENOMEM:
+ message = "out of memory";
+ break;
+ case -EBUSY:
+ message = "pending data seen during verify";
+ break;
+ case -EFAULT:
+ message = "crypto operation failure";
+ break;
+ case -EACCES:
+ message = "integrity failure";
+ /* Image is bad. */
+ transient = 0;
+ break;
+ case -EPERM:
+ message = "hash tree population failure";
+ /* Should be dm-bht specific errors */
+ transient = 0;
+ break;
+ case -EINVAL:
+ message = "unexpected missing/invalid data";
+ /* The device was configured incorrectly - fallback. */
+ transient = 0;
+ break;
+ default:
+ /* Other errors can be passed through as IO errors */
+ message = "unknown or I/O error";
+ return;
+ }
+
+ DMERR_LIMIT("verification failure occurred: %s", message);
+
+ if (error_mode == DM_VERITY_ERROR_BEHAVIOR_NOTIFY) {
+ error_state.code = error;
+ error_state.transient = transient;
+ error_state.block = block;
+ error_state.message = message;
+ error_state.dev_start = vc->start;
+ error_state.dev_len = vc->size;
+ error_state.dev = vc->dev->bdev;
+ error_state.hash_dev_start = vc->hash_start;
+ error_state.hash_dev_len = vc->bht.sectors;
+ error_state.hash_dev = vc->hash_dev->bdev;
+
+ /* Set default fallthrough behavior. */
+ error_state.behavior = DM_VERITY_ERROR_BEHAVIOR_PANIC;
+ error_mode = DM_VERITY_ERROR_BEHAVIOR_PANIC;
+
+ if (!blocking_notifier_call_chain(
+ &verity_error_notifier, transient, &error_state)) {
+ error_mode = error_state.behavior;
+ }
+ }
+
+ switch (error_mode) {
+ case DM_VERITY_ERROR_BEHAVIOR_EIO:
+ break;
+ case DM_VERITY_ERROR_BEHAVIOR_NONE:
+ if (error != -EIO && io)
+ io->error = 0;
+ break;
+ default:
+ goto do_panic;
+ }
+ return;
+
+do_panic:
+ panic("dm-verity failure: "
+ "device:%u:%u error:%d block:%llu message:%s",
+ MAJOR(devt), MINOR(devt), error, ULL(block), message);
+}
+
+/**
+ * verity_parse_error_behavior - parse a behavior charp to the enum
+ * @behavior: NUL-terminated char array
+ *
+ * Checks if the behavior is valid either as text or as an index digit
+ * and returns the proper enum value or -1 on error.
+ */
+static int verity_parse_error_behavior(const char *behavior)
+{
+ const char * const *allowed = allowed_error_behaviors;
+ char index = '0';
+
+ for (; *allowed; allowed++, index++)
+ if (!strcmp(*allowed, behavior) || behavior[0] == index)
+ break;
+
+ if (!*allowed)
+ return -1;
+
+ /* Convert to the integer index matching the enum. */
+ return allowed - allowed_error_behaviors;
+}
+
+/*
+ * Reverse flow of requests into the device.
+ *
+ * (Start at the bottom with verity_map and work your way upward).
+ */
+
+static void verity_inc_pending(struct dm_verity_io *io);
+
+static void verity_return_bio_to_caller(struct dm_verity_io *io)
+{
+ struct verity_config *vc = io->target->private;
+
+ if (io->error)
+ verity_error(vc, io, io->error);
+
+ bio_endio(io->bio, io->error);
+ mempool_free(io, vc->io_pool);
+}
+
+/* Check for any missing bht hashes. */
+static bool verity_is_bht_populated(struct dm_verity_io *io)
+{
+ struct verity_config *vc = io->target->private;
+ u64 block;
+
+ for (block = io->block; block < io->block + io->count; ++block)
+ if (!dm_bht_is_populated(&vc->bht, block))
+ return false;
+
+ return true;
+}
+
+/* verity_dec_pending manages the lifetime of all dm_verity_io structs.
+ * Non-bug error handling is centralized through this interface and
+ * all passage from workqueue to workqueue.
+ */
+static void verity_dec_pending(struct dm_verity_io *io)
+{
+ if (!atomic_dec_and_test(&io->pending))
+ goto done;
+
+ if (unlikely(io->error))
+ goto io_error;
+
+ /* I/Os that were pending may now be ready */
+ if (verity_is_bht_populated(io)) {
+ INIT_DELAYED_WORK(&io->work, kverityd_verify);
+ queue_delayed_work(kveritydq, &io->work, 0);
+ } else {
+ INIT_DELAYED_WORK(&io->work, kverityd_io);
+ queue_delayed_work(kverityd_ioq, &io->work, HZ/10);
+ }
+
+done:
+ return;
+
+io_error:
+ verity_return_bio_to_caller(io);
+}
+
+/* Walks the data set and computes the hash of the data read from the
+ * untrusted source device. The computed hash is then passed to dm-bht
+ * for verification.
+ */
+static int verity_verify(struct verity_config *vc,
+ struct dm_verity_io *io)
+{
+ unsigned int block_size = vc->bht.block_size;
+ struct bio *bio = io->bio;
+ u64 block = io->block;
+ unsigned int idx;
+ int r;
+
+ for (idx = bio->bi_idx; idx < bio->bi_vcnt; idx++) {
+ struct bio_vec *bv = bio_iovec_idx(bio, idx);
+ unsigned int offset = bv->bv_offset;
+ unsigned int len = bv->bv_len;
+
+ VERITY_BUG_ON(offset % block_size);
+ VERITY_BUG_ON(len % block_size);
+
+ while (len) {
+ r = dm_bht_verify_block(&vc->bht, block,
+ bv->bv_page, offset);
+ if (r)
+ goto bad_return;
+
+ offset += block_size;
+ len -= block_size;
+ block++;
+ cond_resched();
+ }
+ }
+
+ return 0;
+
+bad_return:
+ /* dm_bht functions aren't expected to return errno friendly
+ * values. They are converted here for uniformity.
+ */
+ if (r > 0) {
+ DMERR("Pending data for block %llu seen at verify", ULL(block));
+ r = -EBUSY;
+ } else {
+ DMERR_LIMIT("Block hash does not match!");
+ r = -EACCES;
+ }
+ return r;
+}
+
+/* Services the verify workqueue */
+static void kverityd_verify(struct work_struct *work)
+{
+ struct delayed_work *dwork = container_of(work, struct delayed_work,
+ work);
+ struct dm_verity_io *io = container_of(dwork, struct dm_verity_io,
+ work);
+ struct verity_config *vc = io->target->private;
+
+ io->error = verity_verify(vc, io);
+
+ /* Free up the bio and tag with the return value */
+ verity_return_bio_to_caller(io);
+}
+
+/* Asynchronously called upon the completion of dm-bht I/O. The status
+ * of the operation is passed back to dm-bht and the next steps are
+ * decided by verity_dec_pending.
+ */
+static void kverityd_io_bht_populate_end(struct bio *bio, int error)
+{
+ struct dm_bht_entry *entry = (struct dm_bht_entry *) bio->bi_private;
+ struct dm_verity_io *io = (struct dm_verity_io *) entry->io_context;
+
+ /* Tell the tree to atomically update now that we've populated
+ * the given entry.
+ */
+ dm_bht_read_completed(entry, error);
+
+ /* Clean up for reuse when reading data to be checked */
+ bio->bi_vcnt = 0;
+ bio->bi_io_vec->bv_offset = 0;
+ bio->bi_io_vec->bv_len = 0;
+ bio->bi_io_vec->bv_page = NULL;
+ /* Restore the private data to I/O so the destructor can be shared. */
+ bio->bi_private = (void *) io;
+ bio_put(bio);
+
+ /* We bail but assume the tree has been marked bad. */
+ if (unlikely(error)) {
+ DMERR("Failed to read for sector %llu (%u)",
+ ULL(io->bio->bi_sector), io->bio->bi_size);
+ io->error = error;
+ /* Pass through the error to verity_dec_pending below */
+ }
+ /* When pending = 0, it will transition to reading real data */
+ verity_dec_pending(io);
+}
+
+/* Called by dm-bht (via dm_bht_populate), this function provides
+ * the message digests to dm-bht that are stored on disk.
+ */
+static int kverityd_bht_read_callback(void *ctx, sector_t start, u8 *dst,
+ sector_t count,
+ struct dm_bht_entry *entry)
+{
+ struct dm_verity_io *io = ctx; /* I/O for this batch */
+ struct verity_config *vc;
+ struct bio *bio;
+
+ vc = io->target->private;
+
+ /* The I/O context is nested inside the entry so that we don't need one
+ * io context per page read.
+ */
+ entry->io_context = ctx;
+
+ /* We should only get page size requests at present. */
+ verity_inc_pending(io);
+ bio = verity_alloc_bioset(vc, GFP_NOIO, 1);
+ if (unlikely(!bio)) {
+ DMCRIT("Out of memory at bio_alloc_bioset");
+ dm_bht_read_completed(entry, -ENOMEM);
+ return -ENOMEM;
+ }
+ bio->bi_private = (void *) entry;
+ bio->bi_idx = 0;
+ bio->bi_size = vc->bht.block_size;
+ bio->bi_sector = vc->hash_start + start;
+ bio->bi_bdev = vc->hash_dev->bdev;
+ bio->bi_end_io = kverityd_io_bht_populate_end;
+ bio->bi_rw = REQ_META;
+ /* Only need to free the bio since the page is managed by bht */
+ bio->bi_destructor = dm_verity_bio_destructor;
+ bio->bi_vcnt = 1;
+ bio->bi_io_vec->bv_offset = offset_in_page(dst);
+ bio->bi_io_vec->bv_len = to_bytes(count);
+ /* dst is guaranteed to be a page_pool allocation */
+ bio->bi_io_vec->bv_page = virt_to_page(dst);
+ /* Track that this I/O is in use. There should be no risk of the io
+ * being removed prior since this is called synchronously.
+ */
+ generic_make_request(bio);
+ return 0;
+}
+
+/* Submits an io request for each missing block of block hashes.
+ * The last one to return will then enqueue this on the io workqueue.
+ */
+static void kverityd_io_bht_populate(struct dm_verity_io *io)
+{
+ struct verity_config *vc = io->target->private;
+ u64 block;
+
+ for (block = io->block; block < io->block + io->count; ++block) {
+ int ret = dm_bht_populate(&vc->bht, io, block);
+
+ if (ret < 0) {
+ /* verity_dec_pending will handle the error case. */
+ io->error = ret;
+ break;
+ }
+ }
+}
+
+/* Asynchronously called upon the completion of I/O issued
+ * from kverityd_src_io_read. verity_dec_pending() acts as
+ * the scheduler/flow manager.
+ */
+static void kverityd_src_io_read_end(struct bio *clone, int error)
+{
+ struct dm_verity_io *io = clone->bi_private;
+
+ if (unlikely(!bio_flagged(clone, BIO_UPTODATE) && !error))
+ error = -EIO;
+
+ if (unlikely(error)) {
+ DMERR("Error occurred: %d (%llu, %u)",
+ error, ULL(clone->bi_sector), clone->bi_size);
+ io->error = error;
+ }
+
+ /* Release the clone which just avoids the block layer from
+ * leaving offsets, etc in unexpected states.
+ */
+ bio_put(clone);
+
+ verity_dec_pending(io);
+}
+
+/* If not yet underway, an I/O request will be issued to the vc->dev
+ * device for the data needed. It is cloned to avoid unexpected changes
+ * to the original bio struct.
+ */
+static void kverityd_src_io_read(struct dm_verity_io *io)
+{
+ struct bio *clone;
+
+ /* Check if the read is already issued. */
+ if (io->flags & VERITY_IOFLAGS_CLONED)
+ return;
+
+ io->flags |= VERITY_IOFLAGS_CLONED;
+
+ /* Clone the bio. The block layer may modify the bvec array. */
+ clone = verity_bio_clone(io);
+ if (unlikely(!clone)) {
+ io->error = -ENOMEM;
+ return;
+ }
+
+ verity_inc_pending(io);
+
+ generic_make_request(clone);
+}
+
+/* kverityd_io services the I/O workqueue. For each pass through
+ * the I/O workqueue, a call to populate both the origin drive
+ * data and the hash tree data is made.
+ */
+static void kverityd_io(struct work_struct *work)
+{
+ struct delayed_work *dwork = container_of(work, struct delayed_work,
+ work);
+ struct dm_verity_io *io = container_of(dwork, struct dm_verity_io,
+ work);
+
+ /* Issue requests asynchronously. */
+ verity_inc_pending(io);
+ kverityd_src_io_read(io);
+ kverityd_io_bht_populate(io);
+ verity_dec_pending(io);
+}
+
+/* Paired with verity_dec_pending, the pending value in the io dictate the
+ * lifetime of a request and when it is ready to be processed on the
+ * workqueues.
+ */
+static void verity_inc_pending(struct dm_verity_io *io)
+{
+ atomic_inc(&io->pending);
+}
+
+/* Block-level requests start here. */
+static int verity_map(struct dm_target *ti, struct bio *bio,
+ union map_info *map_context)
+{
+ struct dm_verity_io *io;
+ struct verity_config *vc;
+ struct request_queue *r_queue;
+
+ if (unlikely(!ti)) {
+ DMERR("dm_target was NULL");
+ return -EIO;
+ }
+
+ vc = ti->private;
+ r_queue = bdev_get_queue(vc->dev->bdev);
+
+ if (bio_data_dir(bio) == WRITE) {
+ /* If we silently drop writes, then the VFS layer will cache
+ * the write and persist it in memory. While it doesn't change
+ * the underlying storage, it still may be contrary to the
+ * behavior expected by a verified, read-only device.
+ */
+ DMWARN_LIMIT("write request received. rejecting with -EIO.");
+ verity_error(vc, NULL, -EIO);
+ return -EIO;
+ } else {
+ /* Queue up the request to be verified */
+ io = verity_io_alloc(ti, bio);
+ if (!io) {
+ DMERR_LIMIT("Failed to allocate and init IO data");
+ return DM_MAPIO_REQUEUE;
+ }
+ INIT_DELAYED_WORK(&io->work, kverityd_io);
+ queue_delayed_work(kverityd_ioq, &io->work, 0);
+ }
+
+ return DM_MAPIO_SUBMITTED;
+}
+
+static void splitarg(char *arg, char **key, char **val)
+{
+ *key = strsep(&arg, "=");
+ *val = strsep(&arg, "");
+}
+
+/*
+ * Non-block interfaces and device-mapper specific code
+ */
+
+/**
+ * verity_ctr - Construct a verified mapping
+ * @ti: Target being created
+ * @argc: Number of elements in argv
+ * @argv: Vector of key-value pairs (see below).
+ *
+ * Accepts the following keys:
+ * @payload: hashed device
+ * @hashtree: device hashtree is stored on
+ * @hashstart: start address of hashes (default 0)
+ * @block_size: size of a hash block
+ * @alg: hash algorithm
+ * @root_hexdigest: toplevel hash of the tree
+ * @error_behavior: what to do when verification fails [optional]
+ * @salt: salt, in hex [optional]
+ *
+ * E.g.,
+ * payload=/dev/sda2 hashtree=/dev/sda3 alg=sha256
+ * root_hexdigest=f08aa4a3695290c569eb1b0ac032ae10401 50afb527abbeb0a3da33d82fb2c6e
+ *
+ * TODO(wad):
+ * - Boot time addition
+ * - Track block verification to free block_hashes if memory use is a concern
+ * Testing needed:
+ * - Regular slub_debug tracing (on checkins)
+ * - Improper block hash padding
+ * - Improper bundle padding
+ * - Improper hash layout
+ * - Missing padding at end of device
+ * - Improperly sized underlying devices
+ * - Out of memory conditions (make sure this isn't too flaky under high load!)
+ * - Incorrect superhash
+ * - Incorrect block hashes
+ * - Incorrect bundle hashes
+ * - Boot-up read speed; sustained read speeds
+ */
+static int verity_ctr(struct dm_target *ti, unsigned int argc, char **argv)
+{
+ struct verity_config *vc = NULL;
+ int ret = 0;
+ sector_t blocks;
+ unsigned int block_size = VERITY_DEFAULT_BLOCK_SIZE;
+ const char *payload = NULL;
+ const char *hashtree = NULL;
+ unsigned long hashstart = 0;
+ const char *alg = NULL;
+ const char *root_hexdigest = NULL;
+ const char *dev_error_behavior = error_behavior;
+ const char *hexsalt = "";
+ int i;
+
+ for (i = 0; i < argc; ++i) {
+ char *key, *val;
+ DMWARN("Argument %d: '%s'", i, argv[i]);
+ splitarg(argv[i], &key, &val);
+ if (!key) {
+ DMWARN("Bad argument %d: missing key?", i);
+ break;
+ }
+ if (!val) {
+ DMWARN("Bad argument %d='%s': missing value", i, key);
+ break;
+ }
+
+ if (!strcmp(key, "alg")) {
+ alg = val;
+ } else if (!strcmp(key, "payload")) {
+ payload = val;
+ } else if (!strcmp(key, "hashtree")) {
+ hashtree = val;
+ } else if (!strcmp(key, "root_hexdigest")) {
+ root_hexdigest = val;
+ } else if (!strcmp(key, "hashstart")) {
+ if (strict_strtoul(val, 10, &hashstart)) {
+ ti->error = "Invalid hashstart";
+ return -EINVAL;
+ }
+ } else if (!strcmp(key, "block_size")) {
+ unsigned long tmp;
+ if (strict_strtoul(val, 10, &tmp) ||
+ (tmp > UINT_MAX)) {
+ ti->error = "Invalid block_size";
+ return -EINVAL;
+ }
+ block_size = (unsigned int)tmp;
+ } else if (!strcmp(key, "error_behavior")) {
+ dev_error_behavior = val;
+ } else if (!strcmp(key, "salt")) {
+ hexsalt = val;
+ } else if (!strcmp(key, "error_behavior")) {
+ dev_error_behavior = val;
+ }
+ }
+
+#define NEEDARG(n)
+ if (!(n)) {
+ ti->error = "Missing argument: " #n;
+ return -EINVAL;
+ }
+
+ NEEDARG(alg);
+ NEEDARG(payload);
+ NEEDARG(hashtree);
+ NEEDARG(root_hexdigest);
+
+#undef NEEDARG
+
+ /* The device mapper device should be setup read-only */
+ if ((dm_table_get_mode(ti->table) & ~FMODE_READ) != 0) {
+ ti->error = "Must be created readonly.";
+ return -EINVAL;
+ }
+
+ vc = kzalloc(sizeof(*vc), GFP_KERNEL);
+ if (!vc) {
+ /* TODO(wad) if this is called from the setup helper, then we
+ * catch these errors and do a CrOS specific thing. if not, we
+ * need to have this call the error handler.
+ */
+ return -EINVAL;
+ }
+
+ /* Calculate the blocks from the given device size */
+ vc->size = ti->len;
+ blocks = to_bytes(vc->size) / block_size;
+ if (dm_bht_create(&vc->bht, blocks, block_size, alg)) {
+ DMERR("failed to create required bht");
+ goto bad_bht;
+ }
+ if (dm_bht_set_root_hexdigest(&vc->bht, root_hexdigest)) {
+ DMERR("root hexdigest error");
+ goto bad_root_hexdigest;
+ }
+ dm_bht_set_salt(&vc->bht, hexsalt);
+ vc->bht.read_cb = kverityd_bht_read_callback;
+
+ /* payload: device to verify */
+ vc->start = 0; /* TODO: should this support a starting offset? */
+ /* We only ever grab the device in read-only mode. */
+ ret = dm_get_device(ti, payload,
+ dm_table_get_mode(ti->table), &vc->dev);
+ if (ret) {
+ DMERR("Failed to acquire device '%s': %d", payload, ret);
+ ti->error = "Device lookup failed";
+ goto bad_verity_dev;
+ }
+
+ if ((to_bytes(vc->start) % block_size) ||
+ (to_bytes(vc->size) % block_size)) {
+ ti->error = "Device must be block_size divisble/aligned";
+ goto bad_hash_start;
+ }
+
+ vc->hash_start = (sector_t)hashstart;
+
+ /* hashtree: device with hashes.
+ * Note, payload == hashtree is okay as long as the size of
+ * ti->len passed to device mapper does not include
+ * the hashes.
+ */
+ if (dm_get_device(ti, hashtree,
+ dm_table_get_mode(ti->table), &vc->hash_dev)) {
+ ti->error = "Hash device lookup failed";
+ goto bad_hash_dev;
+ }
+
+ /* arg4: cryptographic digest algorithm */
+ if (snprintf(vc->hash_alg, CRYPTO_MAX_ALG_NAME, "%s", alg) >=
+ CRYPTO_MAX_ALG_NAME) {
+ ti->error = "Hash algorithm name is too long";
+ goto bad_hash;
+ }
+
+ /* override with optional device-specific error behavior */
+ vc->error_behavior = verity_parse_error_behavior(dev_error_behavior);
+ if (vc->error_behavior == -1) {
+ ti->error = "Bad error_behavior supplied";
+ goto bad_err_behavior;
+ }
+
+ /* TODO: Maybe issues a request on the io queue for block 0? */
+
+ /* Argument processing is done, setup operational data */
+ /* Pool for dm_verity_io objects */
+ vc->io_pool = mempool_create_slab_pool(MIN_IOS, _verity_io_pool);
+ if (!vc->io_pool) {
+ ti->error = "Cannot allocate verity io mempool";
+ goto bad_slab_pool;
+ }
+
+ /* Allocate the bioset used for request padding */
+ /* TODO(wad) allocate a separate bioset for the first verify maybe */
+ vc->bs = bioset_create(MIN_BIOS, 0);
+ if (!vc->bs) {
+ ti->error = "Cannot allocate verity bioset";
+ goto bad_bs;
+ }
+
+ ti->num_flush_requests = 1;
+ ti->private = vc;
+
+ /* TODO(wad) add device and hash device names */
+ {
+ char hashdev[BDEVNAME_SIZE], vdev[BDEVNAME_SIZE];
+ bdevname(vc->hash_dev->bdev, hashdev);
+ bdevname(vc->dev->bdev, vdev);
+ DMINFO("dev:%s hash:%s [sectors:%llu blocks:%llu]", vdev,
+ hashdev, ULL(vc->bht.sectors), ULL(blocks));
+ }
+ return 0;
+
+bad_bs:
+ mempool_destroy(vc->io_pool);
+bad_slab_pool:
+bad_err_behavior:
+bad_hash:
+ dm_put_device(ti, vc->hash_dev);
+bad_hash_dev:
+bad_hash_start:
+ dm_put_device(ti, vc->dev);
+bad_bht:
+bad_root_hexdigest:
+bad_verity_dev:
+ kfree(vc); /* hash is not secret so no need to zero */
+ return -EINVAL;
+}
+
+static void verity_dtr(struct dm_target *ti)
+{
+ struct verity_config *vc = (struct verity_config *) ti->private;
+
+ bioset_free(vc->bs);
+ mempool_destroy(vc->io_pool);
+ dm_bht_destroy(&vc->bht);
+ dm_put_device(ti, vc->hash_dev);
+ dm_put_device(ti, vc->dev);
+ kfree(vc);
+}
+
+static int verity_ioctl(struct dm_target *ti, unsigned int cmd,
+ unsigned long arg)
+{
+ struct verity_config *vc = (struct verity_config *) ti->private;
+ return __blkdev_driver_ioctl(vc->dev->bdev, vc->dev->mode, cmd, arg);
+}
+
+static int verity_status(struct dm_target *ti, status_type_t type,
+ char *result, unsigned int maxlen)
+{
+ struct verity_config *vc = (struct verity_config *) ti->private;
+ unsigned int sz = 0;
+ char hashdev[BDEVNAME_SIZE], vdev[BDEVNAME_SIZE];
+ u8 hexdigest[VERITY_MAX_DIGEST_SIZE * 2 + 1] = { 0 };
+
+ dm_bht_root_hexdigest(&vc->bht, hexdigest, sizeof(hexdigest));
+
+ switch (type) {
+ case STATUSTYPE_INFO:
+ break;
+ case STATUSTYPE_TABLE:
+ bdevname(vc->hash_dev->bdev, hashdev);
+ bdevname(vc->dev->bdev, vdev);
+ DMEMIT("/dev/%s /dev/%s %llu %u %s %s",
+ vdev,
+ hashdev,
+ ULL(vc->hash_start),
+ vc->bht.depth,
+ vc->hash_alg,
+ hexdigest);
+ break;
+ }
+ return 0;
+}
+
+static int verity_merge(struct dm_target *ti, struct bvec_merge_data *bvm,
+ struct bio_vec *biovec, int max_size)
+{
+ struct verity_config *vc = ti->private;
+ struct request_queue *q = bdev_get_queue(vc->dev->bdev);
+
+ if (!q->merge_bvec_fn)
+ return max_size;
+
+ bvm->bi_bdev = vc->dev->bdev;
+ bvm->bi_sector = vc->start + bvm->bi_sector - ti->begin;
+
+ /* Optionally, this could just return 0 to stick to single pages. */
+ return min(max_size, q->merge_bvec_fn(q, bvm, biovec));
+}
+
+static int verity_iterate_devices(struct dm_target *ti,
+ iterate_devices_callout_fn fn, void *data)
+{
+ struct verity_config *vc = ti->private;
+
+ return fn(ti, vc->dev, vc->start, ti->len, data);
+}
+
+static void verity_io_hints(struct dm_target *ti,
+ struct queue_limits *limits)
+{
+ struct verity_config *vc = ti->private;
+ unsigned int block_size = vc->bht.block_size;
+
+ limits->logical_block_size = block_size;
+ limits->physical_block_size = block_size;
+ blk_limits_io_min(limits, block_size);
+}
+
+static struct target_type verity_target = {
+ .name = "verity",
+ .version = {0, 1, 0},
+ .module = THIS_MODULE,
+ .ctr = verity_ctr,
+ .dtr = verity_dtr,
+ .ioctl = verity_ioctl,
+ .map = verity_map,
+ .merge = verity_merge,
+ .status = verity_status,
+ .iterate_devices = verity_iterate_devices,
+ .io_hints = verity_io_hints,
+};
+
+#define VERITY_WQ_FLAGS (WQ_CPU_INTENSIVE|WQ_HIGHPRI)
+
+static int __init dm_verity_init(void)
+{
+ int r = -ENOMEM;
+
+ _verity_io_pool = KMEM_CACHE(dm_verity_io, 0);
+ if (!_verity_io_pool) {
+ DMERR("failed to allocate pool dm_verity_io");
+ goto bad_io_pool;
+ }
+
+ kverityd_ioq = alloc_workqueue("kverityd_io", VERITY_WQ_FLAGS, 1);
+ if (!kverityd_ioq) {
+ DMERR("failed to create workqueue kverityd_ioq");
+ goto bad_io_queue;
+ }
+
+ kveritydq = alloc_workqueue("kverityd", VERITY_WQ_FLAGS, 1);
+ if (!kveritydq) {
+ DMERR("failed to create workqueue kveritydq");
+ goto bad_verify_queue;
+ }
+
+ r = dm_register_target(&verity_target);
+ if (r < 0) {
+ DMERR("register failed %d", r);
+ goto register_failed;
+ }
+
+ DMINFO("version %u.%u.%u loaded", verity_target.version[0],
+ verity_target.version[1], verity_target.version[2]);
+
+ return r;
+
+register_failed:
+ destroy_workqueue(kveritydq);
+bad_verify_queue:
+ destroy_workqueue(kverityd_ioq);
+bad_io_queue:
+ kmem_cache_destroy(_verity_io_pool);
+bad_io_pool:
+ return r;
+}
+
+static void __exit dm_verity_exit(void)
+{
+ destroy_workqueue(kveritydq);
+ destroy_workqueue(kverityd_ioq);
+
+ dm_unregister_target(&verity_target);
+ kmem_cache_destroy(_verity_io_pool);
+}
+
+module_init(dm_verity_init);
+module_exit(dm_verity_exit);
+
+MODULE_AUTHOR("The Chromium OS Authors <chromium-os-dev@chromium.org>");
+MODULE_DESCRIPTION(DM_NAME " target for transparent disk integrity checking");
+MODULE_LICENSE("GPL");
diff --git a/drivers/md/dm-verity.h b/drivers/md/dm-verity.h
new file mode 100644
index 0000000..e0664c9
--- /dev/null
+++ b/drivers/md/dm-verity.h
@@ -0,0 +1,45 @@
+/*
+ * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ * All Rights Reserved.
+ *
+ * This file is released under the GPLv2.
+ *
+ * Provide error types for use when creating a custom error handler.
+ * See Documentation/device-mapper/dm-verity.txt
+ */
+#ifndef DM_VERITY_H
+#define DM_VERITY_H
+
+#include <linux/notifier.h>
+
+struct dm_verity_error_state {
+ int code;
+ int transient; /* Likely to not happen after a reboot */
+ u64 block;
+ const char *message;
+
+ sector_t dev_start;
+ sector_t dev_len;
+ struct block_device *dev;
+
+ sector_t hash_dev_start;
+ sector_t hash_dev_len;
+ struct block_device *hash_dev;
+
+ /* Final behavior after all notifications are completed. */
+ int behavior;
+};
+
+/* This enum must be matched to allowed_error_behaviors in dm-verity.c */
+enum dm_verity_error_behavior {
+ DM_VERITY_ERROR_BEHAVIOR_EIO = 0,
+ DM_VERITY_ERROR_BEHAVIOR_PANIC,
+ DM_VERITY_ERROR_BEHAVIOR_NONE,
+ DM_VERITY_ERROR_BEHAVIOR_NOTIFY
+};
+
+
+int dm_verity_register_error_notifier(struct notifier_block *nb);
+int dm_verity_unregister_error_notifier(struct notifier_block *nb);
+
+#endif /* DM_VERITY_H */
diff --git a/include/linux/dm-bht.h b/include/linux/dm-bht.h
new file mode 100644
index 0000000..3a4b432
--- /dev/null
+++ b/include/linux/dm-bht.h
@@ -0,0 +1,166 @@
+/*
+ * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ *
+ * Device-Mapper block hash tree interface.
+ * See Documentation/device-mapper/dm-bht.txt for details.
+ *
+ * This file is released under the GPLv2.
+ */
+#ifndef __LINUX_DM_BHT_H
+#define __LINUX_DM_BHT_H
+
+#include <crypto/hash.h>
+#include <linux/compiler.h>
+#include <linux/types.h>
+
+/* To avoid allocating memory for digest tests, we just setup a
+ * max to use for now.
+ */
+#define DM_BHT_MAX_DIGEST_SIZE 128 /* 1k hashes are unlikely for now */
+#define DM_BHT_SALT_SIZE 32 /* 256 bits of salt is a lot */
+
+/* UNALLOCATED, PENDING, READY, and VERIFIED are valid states. All other
+ * values are entry-related return codes.
+ */
+#define DM_BHT_ENTRY_VERIFIED 8 /* 'nodes' has been checked against parent */
+#define DM_BHT_ENTRY_READY 4 /* 'nodes' is loaded and available */
+#define DM_BHT_ENTRY_PENDING 2 /* 'nodes' is being loaded */
+#define DM_BHT_ENTRY_UNALLOCATED 0 /* untouched */
+#define DM_BHT_ENTRY_ERROR -1 /* entry is unsuitable for use */
+#define DM_BHT_ENTRY_ERROR_IO -2 /* I/O error on load */
+
+/* Additional possible return codes */
+#define DM_BHT_ENTRY_ERROR_MISMATCH -3 /* Digest mismatch */
+
+/* dm_bht_entry
+ * Contains dm_bht->node_count tree nodes at a given tree depth.
+ * state is used to transactionally assure that data is paged in
+ * from disk. Unless dm_bht kept running crypto contexts for each
+ * level, we need to load in the data for on-demand verification.
+ */
+struct dm_bht_entry {
+ atomic_t state; /* see defines */
+ /* Keeping an extra pointer per entry wastes up to ~33k of
+ * memory if a 1m blocks are used (or 66 on 64-bit arch)
+ */
+ void *io_context; /* Reserve a pointer for use during io */
+ /* data should only be non-NULL if fully populated. */
+ void *nodes; /* The hash data used to verify the children.
+ * Guaranteed to be page-aligned.
+ */
+};
+
+/* dm_bht_level
+ * Contains an array of entries which represent a page of hashes where
+ * each hash is a node in the tree at the given tree depth/level.
+ */
+struct dm_bht_level {
+ struct dm_bht_entry *entries; /* array of entries of tree nodes */
+ unsigned int count; /* number of entries at this level */
+ sector_t sector; /* starting sector for this level */
+};
+
+/* opaque context, start, databuf, sector_count */
+typedef int(*dm_bht_callback)(void *, /* external context */
+ sector_t, /* start sector */
+ u8 *, /* destination page */
+ sector_t, /* num sectors */
+ struct dm_bht_entry *);
+/* dm_bht - Device mapper block hash tree
+ * dm_bht provides a fixed interface for comparing data blocks
+ * against a cryptographic hashes stored in a hash tree. It
+ * optimizes the tree structure for storage on disk.
+ *
+ * The tree is built from the bottom up. A collection of data,
+ * external to the tree, is hashed and these hashes are stored
+ * as the blocks in the tree. For some number of these hashes,
+ * a parent node is created by hashing them. These steps are
+ * repeated.
+ *
+ * TODO(wad): All hash storage memory is pre-allocated and freed once an
+ * entire branch has been verified.
+ */
+struct dm_bht {
+ /* Configured values */
+ int depth; /* Depth of the tree including the root */
+ unsigned int block_count; /* Number of blocks hashed */
+ unsigned int block_size; /* Size of a hash block */
+ char hash_alg[CRYPTO_MAX_ALG_NAME];
+ unsigned char salt[DM_BHT_SALT_SIZE];
+
+ /* Computed values */
+ unsigned int node_count; /* Data size (in hashes) for each entry */
+ unsigned int node_count_shift; /* first bit set - 1 */
+ /* There is one per CPU so that verified can be simultaneous. */
+ struct shash_desc *hash_desc[NR_CPUS]; /* Container for the hash alg */
+ unsigned int digest_size;
+ sector_t sectors; /* Number of disk sectors used */
+
+ /* bool verified; Full tree is verified */
+ u8 root_digest[DM_BHT_MAX_DIGEST_SIZE];
+ struct dm_bht_level *levels; /* in reverse order */
+ /* Callback for reading from the hash device */
+ dm_bht_callback read_cb;
+};
+
+/* Constructor for struct dm_bht instances. */
+int dm_bht_create(struct dm_bht *bht,
+ unsigned int block_count,
+ unsigned int block_size,
+ const char *alg_name);
+/* Destructor for struct dm_bht instances. Does not free @bht */
+void dm_bht_destroy(struct dm_bht *bht);
+
+/* Basic accessors for struct dm_bht */
+int dm_bht_set_root_hexdigest(struct dm_bht *bht, const u8 *hexdigest);
+int dm_bht_root_hexdigest(struct dm_bht *bht, u8 *hexdigest, int available);
+void dm_bht_set_salt(struct dm_bht *bht, const char *hexsalt);
+int dm_bht_salt(struct dm_bht *bht, char *hexsalt);
+
+/* Functions for loading in data from disk for verification */
+bool dm_bht_is_populated(struct dm_bht *bht, unsigned int block);
+int dm_bht_populate(struct dm_bht *bht, void *read_cb_ctx,
+ unsigned int block);
+int dm_bht_verify_block(struct dm_bht *bht, unsigned int block,
+ struct page *pg, unsigned int offset);
+void dm_bht_read_completed(struct dm_bht_entry *entry, int status);
+
+/* Functions for converting indices to nodes. */
+
+static inline unsigned int dm_bht_get_level_shift(struct dm_bht *bht,
+ int depth)
+{
+ return (bht->depth - depth) * bht->node_count_shift;
+}
+
+/* For the given depth, this is the entry index. At depth+1 it is the node
+ * index for depth.
+ */
+static inline unsigned int dm_bht_index_at_level(struct dm_bht *bht,
+ int depth,
+ unsigned int leaf)
+{
+ return leaf >> dm_bht_get_level_shift(bht, depth);
+}
+
+static inline struct dm_bht_entry *dm_bht_get_entry(struct dm_bht *bht,
+ int depth,
+ unsigned int block)
+{
+ unsigned int index = dm_bht_index_at_level(bht, depth, block);
+ struct dm_bht_level *level = &bht->levels[depth];
+
+ return &level->entries[index];
+}
+
+static inline void *dm_bht_get_node(struct dm_bht *bht,
+ struct dm_bht_entry *entry,
+ int depth,
+ unsigned int block)
+{
+ unsigned int index = dm_bht_index_at_level(bht, depth, block);
+ unsigned int node_index = index % bht->node_count;
+
+ return entry->nodes + (node_index * bht->digest_size);
+}
+#endif /* __LINUX_DM_BHT_H */
--
1.7.3.1

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 02-28-2012, 09:57 PM
Mandeep Singh Baines
 
Default dm: verity target

The verity target provides transparent integrity checking of block devices
using a cryptographic digest.

dm-verity is meant to be setup as part of a verified boot path. This
may be anything ranging from a boot using tboot or trustedgrub to just
booting from a known-good device (like a USB drive or CD).

dm-verity is part of ChromeOS's verified boot path. It is used to verify
the integrity of the root filesystem on boot. The root filesystem is
mounted on a dm-verity partition which transparently verifies each block
with a bootloader verified hash passed into the kernel at boot.

Changes in V4:
* Discussion over phone (Alasdair G Kergon)
* copy _ioctl fix from dm-linear
* verity_status format fixes to match dm conventions
* s/dm-bht/verity_tree
* put everything into dm-verity.c
* ctr changed to dm conventions
* use hex2bin
* use conventional dm names for function
* s/dm_//
* for example: verity_ctr versus dm_verity_ctr
* use per_cpu API
Changes in V3:
* Discussion over irc (Alasdair G Kergon)
* Implement ioctl hook
Changes in V2:
* https://lkml.org/lkml/2011/11/10/85 (Steffen Klassert)
* Use shash API instead of older hash API

Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: Elly Jones <ellyjones@chromium.org>
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Milan Broz <mbroz@redhat.com>
Cc: Olof Johansson <olofj@chromium.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: dm-devel@redhat.com
---
Documentation/device-mapper/verity.txt | 149 ++++
drivers/md/Kconfig | 16 +
drivers/md/Makefile | 1 +
drivers/md/dm-verity.c | 1411 ++++++++++++++++++++++++++++++++
4 files changed, 1577 insertions(+), 0 deletions(-)
create mode 100644 Documentation/device-mapper/verity.txt
create mode 100644 drivers/md/dm-verity.c

diff --git a/Documentation/device-mapper/verity.txt b/Documentation/device-mapper/verity.txt
new file mode 100644
index 0000000..b631f12
--- /dev/null
+++ b/Documentation/device-mapper/verity.txt
@@ -0,0 +1,149 @@
+dm-verity
+==========
+
+Device-Mapper's "verity" target provides transparent integrity checking of
+block devices using a cryptographic digest provided by the kernel crypto API.
+This target is read-only.
+
+Parameters:
+ <version> <dev> <hash_dev> <hash_start> <block_size> <alg> <digest> <salt>
+
+<version>
+ This is the version number of the on-disk format. Currently, there is
+ only version 0.
+
+<dev>
+ This is the device that is going to be integrity checked. It may be
+ a subset of the full device as specified to dmsetup (start sector and count)
+ It may be specified as a path, like /dev/sdaX, or a device number,
+ <major>:<minor>.
+
+<hash_dev>
+ This is the device that that supplies the hash tree data. It may be
+ specified similarly to the device path and may be the same device. If the
+ same device is used, the hash offset should be outside of the dm-verity
+ configured device size.
+
+<hash_start>
+ This is the offset, in 512-byte sectors, from the start of hash_dev to
+ the root block of the hash tree.
+
+<block_size>
+ The size of a hash block. Also, the size of a block to be hashed.
+
+<alg>
+ The cryptographic hash algorithm used for this device. This should
+ be the name of the algorithm, like "sha1".
+
+<digest>
+ The hexadecimal encoding of the cryptographic hash of all of the
+ neighboring nodes at the first level of the tree. This hash should be
+ trusted as there is no other authenticity beyond this point.
+
+<salt>
+ The hexadecimal encoding of the salt value.
+
+Theory of operation
+===================
+
+dm-verity is meant to be setup as part of a verified boot path. This
+may be anything ranging from a boot using tboot or trustedgrub to just
+booting from a known-good device (like a USB drive or CD).
+
+When a dm-verity device is configured, it is expected that the caller
+has been authenticated in some way (cryptographic signatures, etc).
+After instantiation, all hashes will be verified on-demand during
+disk access. If they cannot be verified up to the root node of the
+tree, the root hash, then the I/O will fail. This should identify
+tampering with any data on the device and the hash data.
+
+Cryptographic hashes are used to assert the integrity of the device on a
+per-block basis. This allows for a lightweight hash computation on first read
+into the page cache. Block hashes are stored linearly aligned to the nearest
+block the size of a page.
+
+Hash Tree
+---------
+
+Each node in the tree is a cryptographic hash. If it is a leaf node, the hash
+is of some block data on disk. If it is an intermediary node, then the hash is
+of a number of child nodes.
+
+Each entry in the tree is a collection of neighboring nodes that fit in one
+block. The number is determined based on block_size and the size of the
+selected cryptographic digest algorithm. The hashes are linearly ordered in
+this entry and any unaligned trailing space is ignored but included when
+calculating the parent node.
+
+The tree looks something like:
+
+alg = sha256, num_blocks = 32768, block_size = 4096
+
+ [ root ]
+ / . . .
+ [entry_0] [entry_1]
+ / . . . . . .
+ [entry_0_0] . . . [entry_0_127] . . . . [entry_1_127]
+ / ... / . . . /
+ blk_0 ... blk_127 blk_16256 blk_16383 blk_32640 . . . blk_32767
+
+On-disk format
+==============
+
+Below is the recommended on-disk format. The verity kernel code does not
+read the on-disk header. It only reads the hash blocks which directly
+follow the header. It is expected that a user-space tool will verify the
+integrity of the verity_header and then call dm_setup with the correct
+parameters. Alternatively, the header can be omitted and the dm_setup
+parameters can be passed via the kernel command-line in a rooted chain
+of trust where the command-line is verified.
+
+The on-disk format is especially useful in cases where the hash blocks
+are on a separate partition. The magic number allows easy identification
+of the partition contents. Alternatively, the hash blocks can be stored
+in the same partition as the data to be verified. In such a configuration
+the filesystem on the partition would be sized a little smaller than
+the full-partition, leaving room for the hash blocks.
+
+struct verity_header {
+ uint64_t magic = 0x7665726974790a00;
+ uint32_t version;
+ uint32_t block_size;
+ char digest[128]; /* in hex-ascii, null-terminated or 128-bytes */
+ char salt[128]; /* in hex-ascii, null-terminated or 128-bytes */
+}
+
+struct verity_header_block {
+ struct verity_header;
+ char unused[block_size - sizeof(struct verity_header) - sizeof(sig)];
+ char sig[128]; /* in hex-ascii, null-terminated or 128-bytes */
+}
+
+Directly following the header are the hash blocks which are stored a depth
+at a time (starting from the root), sorted in order of increasing index.
+
+Usage
+=====
+
+The API provides mechanisms for reading and verifying a tree. When reading, all
+required data for the hash tree should be populated for a block before
+attempting a verify. This can be done by calling dm_bht_populate(). When all
+data is ready, a call to dm_bht_verify_block() with the expected hash value will
+perform both the direct block hash check and the hashes of the parent and
+neighboring nodes where needed to ensure validity up to the root hash. Note,
+dm_bht_set_root_hexdigest() should be called before any verification attempts
+occur.
+
+Example
+=======
+
+Setup a device;
+[[
+ dmsetup create vroot --table
+ "0 204800 verity /dev/sda1 /dev/sda2 alg=sha1 "
+ "root_hexdigest=9f74809a2ee7607b16fcc70d9399a4de97 25a727"
+]]
+
+A command line tool is available to compute the hash tree and return the
+root hash value.
+ http://git.chromium.org/cgi-bin/gitweb.cgi?p=dm-verity.git;a=tree
diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index faa4741..b8bb690 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -370,4 +370,20 @@ config DM_FLAKEY
---help---
A target that intermittently fails I/O for debugging purposes.

+config DM_VERITY
+ tristate "Verity target support"
+ depends on BLK_DEV_DM
+ select CRYPTO
+ select CRYPTO_HASH
+ ---help---
+ This device-mapper target allows you to create a device that
+ transparently integrity checks the data on it. You'll need to
+ activate the digests you're going to use in the cryptoapi
+ configuration.
+
+ To compile this code as a module, choose M here: the module will
+ be called dm-verity.
+
+ If unsure, say N.
+
endif # MD
diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index 046860c..70a29af 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -39,6 +39,7 @@ obj-$(CONFIG_DM_SNAPSHOT) += dm-snapshot.o
obj-$(CONFIG_DM_PERSISTENT_DATA) += persistent-data/
obj-$(CONFIG_DM_MIRROR) += dm-mirror.o dm-log.o dm-region-hash.o
obj-$(CONFIG_DM_LOG_USERSPACE) += dm-log-userspace.o
+obj-$(CONFIG_DM_VERITY) += dm-verity.o
obj-$(CONFIG_DM_ZERO) += dm-zero.o
obj-$(CONFIG_DM_RAID) += dm-raid.o
obj-$(CONFIG_DM_THIN_PROVISIONING) += dm-thin-pool.o
diff --git a/drivers/md/dm-verity.c b/drivers/md/dm-verity.c
new file mode 100644
index 0000000..87b7958
--- /dev/null
+++ b/drivers/md/dm-verity.c
@@ -0,0 +1,1411 @@
+/*
+ * Originally based on dm-crypt.c,
+ * Copyright (C) 2003 Christophe Saout <christophe@saout.de>
+ * Copyright (C) 2004 Clemens Fruhwirth <clemens@endorphin.org>
+ * Copyright (C) 2006-2008 Red Hat, Inc. All rights reserved.
+ * Copyright (C) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ * All Rights Reserved.
+ *
+ * This file is released under the GPLv2.
+ *
+ * Implements a verifying transparent block device.
+ * See Documentation/device-mapper/dm-verity.txt
+ */
+#include <crypto/hash.h>
+#include <linux/atomic.h>
+#include <linux/bio.h>
+#include <linux/blkdev.h>
+#include <linux/genhd.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/mempool.h>
+#include <linux/module.h>
+#include <linux/workqueue.h>
+#include <linux/device-mapper.h>
+
+
+#define DM_MSG_PREFIX "verity"
+
+
+/* Helper for printing sector_t */
+#define ULL(x) ((unsigned long long)(x))
+
+#define MIN_IOS 32
+#define MIN_BIOS (MIN_IOS * 2)
+
+/* To avoid allocating memory for digest tests, we just setup a
+ * max to use for now.
+ */
+#define VERITY_MAX_DIGEST_SIZE 64 /* Supports up to 512-bit digests */
+#define VERITY_SALT_SIZE 32 /* 256 bits of salt is a lot */
+
+/* UNALLOCATED, PENDING, READY, and VERIFIED are valid states. All other
+ * values are entry-related return codes.
+ */
+#define VERITY_TREE_ENTRY_VERIFIED 8 /* 'nodes' checked against parent */
+#define VERITY_TREE_ENTRY_READY 4 /* 'nodes' is loaded and available */
+#define VERITY_TREE_ENTRY_PENDING 2 /* 'nodes' is being loaded */
+#define VERITY_TREE_ENTRY_UNALLOCATED 0 /* untouched */
+#define VERITY_TREE_ENTRY_ERROR -1 /* entry is unsuitable for use */
+#define VERITY_TREE_ENTRY_ERROR_IO -2 /* I/O error on load */
+
+/* Additional possible return codes */
+#define VERITY_TREE_ENTRY_ERROR_MISMATCH -3 /* Digest mismatch */
+
+
+/* verity_tree_entry
+ * Contains verity_tree->node_count tree nodes at a given tree depth.
+ * state is used to transactionally assure that data is paged in
+ * from disk. Unless verity_tree kept running crypto contexts for each
+ * level, we need to load in the data for on-demand verification.
+ */
+struct verity_tree_entry {
+ atomic_t state; /* see defines */
+ /* Keeping an extra pointer per entry wastes up to ~33k of
+ * memory if a 1m blocks are used (or 66 on 64-bit arch)
+ */
+ void *io_context; /* Reserve a pointer for use during io */
+ /* data should only be non-NULL if fully populated. */
+ void *nodes; /* The hash data used to verify the children.
+ * Guaranteed to be page-aligned.
+ */
+};
+
+/* verity_tree_level
+ * Contains an array of entries which represent a page of hashes where
+ * each hash is a node in the tree at the given tree depth/level.
+ */
+struct verity_tree_level {
+ struct verity_tree_entry *entries; /* array of entries of tree nodes */
+ unsigned int count; /* number of entries at this level */
+ sector_t sector; /* starting sector for this level */
+};
+
+/* opaque context, start, databuf, sector_count */
+typedef int(*verity_tree_callback)(void *, /* external context */
+ sector_t, /* start sector */
+ u8 *, /* destination page */
+ sector_t, /* num sectors */
+ struct verity_tree_entry *);
+/* verity_tree - Device mapper block hash tree
+ * verity_tree provides a fixed interface for comparing data blocks
+ * against a cryptographic hashes stored in a hash tree. It
+ * optimizes the tree structure for storage on disk.
+ *
+ * The tree is built from the bottom up. A collection of data,
+ * external to the tree, is hashed and these hashes are stored
+ * as the blocks in the tree. For some number of these hashes,
+ * a parent node is created by hashing them. These steps are
+ * repeated.
+ */
+struct verity_tree {
+ /* Configured values */
+ int depth; /* Depth of the tree including the root */
+ unsigned int block_count; /* Number of blocks hashed */
+ unsigned int block_size; /* Size of a hash block */
+ char hash_alg[CRYPTO_MAX_ALG_NAME];
+ u8 salt[VERITY_SALT_SIZE];
+
+ /* Computed values */
+ unsigned int node_count; /* Data size (in hashes) for each entry */
+ unsigned int node_count_shift; /* first bit set - 1 */
+ /*
+ * There is one per CPU so that verified can be simultaneous.
+ * Access through per_cpu_ptr() only
+ */
+ struct shash_desc * __percpu *hash_desc; /* Container for hash alg */
+ unsigned int digest_size;
+ sector_t sectors; /* Number of disk sectors used */
+
+ /* bool verified; Full tree is verified */
+ u8 digest[VERITY_MAX_DIGEST_SIZE];
+ struct verity_tree_level *levels; /* in reverse order */
+ /* Callback for reading from the hash device */
+ verity_tree_callback read_cb;
+};
+
+/* per-requested-bio private data */
+enum verity_io_flags {
+ VERITY_IOFLAGS_CLONED = 0x1, /* original bio has been cloned */
+};
+
+struct verity_io {
+ struct dm_target *target;
+ struct bio *bio;
+ struct delayed_work work;
+ unsigned int flags;
+
+ int error;
+ atomic_t pending;
+
+ u64 block; /* aligned block index */
+ u64 count; /* aligned count in blocks */
+};
+
+struct verity_config {
+ struct dm_dev *dev;
+ sector_t start;
+ sector_t size;
+
+ struct dm_dev *hash_dev;
+ sector_t hash_start;
+
+ struct verity_tree bht;
+
+ /* Pool required for io contexts */
+ mempool_t *io_pool;
+ /* Pool and bios required for making sure that backing device reads are
+ * in PAGE_SIZE increments.
+ */
+ struct bio_set *bs;
+
+ char hash_alg[CRYPTO_MAX_ALG_NAME];
+};
+
+
+static struct kmem_cache *_verity_io_pool;
+static struct workqueue_struct *kveritydq, *kverityd_ioq;
+
+
+static void kverityd_verify(struct work_struct *work);
+static void kverityd_io(struct work_struct *work);
+static void kverityd_io_bht_populate(struct verity_io *io);
+static void kverityd_io_bht_populate_end(struct bio *, int error);
+
+
+/*
+ * Utilities
+ */
+
+static void bin2hex(char *dst, const u8 *src, size_t count)
+{
+ while (count-- > 0) {
+ sprintf(dst, "%02hhx", (int)*src);
+ dst += 2;
+ src++;
+ }
+}
+
+/*
+ * Verity Tree
+ */
+
+/* Functions for converting indices to nodes. */
+
+static inline unsigned int verity_tree_get_level_shift(struct verity_tree *bht,
+ int depth)
+{
+ return (bht->depth - depth) * bht->node_count_shift;
+}
+
+/* For the given depth, this is the entry index. At depth+1 it is the node
+ * index for depth.
+ */
+static inline unsigned int verity_tree_index_at_level(struct verity_tree *bht,
+ int depth,
+ unsigned int leaf)
+{
+ return leaf >> verity_tree_get_level_shift(bht, depth);
+}
+
+static inline struct verity_tree_entry *verity_tree_get_entry(
+ struct verity_tree *bht,
+ int depth,
+ unsigned int block)
+{
+ unsigned int index = verity_tree_index_at_level(bht, depth, block);
+ struct verity_tree_level *level = &bht->levels[depth];
+
+ return &level->entries[index];
+}
+
+static inline void *verity_tree_get_node(struct verity_tree *bht,
+ struct verity_tree_entry *entry,
+ int depth, unsigned int block)
+{
+ unsigned int index = verity_tree_index_at_level(bht, depth, block);
+ unsigned int node_index = index % bht->node_count;
+
+ return entry->nodes + (node_index * bht->digest_size);
+}
+/**
+ * verity_tree_compute_hash: hashes a page of data
+ */
+static int verity_tree_compute_hash(struct verity_tree *vt, struct page *pg,
+ unsigned int offset, u8 *digest)
+{
+ struct shash_desc *hash_desc;
+ void *data;
+ int err;
+
+ hash_desc = *per_cpu_ptr(vt->hash_desc, smp_processor_id());
+
+ if (crypto_shash_init(hash_desc)) {
+ DMCRIT("failed to reinitialize crypto hash (proc:%d)",
+ smp_processor_id());
+ return -EINVAL;
+ }
+ data = kmap_atomic(pg);
+ err = crypto_shash_update(hash_desc, data + offset, PAGE_SIZE);
+ kunmap_atomic(data);
+ if (err) {
+ DMCRIT("crypto_hash_update failed");
+ return -EINVAL;
+ }
+ if (crypto_shash_update(hash_desc, vt->salt, sizeof(vt->salt))) {
+ DMCRIT("crypto_hash_update failed");
+ return -EINVAL;
+ }
+ if (crypto_shash_final(hash_desc, digest)) {
+ DMCRIT("crypto_hash_final failed");
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int verity_tree_initialize_entries(struct verity_tree *vt)
+{
+ /* last represents the index of the last digest store in the tree.
+ * By walking the tree with that index, it is possible to compute the
+ * total number of entries at each level.
+ *
+ * Since each entry will contain up to |node_count| nodes of the tree,
+ * it is possible that the last index may not be at the end of a given
+ * entry->nodes. In that case, it is assumed the value is padded.
+ *
+ * Note, we treat both the tree root (1 hash) and the tree leaves
+ * independently from the vt data structures. Logically, the root is
+ * depth=-1 and the block layer level is depth=vt->depth
+ */
+ unsigned int last = vt->block_count;
+ int depth;
+
+ /* check that the largest level->count can't result in an int overflow
+ * on allocation or sector calculation.
+ */
+ if (((last >> vt->node_count_shift) + 1) >
+ UINT_MAX / max((unsigned int)sizeof(struct verity_tree_entry),
+ (unsigned int)to_sector(vt->block_size))) {
+ DMCRIT("required entries %u is too large", last + 1);
+ return -EINVAL;
+ }
+
+ /* Track the current sector location for each level so we don't have to
+ * compute it during traversals.
+ */
+ vt->sectors = 0;
+ for (depth = 0; depth < vt->depth; ++depth) {
+ struct verity_tree_level *level = &vt->levels[depth];
+
+ level->count = verity_tree_index_at_level(vt, depth, last) + 1;
+ level->entries = (struct verity_tree_entry *)
+ kcalloc(level->count,
+ sizeof(struct verity_tree_entry),
+ GFP_KERNEL);
+ if (!level->entries) {
+ DMERR("failed to allocate entries for depth %d", depth);
+ return -ENOMEM;
+ }
+ level->sector = vt->sectors;
+ vt->sectors += level->count * to_sector(vt->block_size);
+ }
+
+ return 0;
+}
+
+/**
+ * verity_tree_create - prepares @vt for us
+ * @vt: pointer to a verity_tree_create()d vt
+ * @depth: tree depth without the root; including block hashes
+ * @block_count: the number of block hashes / tree leaves
+ * @alg_name: crypto hash algorithm name
+ *
+ * Returns 0 on success.
+ *
+ * Callers can offset into devices by storing the data in the io callbacks.
+ */
+static int verity_tree_create(struct verity_tree *vt, unsigned int block_count,
+ unsigned int block_size, const char *alg_name)
+{
+ struct crypto_shash *tfm;
+ int size, cpu, status = 0;
+
+ vt->block_size = block_size;
+ /* Verify that PAGE_SIZE >= block_size >= SECTOR_SIZE. */
+ if ((block_size > PAGE_SIZE) ||
+ (PAGE_SIZE % block_size) ||
+ (to_sector(block_size) == 0))
+ return -EINVAL;
+
+ tfm = crypto_alloc_shash(alg_name, 0, 0);
+ if (IS_ERR(tfm)) {
+ DMERR("failed to allocate crypto hash '%s'", alg_name);
+ return -ENOMEM;
+ }
+ size = sizeof(struct shash_desc) + crypto_shash_descsize(tfm);
+
+ vt->hash_desc = alloc_percpu(struct shash_desc *);
+ if (!vt->hash_desc) {
+ DMERR("Failed to allocate per cpu hash_desc");
+ status = -ENOMEM;
+ goto bad_per_cpu;
+ }
+
+ /* Pre-allocate per-cpu crypto contexts to avoid having to
+ * kmalloc/kfree a context for every hash operation.
+ */
+ for_each_possible_cpu(cpu) {
+ struct shash_desc *hash_desc = kmalloc(size, GFP_KERNEL);
+
+ *per_cpu_ptr(vt->hash_desc, cpu) = hash_desc;
+ if (!hash_desc) {
+ DMERR("failed to allocate crypto hash contexts");
+ status = -ENOMEM;
+ goto bad_hash_alloc;
+ }
+ hash_desc->tfm = tfm;
+ hash_desc->flags = 0x0;
+ }
+ vt->digest_size = crypto_shash_digestsize(tfm);
+ /* We expect to be able to pack >=2 hashes into a block */
+ if (block_size / vt->digest_size < 2) {
+ DMERR("too few hashes fit in a block");
+ status = -EINVAL;
+ goto bad_arg;
+ }
+
+ if (vt->digest_size > VERITY_MAX_DIGEST_SIZE) {
+ DMERR("VERITY_MAX_DIGEST_SIZE too small for digest");
+ status = -EINVAL;
+ goto bad_arg;
+ }
+
+ /* Configure the tree */
+ vt->block_count = block_count;
+ if (block_count == 0) {
+ DMERR("block_count must be non-zero");
+ status = -EINVAL;
+ goto bad_arg;
+ }
+
+ /* Each verity_tree_entry->nodes is one block. The node code tracks
+ * how many nodes fit into one entry where a node is a single
+ * hash (message digest).
+ */
+ vt->node_count_shift = fls(block_size / vt->digest_size) - 1;
+ /* Round down to the nearest power of two. This makes indexing
+ * into the tree much less painful.
+ */
+ vt->node_count = 1 << vt->node_count_shift;
+
+ /* This is unlikely to happen, but with 64k pages, who knows. */
+ if (vt->node_count > UINT_MAX / vt->digest_size) {
+ DMERR("node_count * hash_len exceeds UINT_MAX!");
+ status = -EINVAL;
+ goto bad_arg;
+ }
+
+ vt->depth = DIV_ROUND_UP(fls(block_count - 1), vt->node_count_shift);
+
+ /* Ensure that we can safely shift by this value. */
+ if (vt->depth * vt->node_count_shift >= sizeof(unsigned int) * 8) {
+ DMERR("specified depth and node_count_shift is too large");
+ status = -EINVAL;
+ goto bad_arg;
+ }
+
+ /* Allocate levels. Each level of the tree may have an arbitrary number
+ * of verity_tree_entry structs. Each entry contains node_count nodes.
+ * Each node in the tree is a cryptographic digest of either node_count
+ * nodes on the subsequent level or of a specific block on disk.
+ */
+ vt->levels = (struct verity_tree_level *)
+ kcalloc(vt->depth,
+ sizeof(struct verity_tree_level), GFP_KERNEL);
+ if (!vt->levels) {
+ DMERR("failed to allocate tree levels");
+ status = -ENOMEM;
+ goto bad_level_alloc;
+ }
+
+ vt->read_cb = NULL;
+
+ status = verity_tree_initialize_entries(vt);
+ if (status)
+ goto bad_entries_alloc;
+
+ /* We compute depth such that there is only be 1 block at level 0. */
+ BUG_ON(vt->levels[0].count != 1);
+
+ return 0;
+
+bad_entries_alloc:
+ while (vt->depth-- > 0)
+ kfree(vt->levels[vt->depth].entries);
+ kfree(vt->levels);
+bad_level_alloc:
+bad_arg:
+bad_hash_alloc:
+ for_each_possible_cpu(cpu)
+ if (*per_cpu_ptr(vt->hash_desc, cpu))
+ kfree(*per_cpu_ptr(vt->hash_desc, cpu));
+ free_percpu(vt->hash_desc);
+bad_per_cpu:
+ crypto_free_shash(tfm);
+ return status;
+}
+
+/**
+ * verity_tree_read_completed
+ * @entry: pointer to the entry that's been loaded
+ * @status: I/O status. Non-zero is failure.
+ * MUST always be called after a read_cb completes.
+ */
+static void verity_tree_read_completed(struct verity_tree_entry *entry,
+ int status)
+{
+ if (status) {
+ DMCRIT("an I/O error occurred while reading entry");
+ atomic_set(&entry->state, VERITY_TREE_ENTRY_ERROR_IO);
+ return;
+ }
+ BUG_ON(atomic_read(&entry->state) != VERITY_TREE_ENTRY_PENDING);
+ atomic_set(&entry->state, VERITY_TREE_ENTRY_READY);
+}
+
+/**
+ * verity_tree_verify_block - checks that all path nodes for @block are valid
+ * @vt: pointer to a verity_tree_create()d vt
+ * @block: specific block data is expected from
+ * @pg: page holding the block data
+ * @offset: offset into the page
+ *
+ * Returns 0 on success, VERITY_TREE_ENTRY_ERROR_MISMATCH on error.
+ */
+static int verity_tree_verify_block(struct verity_tree *vt, unsigned int block,
+ struct page *pg, unsigned int offset)
+{
+ int state, depth = vt->depth;
+ u8 digest[VERITY_MAX_DIGEST_SIZE];
+ struct verity_tree_entry *entry;
+ void *node;
+
+ do {
+ /* Need to check that the hash of the current block is accurate
+ * in its parent.
+ */
+ entry = verity_tree_get_entry(vt, depth - 1, block);
+ state = atomic_read(&entry->state);
+ /* This call is only safe if all nodes along the path
+ * are already populated (i.e. READY) via verity_tree_populate.
+ */
+ BUG_ON(state < VERITY_TREE_ENTRY_READY);
+ node = verity_tree_get_node(vt, entry, depth, block);
+
+ if (verity_tree_compute_hash(vt, pg, offset, digest) ||
+ memcmp(digest, node, vt->digest_size))
+ goto mismatch;
+
+ /* Keep the containing block of hashes to be verified in the
+ * next pass.
+ */
+ pg = virt_to_page(entry->nodes);
+ offset = offset_in_page(entry->nodes);
+ } while (--depth > 0 && state != VERITY_TREE_ENTRY_VERIFIED);
+
+ if (depth == 0 && state != VERITY_TREE_ENTRY_VERIFIED) {
+ if (verity_tree_compute_hash(vt, pg, offset, digest) ||
+ memcmp(digest, vt->digest, vt->digest_size))
+ goto mismatch;
+ atomic_set(&entry->state, VERITY_TREE_ENTRY_VERIFIED);
+ }
+
+ /* Mark path to leaf as verified. */
+ for (depth++; depth < vt->depth; depth++) {
+ entry = verity_tree_get_entry(vt, depth, block);
+ /* At this point, entry can only be in VERIFIED or READY state.
+ * So it is safe to use atomic_set instead of atomic_cmpxchg.
+ */
+ atomic_set(&entry->state, VERITY_TREE_ENTRY_VERIFIED);
+ }
+
+ return 0;
+
+mismatch:
+ DMERR_LIMIT("verify_path: failed to verify hash (d=%d,bi=%u)",
+ depth, block);
+ return VERITY_TREE_ENTRY_ERROR_MISMATCH;
+}
+
+/**
+ * verity_tree_is_populated - check that nodes needed to verify a given
+ * block are all ready
+ * @vt: pointer to a verity_tree_create()d vt
+ * @block: specific block data is expected from
+ *
+ * Callers may wish to call verity_tree_is_populated() when checking an io
+ * for which entries were already pending.
+ */
+static bool verity_tree_is_populated(struct verity_tree *vt, unsigned int block)
+{
+ int depth;
+
+ for (depth = vt->depth - 1; depth >= 0; depth--) {
+ struct verity_tree_entry *entry;
+ entry = verity_tree_get_entry(vt, depth, block);
+ if (atomic_read(&entry->state) < VERITY_TREE_ENTRY_READY)
+ return false;
+ }
+
+ return true;
+}
+
+/**
+ * verity_tree_populate - reads entries from disk needed to verify a given block
+ * @vt: pointer to a verity_tree_create()d vt
+ * @ctx: context used for all read_cb calls on this request
+ * @block: specific block data is expected from
+ *
+ * Returns negative value on error. Returns 0 on success.
+ */
+static int verity_tree_populate(struct verity_tree *vt, void *ctx,
+ unsigned int block)
+{
+ int depth, state;
+
+ BUG_ON(block >= vt->block_count);
+
+ for (depth = vt->depth - 1; depth >= 0; --depth) {
+ unsigned int index;
+ struct verity_tree_level *level;
+ struct verity_tree_entry *entry;
+
+ index = verity_tree_index_at_level(vt, depth, block);
+ level = &vt->levels[depth];
+ entry = verity_tree_get_entry(vt, depth, block);
+ state = atomic_cmpxchg(&entry->state,
+ VERITY_TREE_ENTRY_UNALLOCATED,
+ VERITY_TREE_ENTRY_PENDING);
+ if (state == VERITY_TREE_ENTRY_VERIFIED)
+ break;
+ if (state <= VERITY_TREE_ENTRY_ERROR)
+ goto error_state;
+ if (state != VERITY_TREE_ENTRY_UNALLOCATED)
+ continue;
+
+ /* Current entry is claimed for allocation and loading */
+ entry->nodes = kmalloc(vt->block_size, GFP_NOIO);
+ if (!entry->nodes)
+ goto nomem;
+
+ vt->read_cb(ctx,
+ level->sector + to_sector(index * vt->block_size),
+ entry->nodes, to_sector(vt->block_size), entry);
+ }
+
+ return 0;
+
+error_state:
+ DMCRIT("block %u at depth %d is in an error state", block, depth);
+ return -EPERM;
+
+nomem:
+ DMCRIT("failed to allocate memory for entry->nodes");
+ return -ENOMEM;
+}
+
+/**
+ * verity_tree_destroy - cleans up all memory used by @vt
+ * @vt: pointer to a verity_tree_create()d vt
+ */
+static void verity_tree_destroy(struct verity_tree *vt)
+{
+ int depth, cpu;
+
+ for (depth = 0; depth < vt->depth; depth++) {
+ struct verity_tree_entry *entry = vt->levels[depth].entries;
+ struct verity_tree_entry *entry_end = entry +
+ vt->levels[depth].count;
+ for (; entry < entry_end; ++entry)
+ kfree(entry->nodes);
+ kfree(vt->levels[depth].entries);
+ }
+ kfree(vt->levels);
+ crypto_free_shash((*per_cpu_ptr(vt->hash_desc, 0))->tfm);
+ for_each_possible_cpu(cpu)
+ kfree(*per_cpu_ptr(vt->hash_desc, cpu));
+}
+
+/*
+ * Verity Tree Accessors
+ */
+
+/**
+ * verity_tree_set_digest - sets an unverified root digest hash from hex
+ * @vt: pointer to a verity_tree_create()d vt
+ * @digest: string containing the digest in hex
+ * Returns non-zero on error.
+ */
+static int verity_tree_set_digest(struct verity_tree *vt, const char *digest)
+{
+ /* Make sure we have at least the bytes expected */
+ if (strnlen((char *)digest, vt->digest_size * 2) !=
+ vt->digest_size * 2) {
+ DMERR("root digest length does not match hash algorithm");
+ return -1;
+ }
+ return hex2bin(vt->digest, digest, vt->digest_size);
+}
+
+/**
+ * verity_tree_digest - returns root digest in hex
+ * @vt: pointer to a verity_tree_create()d vt
+ * @digest: buffer to put into, must be of length VERITY_SALT_SIZE * 2 + 1.
+ */
+int verity_tree_digest(struct verity_tree *vt, char *digest)
+{
+ bin2hex(digest, vt->digest, vt->digest_size);
+ return 0;
+}
+
+/**
+ * verity_tree_set_salt - sets the salt
+ * @vt: pointer to a verity_tree_create()d vt
+ * @salt: string containing the salt in hex
+ * Returns non-zero on error.
+ */
+int verity_tree_set_salt(struct verity_tree *vt, const char *salt)
+{
+ size_t saltlen = min(strlen(salt) / 2, sizeof(vt->salt));
+ memset(vt->salt, 0, sizeof(vt->salt));
+ return hex2bin(vt->salt, salt, saltlen);
+}
+
+
+/**
+ * verity_tree_salt - returns the salt in hex
+ * @vt: pointer to a verity_tree_create()d vt
+ * @salt: buffer to put salt into, of length VERITY_SALT_SIZE * 2 + 1.
+ */
+int verity_tree_salt(struct verity_tree *vt, char *salt)
+{
+ bin2hex(salt, vt->salt, sizeof(vt->salt));
+ return 0;
+}
+
+/*
+ * Allocation and utility functions
+ */
+
+static void kverityd_src_io_read_end(struct bio *clone, int error);
+
+/* Shared destructor for all internal bios */
+static void verity_bio_destructor(struct bio *bio)
+{
+ struct verity_io *io = bio->bi_private;
+ struct verity_config *vc = io->target->private;
+ bio_free(bio, vc->bs);
+}
+
+static struct bio *verity_alloc_bioset(struct verity_config *vc, gfp_t gfp_mask,
+ int nr_iovecs)
+{
+ return bio_alloc_bioset(gfp_mask, nr_iovecs, vc->bs);
+}
+
+static struct verity_io *verity_io_alloc(struct dm_target *ti,
+ struct bio *bio)
+{
+ struct verity_config *vc = ti->private;
+ sector_t sector = bio->bi_sector - ti->begin;
+ struct verity_io *io;
+
+ io = mempool_alloc(vc->io_pool, GFP_NOIO);
+ if (unlikely(!io))
+ return NULL;
+ io->flags = 0;
+ io->target = ti;
+ io->bio = bio;
+ io->error = 0;
+
+ /* Adjust the sector by the virtual starting sector */
+ io->block = to_bytes(sector) / vc->bht.block_size;
+ io->count = bio->bi_size / vc->bht.block_size;
+
+ atomic_set(&io->pending, 0);
+
+ return io;
+}
+
+static struct bio *verity_bio_clone(struct verity_io *io)
+{
+ struct verity_config *vc = io->target->private;
+ struct bio *bio = io->bio;
+ struct bio *clone = verity_alloc_bioset(vc, GFP_NOIO, bio->bi_max_vecs);
+
+ if (!clone)
+ return NULL;
+
+ __bio_clone(clone, bio);
+ clone->bi_private = io;
+ clone->bi_end_io = kverityd_src_io_read_end;
+ clone->bi_bdev = vc->dev->bdev;
+ clone->bi_sector += vc->start - io->target->begin;
+ clone->bi_destructor = verity_bio_destructor;
+
+ return clone;
+}
+
+/*
+ * Reverse flow of requests into the device.
+ *
+ * (Start at the bottom with verity_map and work your way upward).
+ */
+
+static void verity_inc_pending(struct verity_io *io);
+
+static void verity_return_bio_to_caller(struct verity_io *io)
+{
+ struct verity_config *vc = io->target->private;
+
+ if (io->error)
+ io->error = -EIO;
+
+ bio_endio(io->bio, io->error);
+ mempool_free(io, vc->io_pool);
+}
+
+/* Check for any missing bht hashes. */
+static bool verity_is_bht_populated(struct verity_io *io)
+{
+ struct verity_config *vc = io->target->private;
+ u64 block;
+
+ for (block = io->block; block < io->block + io->count; ++block)
+ if (!verity_tree_is_populated(&vc->bht, block))
+ return false;
+
+ return true;
+}
+
+/* verity_dec_pending manages the lifetime of all verity_io structs.
+ * Non-bug error handling is centralized through this interface and
+ * all passage from workqueue to workqueue.
+ */
+static void verity_dec_pending(struct verity_io *io)
+{
+ if (!atomic_dec_and_test(&io->pending))
+ goto done;
+
+ if (unlikely(io->error))
+ goto io_error;
+
+ /* I/Os that were pending may now be ready */
+ if (verity_is_bht_populated(io)) {
+ INIT_DELAYED_WORK(&io->work, kverityd_verify);
+ queue_delayed_work(kveritydq, &io->work, 0);
+ } else {
+ INIT_DELAYED_WORK(&io->work, kverityd_io);
+ queue_delayed_work(kverityd_ioq, &io->work, HZ/10);
+ }
+
+done:
+ return;
+
+io_error:
+ verity_return_bio_to_caller(io);
+}
+
+/* Walks the data set and computes the hash of the data read from the
+ * untrusted source device. The computed hash is then passed to verity-tree
+ * for verification.
+ */
+static int verity_verify(struct verity_config *vc,
+ struct verity_io *io)
+{
+ unsigned int block_size = vc->bht.block_size;
+ struct bio *bio = io->bio;
+ u64 block = io->block;
+ unsigned int idx;
+ int r;
+
+ for (idx = bio->bi_idx; idx < bio->bi_vcnt; idx++) {
+ struct bio_vec *bv = bio_iovec_idx(bio, idx);
+ unsigned int offset = bv->bv_offset;
+ unsigned int len = bv->bv_len;
+
+ BUG_ON(offset % block_size);
+ BUG_ON(len % block_size);
+
+ while (len) {
+ r = verity_tree_verify_block(&vc->bht, block,
+ bv->bv_page, offset);
+ if (r)
+ goto bad_return;
+
+ offset += block_size;
+ len -= block_size;
+ block++;
+ cond_resched();
+ }
+ }
+
+ return 0;
+
+bad_return:
+ /* verity_tree functions aren't expected to return errno friendly
+ * values. They are converted here for uniformity.
+ */
+ if (r > 0) {
+ DMERR("Pending data for block %llu seen at verify", ULL(block));
+ r = -EBUSY;
+ } else {
+ DMERR_LIMIT("Block hash does not match!");
+ r = -EACCES;
+ }
+ return r;
+}
+
+/* Services the verify workqueue */
+static void kverityd_verify(struct work_struct *work)
+{
+ struct delayed_work *dwork = container_of(work, struct delayed_work,
+ work);
+ struct verity_io *io = container_of(dwork, struct verity_io,
+ work);
+ struct verity_config *vc = io->target->private;
+
+ io->error = verity_verify(vc, io);
+
+ /* Free up the bio and tag with the return value */
+ verity_return_bio_to_caller(io);
+}
+
+/* Asynchronously called upon the completion of verity-tree I/O. The status
+ * of the operation is passed back to verity-tree and the next steps are
+ * decided by verity_dec_pending.
+ */
+static void kverityd_io_bht_populate_end(struct bio *bio, int error)
+{
+ struct verity_tree_entry *entry;
+ struct verity_io *io;
+
+ entry = (struct verity_tree_entry *) bio->bi_private;
+ io = (struct verity_io *) entry->io_context;
+
+ /* Tell the tree to atomically update now that we've populated
+ * the given entry.
+ */
+ verity_tree_read_completed(entry, error);
+
+ /* Clean up for reuse when reading data to be checked */
+ bio->bi_vcnt = 0;
+ bio->bi_io_vec->bv_offset = 0;
+ bio->bi_io_vec->bv_len = 0;
+ bio->bi_io_vec->bv_page = NULL;
+ /* Restore the private data to I/O so the destructor can be shared. */
+ bio->bi_private = (void *) io;
+ bio_put(bio);
+
+ /* We bail but assume the tree has been marked bad. */
+ if (unlikely(error)) {
+ DMERR("Failed to read for sector %llu (%u)",
+ ULL(io->bio->bi_sector), io->bio->bi_size);
+ io->error = error;
+ /* Pass through the error to verity_dec_pending below */
+ }
+ /* When pending = 0, it will transition to reading real data */
+ verity_dec_pending(io);
+}
+
+/* Called by verity-tree (via verity_tree_populate), this function provides
+ * the message digests to verity-tree that are stored on disk.
+ */
+static int kverityd_bht_read_callback(void *ctx, sector_t start, u8 *dst,
+ sector_t count,
+ struct verity_tree_entry *entry)
+{
+ struct verity_io *io = ctx; /* I/O for this batch */
+ struct verity_config *vc;
+ struct bio *bio;
+
+ vc = io->target->private;
+
+ /* The I/O context is nested inside the entry so that we don't need one
+ * io context per page read.
+ */
+ entry->io_context = ctx;
+
+ /* We should only get page size requests at present. */
+ verity_inc_pending(io);
+ bio = verity_alloc_bioset(vc, GFP_NOIO, 1);
+ if (unlikely(!bio)) {
+ DMCRIT("Out of memory at bio_alloc_bioset");
+ verity_tree_read_completed(entry, -ENOMEM);
+ return -ENOMEM;
+ }
+ bio->bi_private = (void *) entry;
+ bio->bi_idx = 0;
+ bio->bi_size = vc->bht.block_size;
+ bio->bi_sector = vc->hash_start + start;
+ bio->bi_bdev = vc->hash_dev->bdev;
+ bio->bi_end_io = kverityd_io_bht_populate_end;
+ bio->bi_rw = REQ_META;
+ /* Only need to free the bio since the page is managed by bht */
+ bio->bi_destructor = verity_bio_destructor;
+ bio->bi_vcnt = 1;
+ bio->bi_io_vec->bv_offset = offset_in_page(dst);
+ bio->bi_io_vec->bv_len = to_bytes(count);
+ /* dst is guaranteed to be a page_pool allocation */
+ bio->bi_io_vec->bv_page = virt_to_page(dst);
+ /* Track that this I/O is in use. There should be no risk of the io
+ * being removed prior since this is called synchronously.
+ */
+ generic_make_request(bio);
+ return 0;
+}
+
+/* Submits an io request for each missing block of block hashes.
+ * The last one to return will then enqueue this on the io workqueue.
+ */
+static void kverityd_io_bht_populate(struct verity_io *io)
+{
+ struct verity_config *vc = io->target->private;
+ u64 block;
+
+ for (block = io->block; block < io->block + io->count; ++block) {
+ int ret = verity_tree_populate(&vc->bht, io, block);
+
+ if (ret < 0) {
+ /* verity_dec_pending will handle the error case. */
+ io->error = ret;
+ break;
+ }
+ }
+}
+
+/* Asynchronously called upon the completion of I/O issued
+ * from kverityd_src_io_read. verity_dec_pending() acts as
+ * the scheduler/flow manager.
+ */
+static void kverityd_src_io_read_end(struct bio *clone, int error)
+{
+ struct verity_io *io = clone->bi_private;
+
+ if (unlikely(!bio_flagged(clone, BIO_UPTODATE) && !error))
+ error = -EIO;
+
+ if (unlikely(error)) {
+ DMERR("Error occurred: %d (%llu, %u)",
+ error, ULL(clone->bi_sector), clone->bi_size);
+ io->error = error;
+ }
+
+ /* Release the clone which just avoids the block layer from
+ * leaving offsets, etc in unexpected states.
+ */
+ bio_put(clone);
+
+ verity_dec_pending(io);
+}
+
+/* If not yet underway, an I/O request will be issued to the vc->dev
+ * device for the data needed. It is cloned to avoid unexpected changes
+ * to the original bio struct.
+ */
+static void kverityd_src_io_read(struct verity_io *io)
+{
+ struct bio *clone;
+
+ /* Check if the read is already issued. */
+ if (io->flags & VERITY_IOFLAGS_CLONED)
+ return;
+
+ io->flags |= VERITY_IOFLAGS_CLONED;
+
+ /* Clone the bio. The block layer may modify the bvec array. */
+ clone = verity_bio_clone(io);
+ if (unlikely(!clone)) {
+ io->error = -ENOMEM;
+ return;
+ }
+
+ verity_inc_pending(io);
+
+ generic_make_request(clone);
+}
+
+/* kverityd_io services the I/O workqueue. For each pass through
+ * the I/O workqueue, a call to populate both the origin drive
+ * data and the hash tree data is made.
+ */
+static void kverityd_io(struct work_struct *work)
+{
+ struct delayed_work *dwork = container_of(work, struct delayed_work,
+ work);
+ struct verity_io *io = container_of(dwork, struct verity_io,
+ work);
+
+ /* Issue requests asynchronously. */
+ verity_inc_pending(io);
+ kverityd_src_io_read(io);
+ kverityd_io_bht_populate(io);
+ verity_dec_pending(io);
+}
+
+/* Paired with verity_dec_pending, the pending value in the io dictate the
+ * lifetime of a request and when it is ready to be processed on the
+ * workqueues.
+ */
+static void verity_inc_pending(struct verity_io *io)
+{
+ atomic_inc(&io->pending);
+}
+
+/* Block-level requests start here. */
+static int verity_map(struct dm_target *ti, struct bio *bio,
+ union map_info *map_context)
+{
+ struct verity_io *io;
+ struct verity_config *vc;
+ struct request_queue *r_queue;
+
+ if (unlikely(!ti)) {
+ DMERR("dm_target was NULL");
+ return -EIO;
+ }
+
+ vc = ti->private;
+ r_queue = bdev_get_queue(vc->dev->bdev);
+
+ if (bio_data_dir(bio) == WRITE) {
+ /* If we silently drop writes, then the VFS layer will cache
+ * the write and persist it in memory. While it doesn't change
+ * the underlying storage, it still may be contrary to the
+ * behavior expected by a verified, read-only device.
+ */
+ DMWARN_LIMIT("write request received. rejecting with -EIO.");
+ return -EIO;
+ } else {
+ /* Queue up the request to be verified */
+ io = verity_io_alloc(ti, bio);
+ if (!io) {
+ DMERR_LIMIT("Failed to allocate and init IO data");
+ return DM_MAPIO_REQUEUE;
+ }
+ INIT_DELAYED_WORK(&io->work, kverityd_io);
+ queue_delayed_work(kverityd_ioq, &io->work, 0);
+ }
+
+ return DM_MAPIO_SUBMITTED;
+}
+
+/*
+ * Non-block interfaces and device-mapper specific code
+ */
+
+/*
+ * Verity target parameters:
+ *
+ * <dev> <hash_dev> <hash_start> <block_size> <alg> <digest> <salt>
+ *
+ * version: version of the hash tree on-disk format
+ * dev: device to verify
+ * hash_dev: device hashtree is stored on
+ * hash_start: start address of hashes
+ * block_size: size of a hash block
+ * alg: hash algorithm
+ * digest: toplevel hash of the tree
+ * salt: salt
+ */
+static int verity_ctr(struct dm_target *ti, unsigned int argc, char **argv)
+{
+ struct verity_config *vc = NULL;
+ const char *dev, *hash_dev, *alg, *digest, *salt;
+ unsigned long hash_start, block_size, version;
+ sector_t blocks;
+ int ret;
+
+ if (argc != 8) {
+ ti->error = "Invalid argument count";
+ return -EINVAL;
+ }
+
+ if (strict_strtoul(argv[0], 10, &version) ||
+ (version != 0)) {
+ ti->error = "Invalid version";
+ return -EINVAL;
+ }
+ dev = argv[1];
+ hash_dev = argv[2];
+ if (strict_strtoul(argv[3], 10, &hash_start)) {
+ ti->error = "Invalid hash_start";
+ return -EINVAL;
+ }
+ if (strict_strtoul(argv[4], 10, &block_size) ||
+ (block_size > UINT_MAX)) {
+ ti->error = "Invalid block_size";
+ return -EINVAL;
+ }
+ alg = argv[5];
+ digest = argv[6];
+ salt = argv[7];
+
+ /* The device mapper device should be setup read-only */
+ if ((dm_table_get_mode(ti->table) & ~FMODE_READ) != 0) {
+ ti->error = "Must be created readonly.";
+ return -EINVAL;
+ }
+
+ vc = kzalloc(sizeof(*vc), GFP_KERNEL);
+ if (!vc) {
+ return -EINVAL;
+ }
+
+ /* Calculate the blocks from the given device size */
+ vc->size = ti->len;
+ blocks = to_bytes(vc->size) / block_size;
+ if (verity_tree_create(&vc->bht, blocks, block_size, alg)) {
+ DMERR("failed to create required bht");
+ goto bad_bht;
+ }
+ if (verity_tree_set_digest(&vc->bht, digest)) {
+ DMERR("digest error");
+ goto bad_digest;
+ }
+ verity_tree_set_salt(&vc->bht, salt);
+ vc->bht.read_cb = kverityd_bht_read_callback;
+
+ vc->start = 0;
+ /* We only ever grab the device in read-only mode. */
+ ret = dm_get_device(ti, dev, dm_table_get_mode(ti->table), &vc->dev);
+ if (ret) {
+ DMERR("Failed to acquire device '%s': %d", dev, ret);
+ ti->error = "Device lookup failed";
+ goto bad_verity_dev;
+ }
+
+ if ((to_bytes(vc->start) % block_size) ||
+ (to_bytes(vc->size) % block_size)) {
+ ti->error = "Device must be block_size divisble/aligned";
+ goto bad_hash_start;
+ }
+
+ vc->hash_start = (sector_t)hash_start;
+
+ /*
+ * Note, dev == hash_dev is okay as long as the size of
+ * ti->len passed to device mapper does not include
+ * the hashes.
+ */
+ if (dm_get_device(ti, hash_dev,
+ dm_table_get_mode(ti->table), &vc->hash_dev)) {
+ ti->error = "Hash device lookup failed";
+ goto bad_hash_dev;
+ }
+
+ if (snprintf(vc->hash_alg, CRYPTO_MAX_ALG_NAME, "%s", alg) >=
+ CRYPTO_MAX_ALG_NAME) {
+ ti->error = "Hash algorithm name is too long";
+ goto bad_hash;
+ }
+
+ vc->io_pool = mempool_create_slab_pool(MIN_IOS, _verity_io_pool);
+ if (!vc->io_pool) {
+ ti->error = "Cannot allocate verity io mempool";
+ goto bad_slab_pool;
+ }
+
+ vc->bs = bioset_create(MIN_BIOS, 0);
+ if (!vc->bs) {
+ ti->error = "Cannot allocate verity bioset";
+ goto bad_bs;
+ }
+
+ ti->private = vc;
+
+ return 0;
+
+bad_bs:
+ mempool_destroy(vc->io_pool);
+bad_slab_pool:
+bad_hash:
+ dm_put_device(ti, vc->hash_dev);
+bad_hash_dev:
+bad_hash_start:
+ dm_put_device(ti, vc->dev);
+bad_bht:
+bad_digest:
+bad_verity_dev:
+ kfree(vc); /* hash is not secret so no need to zero */
+ return -EINVAL;
+}
+
+static void verity_dtr(struct dm_target *ti)
+{
+ struct verity_config *vc = (struct verity_config *) ti->private;
+
+ bioset_free(vc->bs);
+ mempool_destroy(vc->io_pool);
+ verity_tree_destroy(&vc->bht);
+ dm_put_device(ti, vc->hash_dev);
+ dm_put_device(ti, vc->dev);
+ kfree(vc);
+}
+
+static int verity_ioctl(struct dm_target *ti, unsigned int cmd,
+ unsigned long arg)
+{
+ struct verity_config *vc = (struct verity_config *) ti->private;
+ struct dm_dev *dev = vc->dev;
+ int r = 0;
+
+ /*
+ * Only pass ioctls through if the device sizes match exactly.
+ */
+ if (vc->start ||
+ ti->len != i_size_read(dev->bdev->bd_inode) >> SECTOR_SHIFT)
+ r = scsi_verify_blk_ioctl(NULL, cmd);
+
+ return r ? : __blkdev_driver_ioctl(dev->bdev, dev->mode, cmd, arg);
+}
+
+static int verity_status(struct dm_target *ti, status_type_t type,
+ char *result, unsigned int maxlen)
+{
+ struct verity_config *vc = (struct verity_config *) ti->private;
+ unsigned int sz = 0;
+ char digest[VERITY_MAX_DIGEST_SIZE * 2 + 1] = { 0 };
+ char salt[VERITY_SALT_SIZE * 2 + 1] = { 0 };
+
+ verity_tree_digest(&vc->bht, digest);
+ verity_tree_salt(&vc->bht, salt);
+
+ switch (type) {
+ case STATUSTYPE_INFO:
+ result[0] = '';
+ break;
+ case STATUSTYPE_TABLE:
+ DMEMIT("%s %s %llu %llu %s %s %s",
+ vc->dev->name,
+ vc->hash_dev->name,
+ ULL(vc->hash_start),
+ ULL(vc->bht.block_size),
+ vc->hash_alg,
+ digest,
+ salt);
+ break;
+ }
+ return 0;
+}
+
+static int verity_merge(struct dm_target *ti, struct bvec_merge_data *bvm,
+ struct bio_vec *biovec, int max_size)
+{
+ struct verity_config *vc = ti->private;
+ struct request_queue *q = bdev_get_queue(vc->dev->bdev);
+
+ if (!q->merge_bvec_fn)
+ return max_size;
+
+ bvm->bi_bdev = vc->dev->bdev;
+ bvm->bi_sector = vc->start + bvm->bi_sector - ti->begin;
+
+ /* Optionally, this could just return 0 to stick to single pages. */
+ return min(max_size, q->merge_bvec_fn(q, bvm, biovec));
+}
+
+static int verity_iterate_devices(struct dm_target *ti,
+ iterate_devices_callout_fn fn, void *data)
+{
+ struct verity_config *vc = ti->private;
+
+ return fn(ti, vc->dev, vc->start, ti->len, data);
+}
+
+static void verity_io_hints(struct dm_target *ti,
+ struct queue_limits *limits)
+{
+ struct verity_config *vc = ti->private;
+ unsigned int block_size = vc->bht.block_size;
+
+ limits->logical_block_size = block_size;
+ limits->physical_block_size = block_size;
+ blk_limits_io_min(limits, block_size);
+}
+
+static struct target_type verity_target = {
+ .name = "verity",
+ .version = {0, 1, 0},
+ .module = THIS_MODULE,
+ .ctr = verity_ctr,
+ .dtr = verity_dtr,
+ .ioctl = verity_ioctl,
+ .map = verity_map,
+ .merge = verity_merge,
+ .status = verity_status,
+ .iterate_devices = verity_iterate_devices,
+ .io_hints = verity_io_hints,
+};
+
+#define VERITY_WQ_FLAGS (WQ_CPU_INTENSIVE|WQ_HIGHPRI)
+
+static int __init verity_init(void)
+{
+ int r = -ENOMEM;
+
+ _verity_io_pool = KMEM_CACHE(verity_io, 0);
+ if (!_verity_io_pool) {
+ DMERR("failed to allocate pool verity_io");
+ goto bad_io_pool;
+ }
+
+ kverityd_ioq = alloc_workqueue("kverityd_io", VERITY_WQ_FLAGS, 1);
+ if (!kverityd_ioq) {
+ DMERR("failed to create workqueue kverityd_ioq");
+ goto bad_io_queue;
+ }
+
+ kveritydq = alloc_workqueue("kverityd", VERITY_WQ_FLAGS, 1);
+ if (!kveritydq) {
+ DMERR("failed to create workqueue kveritydq");
+ goto bad_verify_queue;
+ }
+
+ r = dm_register_target(&verity_target);
+ if (r < 0) {
+ DMERR("register failed %d", r);
+ goto register_failed;
+ }
+
+ DMINFO("version %u.%u.%u loaded", verity_target.version[0],
+ verity_target.version[1], verity_target.version[2]);
+
+ return r;
+
+register_failed:
+ destroy_workqueue(kveritydq);
+bad_verify_queue:
+ destroy_workqueue(kverityd_ioq);
+bad_io_queue:
+ kmem_cache_destroy(_verity_io_pool);
+bad_io_pool:
+ return r;
+}
+
+static void __exit verity_exit(void)
+{
+ destroy_workqueue(kveritydq);
+ destroy_workqueue(kverityd_ioq);
+
+ dm_unregister_target(&verity_target);
+ kmem_cache_destroy(_verity_io_pool);
+}
+
+module_init(verity_init);
+module_exit(verity_exit);
+
+MODULE_AUTHOR("The Chromium OS Authors <chromium-os-dev@chromium.org>");
+MODULE_DESCRIPTION(DM_NAME " target for transparent disk integrity checking");
+MODULE_LICENSE("GPL");
--
1.7.7.3

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 

Thread Tools




All times are GMT. The time now is 07:52 AM.

VBulletin, Copyright ©2000 - 2014, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO ©2007, Crawlability, Inc.
Copyright 2007 - 2008, www.linux-archive.org