ACK/Cmnt: [SRU][Focal][PATCH 1/1] bpf: Fix toctou on read-only map's constant scalar tracking

Stefan Bader stefan.bader at canonical.com
Thu Aug 10 07:18:40 UTC 2023


On 07.08.23 15:21, Jacob Martin wrote:
> From: Daniel Borkmann <daniel at iogearbox.net>
> 
> Commit a23740ec43ba ("bpf: Track contents of read-only maps as scalars") is
> checking whether maps are read-only both from BPF program side and user space
> side, and then, given their content is constant, reading out their data via
> map->ops->map_direct_value_addr() which is then subsequently used as known
> scalar value for the register, that is, it is marked as __mark_reg_known()
> with the read value at verification time. Before a23740ec43ba, the register
> content was marked as an unknown scalar so the verifier could not make any
> assumptions about the map content.
> 
> The current implementation however is prone to a TOCTOU race, meaning, the
> value read as known scalar for the register is not guaranteed to be exactly
> the same at a later point when the program is executed, and as such, the
> prior made assumptions of the verifier with regards to the program will be
> invalid which can cause issues such as OOB access, etc.
> 
> While the BPF_F_RDONLY_PROG map flag is always fixed and required to be
> specified at map creation time, the map->frozen property is initially set to
> false for the map given the map value needs to be populated, e.g. for global
> data sections. Once complete, the loader "freezes" the map from user space
> such that no subsequent updates/deletes are possible anymore. For the rest
> of the lifetime of the map, this freeze one-time trigger cannot be undone
> anymore after a successful BPF_MAP_FREEZE cmd return. Meaning, any new BPF_*
> cmd calls which would update/delete map entries will be rejected with -EPERM
> since map_get_sys_perms() removes the FMODE_CAN_WRITE permission. This also
> means that pending update/delete map entries must still complete before this
> guarantee is given. This corner case is not an issue for loaders since they
> create and prepare such program private map in successive steps.
> 
> However, a malicious user is able to trigger this TOCTOU race in two different
> ways: i) via userfaultfd, and ii) via batched updates. For i) userfaultfd is
> used to expand the competition interval, so that map_update_elem() can modify
> the contents of the map after map_freeze() and bpf_prog_load() were executed.
> This works, because userfaultfd halts the parallel thread which triggered a
> map_update_elem() at the time where we copy key/value from the user buffer and
> this already passed the FMODE_CAN_WRITE capability test given at that time the
> map was not "frozen". Then, the main thread performs the map_freeze() and
> bpf_prog_load(), and once that had completed successfully, the other thread
> is woken up to complete the pending map_update_elem() which then changes the
> map content. For ii) the idea of the batched update is similar, meaning, when
> there are a large number of updates to be processed, it can increase the
> competition interval between the two. It is therefore possible in practice to
> modify the contents of the map after executing map_freeze() and bpf_prog_load().
> 
> One way to fix both i) and ii) at the same time is to expand the use of the
> map's map->writecnt. The latter was introduced in fc9702273e2e ("bpf: Add mmap()
> support for BPF_MAP_TYPE_ARRAY") and further refined in 1f6cb19be2e2 ("bpf:
> Prevent re-mmap()'ing BPF map as writable for initially r/o mapping") with
> the rationale to make a writable mmap()'ing of a map mutually exclusive with
> read-only freezing. The counter indicates writable mmap() mappings and then
> prevents/fails the freeze operation. Its semantics can be expanded beyond
> just mmap() by generally indicating ongoing write phases. This would essentially
> span any parallel regular and batched flavor of update/delete operation and
> then also have map_freeze() fail with -EBUSY. For the check_mem_access() in
> the verifier we expand upon the bpf_map_is_rdonly() check ensuring that all
> last pending writes have completed via bpf_map_write_active() test. Once the
> map->frozen is set and bpf_map_write_active() indicates a map->writecnt of 0
> only then we are really guaranteed to use the map's data as known constants.
> For map->frozen being set and pending writes in process of still being completed
> we fall back to marking that register as unknown scalar so we don't end up
> making assumptions about it. With this, both TOCTOU reproducers from i) and
> ii) are fixed.
> 
> Note that the map->writecnt has been converted into an atomic64 in the fix in
> order to avoid a double freeze_mutex mutex_{un,}lock() pair when updating
> map->writecnt in the various map update/delete BPF_* cmd flavors. Spanning
> the freeze_mutex over entire map update/delete operations in syscall side
> would not be possible due to then causing everything to be serialized.
> Similarly, something like synchronize_rcu() after setting map->frozen to wait
> for update/deletes to complete is not possible either since it would also
> have to span the user copy which can sleep. On the libbpf side, this won't
> break d66562fba1ce ("libbpf: Add BPF object skeleton support") as the
> anonymous mmap()-ed "map initialization image" is remapped as a BPF map-backed
> mmap()-ed memory where for .rodata it's non-writable.
> 
> Fixes: a23740ec43ba ("bpf: Track contents of read-only maps as scalars")
> Reported-by: w1tcher.bupt at gmail.com
> Signed-off-by: Daniel Borkmann <daniel at iogearbox.net>
> Acked-by: Andrii Nakryiko <andrii at kernel.org>
> Signed-off-by: Alexei Starovoitov <ast at kernel.org>

> CVE-2021-4001
> (backported from commit 353050be4c19e102178ccc05988101887c25ae53)
> [jacobmartin: bpf map mmap and batch support are not present, so omit fixes for them]
> Signed-off-by: Jacob Martin <jacob.martin at canonical.com>
Acked-by: Stefan Bader <stefan.bader at canonical.com>
> ---

Just as a personal note: I find things more readable if the new 
s-o-b block is detached and follows a <reason>, <origin>, <s-o-b> order.

-Stefan

>   include/linux/bpf.h   |  2 ++
>   kernel/bpf/syscall.c  | 25 +++++++++++++++++++++++++
>   kernel/bpf/verifier.c | 18 +++++++++++++++++-
>   3 files changed, 44 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 5705cda3c4c4..e6ed8cbda837 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -103,6 +103,7 @@ struct bpf_map {
>   	atomic_t usercnt;
>   	struct work_struct work;
>   	char name[BPF_OBJ_NAME_LEN];
> +	atomic64_t writecnt;
>   };
>   
>   static inline bool map_value_has_spin_lock(const struct bpf_map *map)
> @@ -663,6 +664,7 @@ void bpf_map_charge_move(struct bpf_map_memory *dst,
>   			 struct bpf_map_memory *src);
>   void *bpf_map_area_alloc(u64 size, int numa_node);
>   void bpf_map_area_free(void *base);
> +bool bpf_map_write_active(const struct bpf_map *map);
>   void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr);
>   
>   extern int sysctl_unprivileged_bpf_disabled;
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index de788761b708..56eee42da0d0 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -127,6 +127,21 @@ static struct bpf_map *find_and_alloc_map(union bpf_attr *attr)
>   	return map;
>   }
>   
> +static void bpf_map_write_active_inc(struct bpf_map *map)
> +{
> +	atomic64_inc(&map->writecnt);
> +}
> +
> +static void bpf_map_write_active_dec(struct bpf_map *map)
> +{
> +	atomic64_dec(&map->writecnt);
> +}
> +
> +bool bpf_map_write_active(const struct bpf_map *map)
> +{
> +	return atomic64_read(&map->writecnt) != 0;
> +}
> +
>   void *bpf_map_area_alloc(u64 size, int numa_node)
>   {
>   	/* We really just want to fail instead of triggering OOM killer
> @@ -890,6 +905,7 @@ static int map_update_elem(union bpf_attr *attr)
>   	map = __bpf_map_get(f);
>   	if (IS_ERR(map))
>   		return PTR_ERR(map);
> +	bpf_map_write_active_inc(map);
>   	if (!(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {
>   		err = -EPERM;
>   		goto err_put;
> @@ -979,6 +995,7 @@ static int map_update_elem(union bpf_attr *attr)
>   free_key:
>   	kfree(key);
>   err_put:
> +	bpf_map_write_active_dec(map);
>   	fdput(f);
>   	return err;
>   }
> @@ -1001,6 +1018,7 @@ static int map_delete_elem(union bpf_attr *attr)
>   	map = __bpf_map_get(f);
>   	if (IS_ERR(map))
>   		return PTR_ERR(map);
> +	bpf_map_write_active_inc(map);
>   	if (!(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {
>   		err = -EPERM;
>   		goto err_put;
> @@ -1028,6 +1046,7 @@ static int map_delete_elem(union bpf_attr *attr)
>   out:
>   	kfree(key);
>   err_put:
> +	bpf_map_write_active_dec(map);
>   	fdput(f);
>   	return err;
>   }
> @@ -1119,6 +1138,7 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
>   	map = __bpf_map_get(f);
>   	if (IS_ERR(map))
>   		return PTR_ERR(map);
> +	bpf_map_write_active_inc(map);
>   	if (!(map_get_sys_perms(map, f) & FMODE_CAN_READ) ||
>   	    !(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {
>   		err = -EPERM;
> @@ -1160,6 +1180,7 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
>   free_key:
>   	kfree(key);
>   err_put:
> +	bpf_map_write_active_dec(map);
>   	fdput(f);
>   	return err;
>   }
> @@ -1179,6 +1200,10 @@ static int map_freeze(const union bpf_attr *attr)
>   	map = __bpf_map_get(f);
>   	if (IS_ERR(map))
>   		return PTR_ERR(map);
> +	if (bpf_map_write_active(map)) {
> +		err = -EBUSY;
> +		goto err_put;
> +	}
>   	if (READ_ONCE(map->frozen)) {
>   		err = -EBUSY;
>   		goto err_put;
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 530664693ac4..cda7907d08d9 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -2773,7 +2773,23 @@ static void coerce_reg_to_size(struct bpf_reg_state *reg, int size)
>   
>   static bool bpf_map_is_rdonly(const struct bpf_map *map)
>   {
> -	return (map->map_flags & BPF_F_RDONLY_PROG) && map->frozen;
> +	/* A map is considered read-only if the following conditions are true:
> +	 *
> +	 * 1) BPF program side cannot change any of the map content. The
> +	 *    BPF_F_RDONLY_PROG flag is fixed throughout the lifetime of a map
> +	 *    and was set at map creation time.
> +	 * 2) The map value(s) have been initialized from user space by a
> +	 *    loader and then "frozen", such that no new map update/delete
> +	 *    operations from syscall side are possible for the rest of
> +	 *    the map's lifetime from that point onwards.
> +	 * 3) Any parallel/pending map update/delete operations from syscall
> +	 *    side have been completed. Only after that point, it's safe to
> +	 *    assume that map value(s) are immutable.
> +	 */
> +	return (map->map_flags & BPF_F_RDONLY_PROG) &&
> +		READ_ONCE(map->frozen) &&
> +		!bpf_map_write_active(map);
>   }
>   
>   static int bpf_map_direct_read(struct bpf_map *map, int off, int size, u64 *val)
