[PATCH][SRU][BIONIC] UBUNTU: SAUCE: (noup) Update zfs to 0.7.5-1ubuntu15 (LP: #1764690)

Colin Ian King colin.king at canonical.com
Sun Apr 22 10:48:28 UTC 2018


On 17/04/18 12:04, Colin King wrote:
> From: Colin Ian King <colin.king at canonical.com>
> 
> BugLink: http://bugs.launchpad.net/bugs/1764690
> 
> This syncs the SRU fixes in ZFS 0.7.5-1ubuntu15 to the kernel ZFS driver.
> Fixes zfsonlinux issues fixed in the upstream ZFS repository:
> 
>  - OpenZFS 8373 - TXG_WAIT in ZIL commit path
>    Closes zfsonlinux #6403
>  - zfs promote|rename .../%recv should be an error
>    Closes zfsonlinux #4843, #6339
>  - Fix parsable 'zfs get' for compressratios
>    Closes zfsonlinux #6436, #6449
>  - Fix zpool events scripted mode tab separator
>    Closes zfsonlinux #6444, #6445
>  - zv_suspend_lock in zvol_open()/zvol_release()
>    Closes zfsonlinux #6342
>  - Allow longer SPA names in stats, allows bigger pool names
>    Closes zfsonlinux #6481
>  - vdev_mirror: load balancing fixes
>    Closes zfsonlinux #6461
>  - Fix zfs_ioc_pool_sync should not use fnvlist
>    Closes zfsonlinux #6529
>  - OpenZFS 8375 - Kernel memory leak in nvpair code
>    Closes zfsonlinux #6578
>  - OpenZFS 7261 - nvlist code should enforce name length limit
>    Closes zfsonlinux #6579
>  - OpenZFS 5778 - nvpair_type_is_array() does not recognize
>    DATA_TYPE_INT8_ARRAY
>    Closes zfsonlinux #6580
>  - dmu_objset: release bonus buffer in failure path
>    Closes zfsonlinux #6575
>  - Fix false config_cache_write events
>    Closes zfsonlinux #6617
>  - Fix printk() calls missing log level
>    Closes zfsonlinux #6672
>  - Fix abdstats kstat on 32-bit systems
>    Closes zfsonlinux #6721
>  - Relax ASSERT for #6526
>    Closes zfsonlinux #6526
>  - Fix coverity defects: 147480, 147584 (Logically dead code)
>    Closes zfsonlinux #6745
>  - Fix coverity defects: CID 161388 (Resource Leak)
>    Closes zfsonlinux #6755
>  - Use ashift=12 by default on SSDSC2BW48 disks
>    Closes zfsonlinux #6774
>  - OpenZFS 8558, 8602 - lwp_create() returns EAGAIN
>    Closes zfsonlinux #6779
>  - ZFS send fails to dump objects larger than 128PiB
>    Closes zfsonlinux #6760
>  - Sort output of tunables in arc_summary.py
>    Closes zfsonlinux #6828
>  - Fix data on evict_skips in arc_summary.py
>    Closes zfsonlinux #6882, #6883
>  - Fix segfault in zpool iostat when adding VDEVs
>    Closes zfsonlinux #6748, #6872
>  - ZTS: Fix create-o_ashift test case
>    Closes zfsonlinux #6924, #6877
>  - Handle invalid options in arc_summary
>    Closes zfsonlinux #6983
>  - Call commit callbacks from the tail of the list
>    Closes zfsonlinux #6986
>  - Fix 'zpool add' handling of nested interior VDEVs
>    Closes zfsonlinux #6678, #6996
>  - Fix -fsanitize=address memory leak
>    kmem_alloc(0, ...) in userspace returns a leakable pointer.
>    Closes zfsonlinux #6941
>  - Revert raidz_map and _col structure types
>    Closes zfsonlinux #6981, #7023
>  - Use zap_count instead of cached z_size for unlink
>    Closes zfsonlinux #7019
>  - OpenZFS 8897 - zpool online -e fails assertion when run on non-leaf
>    vdevs
>    Closes zfsonlinux #7030
>  - OpenZFS 8898 - creating fs with checksum=skein on the boot pools
>    fails ungracefully
>    Closes zfsonlinux #7031
>  - Emit an error message before MMP suspends pool
>    Closes zfsonlinux #7048
>  - OpenZFS 8641 - "zpool clear" and "zinject" don't work on "spare"
>    or "replacing" vdevs
>    Closes zfsonlinux #7060
>  - OpenZFS 8835 - Speculative prefetch in ZFS not working for
>    misaligned reads
>    Closes zfsonlinux #7062
>  - OpenZFS 8972 - zfs holds: In scripted mode, do not pad columns with
>    spaces
>    Closes zfsonlinux #7063
>  - Revert "Remove wrong ASSERT in annotate_ecksum"
>    Closes zfsonlinux #7079
>  - OpenZFS 8731 - ASSERT3U(nui64s, <=, UINT16_MAX) fails for large
>    blocks
>    Closes zfsonlinux #7079
>  - Prevent zdb(8) from occasionally hanging on I/O
>    Closes zfsonlinux #6999
>  - Fix 'zfs receive -o' when used with '-e|-d'
>    Closes zfsonlinux #7088
>  - Change movaps to movups in AES-NI code
>    Closes zfsonlinux #7065, #7108
>  - tx_waited -> tx_dirty_delayed in trace_dmu.h
>    Closes zfsonlinux #7096
>  - OpenZFS 8966 - Source file zfs_acl.c, function zfs_aclset_common
>    contains a use after end of the lifetime of a local variable
>    Closes zfsonlinux #7141
>  - Fix zdb -c traverse stop on damaged objset root
>    Closes zfsonlinux #7099
>  - Fix zle_decompress out of bound access
>    Closes zfsonlinux #7099
>  - Fix racy assignment of zcb.zcb_haderrors
>    Closes zfsonlinux #7099
>  - Fix zdb -R decompression
>    Closes zfsonlinux #7099, #4984
>  - Fix zdb -E segfault
>    Closes zfsonlinux #7099
>  - Fix zdb -ed on objset for exported pool
>    Closes zfsonlinux #7099, #6464
> 
> Signed-off-by: Colin Ian King <colin.king at canonical.com>
> ---
>  zfs/META                                    |  2 +-
>  zfs/include/sys/dmu.h                       |  5 ++
>  zfs/include/sys/dmu_tx.h                    |  4 --
>  zfs/include/sys/dsl_pool.h                  |  1 +
>  zfs/include/sys/trace_dmu.h                 | 11 ++--
>  zfs/include/sys/vdev.h                      |  3 +-
>  zfs/include/sys/vdev_impl.h                 |  1 -
>  zfs/include/sys/vdev_raidz_impl.h           | 34 +++++------
>  zfs/include/sys/zil_impl.h                  |  1 -
>  zfs/module/icp/asm-x86_64/aes/aes_intel.S   | 94 ++++++++++++++---------------
>  zfs/module/icp/asm-x86_64/modes/gcm_intel.S |  2 +-
>  zfs/module/icp/spi/kcf_spi.c                | 11 ++--
>  zfs/module/nvpair/nvpair.c                  |  9 ++-
>  zfs/module/zfs/abd.c                        |  4 +-
>  zfs/module/zfs/bpobj.c                      |  4 +-
>  zfs/module/zfs/dmu.c                        |  2 +-
>  zfs/module/zfs/dmu_objset.c                 |  1 +
>  zfs/module/zfs/dmu_send.c                   | 33 +++++-----
>  zfs/module/zfs/dmu_traverse.c               | 17 +++++-
>  zfs/module/zfs/dmu_tx.c                     |  2 +-
>  zfs/module/zfs/dmu_zfetch.c                 | 24 ++++++--
>  zfs/module/zfs/dsl_pool.c                   | 50 +++++++++++++++
>  zfs/module/zfs/metaslab.c                   |  3 +-
>  zfs/module/zfs/mmp.c                        |  5 ++
>  zfs/module/zfs/spa.c                        |  6 +-
>  zfs/module/zfs/spa_config.c                 |  5 ++
>  zfs/module/zfs/spa_stats.c                  | 25 +++++---
>  zfs/module/zfs/vdev_disk.c                  |  2 +-
>  zfs/module/zfs/vdev_mirror.c                | 36 +++++------
>  zfs/module/zfs/vdev_queue.c                 | 21 +++----
>  zfs/module/zfs/zfs_acl.c                    |  2 +-
>  zfs/module/zfs/zfs_dir.c                    | 16 ++++-
>  zfs/module/zfs/zfs_fm.c                     | 14 ++---
>  zfs/module/zfs/zfs_ioctl.c                  | 24 ++++++--
>  zfs/module/zfs/zil.c                        | 34 ++++++++---
>  zfs/module/zfs/zle.c                        |  4 ++
>  zfs/module/zfs/zvol.c                       | 64 +++++++++++++-------
>  37 files changed, 367 insertions(+), 209 deletions(-)
> 
> diff --git a/zfs/META b/zfs/META
> index d624ae4..2110eef 100644
> --- a/zfs/META
> +++ b/zfs/META
> @@ -2,7 +2,7 @@ Meta:         1
>  Name:         zfs
>  Branch:       1.0
>  Version:      0.7.5
> -Release:      1ubuntu13
> +Release:      1ubuntu15
>  Release-Tags: relext
>  License:      CDDL
>  Author:       OpenZFS on Linux
> diff --git a/zfs/include/sys/dmu.h b/zfs/include/sys/dmu.h
> index d246152..bcdf7d6 100644
> --- a/zfs/include/sys/dmu.h
> +++ b/zfs/include/sys/dmu.h
> @@ -713,11 +713,16 @@ void dmu_tx_mark_netfree(dmu_tx_t *tx);
>   * to stable storage and will also be called if the dmu_tx is aborted.
>   * If there is any error which prevents the transaction from being committed to
>   * disk, the callback will be called with a value of error != 0.
> + *
> + * When multiple callbacks are registered to the transaction, the callbacks
> + * will be called in reverse order to let Lustre, the only user of commit
> + * callback currently, take the fast path of its commit callback handling.
>   */
>  typedef void dmu_tx_callback_func_t(void *dcb_data, int error);
>  
>  void dmu_tx_callback_register(dmu_tx_t *tx, dmu_tx_callback_func_t *dcb_func,
>      void *dcb_data);
> +void dmu_tx_do_callbacks(list_t *cb_list, int error);
>  
>  /*
>   * Free up the data blocks for a defined range of a file.  If size is
> diff --git a/zfs/include/sys/dmu_tx.h b/zfs/include/sys/dmu_tx.h
> index f16e1e8..d82a793 100644
> --- a/zfs/include/sys/dmu_tx.h
> +++ b/zfs/include/sys/dmu_tx.h
> @@ -145,10 +145,6 @@ uint64_t dmu_tx_get_txg(dmu_tx_t *tx);
>  struct dsl_pool *dmu_tx_pool(dmu_tx_t *tx);
>  void dmu_tx_wait(dmu_tx_t *tx);
>  
> -void dmu_tx_callback_register(dmu_tx_t *tx, dmu_tx_callback_func_t *dcb_func,
> -    void *dcb_data);
> -void dmu_tx_do_callbacks(list_t *cb_list, int error);
> -
>  /*
>   * These routines are defined in dmu_spa.h, and are called by the SPA.
>   */
> diff --git a/zfs/include/sys/dsl_pool.h b/zfs/include/sys/dsl_pool.h
> index d2dabda..7eb6cb0 100644
> --- a/zfs/include/sys/dsl_pool.h
> +++ b/zfs/include/sys/dsl_pool.h
> @@ -126,6 +126,7 @@ typedef struct dsl_pool {
>  	txg_list_t dp_dirty_dirs;
>  	txg_list_t dp_sync_tasks;
>  	taskq_t *dp_sync_taskq;
> +	taskq_t *dp_zil_clean_taskq;
>  
>  	/*
>  	 * Protects administrative changes (properties, namespace)
> diff --git a/zfs/include/sys/trace_dmu.h b/zfs/include/sys/trace_dmu.h
> index 5ae59e5..24e57f5 100644
> --- a/zfs/include/sys/trace_dmu.h
> +++ b/zfs/include/sys/trace_dmu.h
> @@ -50,7 +50,7 @@ DECLARE_EVENT_CLASS(zfs_delay_mintime_class,
>  	    __field(uint64_t,			tx_lastsnap_txg)
>  	    __field(uint64_t,			tx_lasttried_txg)
>  	    __field(boolean_t,			tx_anyobj)
> -	    __field(boolean_t,			tx_waited)
> +	    __field(boolean_t,			tx_dirty_delayed)
>  	    __field(hrtime_t,			tx_start)
>  	    __field(boolean_t,			tx_wait_dirty)
>  	    __field(int,			tx_err)
> @@ -62,7 +62,7 @@ DECLARE_EVENT_CLASS(zfs_delay_mintime_class,
>  	    __entry->tx_lastsnap_txg		= tx->tx_lastsnap_txg;
>  	    __entry->tx_lasttried_txg		= tx->tx_lasttried_txg;
>  	    __entry->tx_anyobj			= tx->tx_anyobj;
> -	    __entry->tx_waited			= tx->tx_waited;
> +	    __entry->tx_dirty_delayed		= tx->tx_dirty_delayed;
>  	    __entry->tx_start			= tx->tx_start;
>  	    __entry->tx_wait_dirty		= tx->tx_wait_dirty;
>  	    __entry->tx_err			= tx->tx_err;
> @@ -70,11 +70,12 @@ DECLARE_EVENT_CLASS(zfs_delay_mintime_class,
>  	    __entry->min_tx_time		= min_tx_time;
>  	),
>  	TP_printk("tx { txg %llu lastsnap_txg %llu tx_lasttried_txg %llu "
> -	    "anyobj %d waited %d start %llu wait_dirty %d err %i "
> +	    "anyobj %d dirty_delayed %d start %llu wait_dirty %d err %i "
>  	    "} dirty %llu min_tx_time %llu",
>  	    __entry->tx_txg, __entry->tx_lastsnap_txg,
> -	    __entry->tx_lasttried_txg, __entry->tx_anyobj, __entry->tx_waited,
> -	    __entry->tx_start, __entry->tx_wait_dirty, __entry->tx_err,
> +	    __entry->tx_lasttried_txg, __entry->tx_anyobj,
> +	    __entry->tx_dirty_delayed, __entry->tx_start,
> +	    __entry->tx_wait_dirty, __entry->tx_err,
>  	    __entry->dirty, __entry->min_tx_time)
>  );
>  /* END CSTYLED */
> diff --git a/zfs/include/sys/vdev.h b/zfs/include/sys/vdev.h
> index 7157ef4..473d269 100644
> --- a/zfs/include/sys/vdev.h
> +++ b/zfs/include/sys/vdev.h
> @@ -125,8 +125,7 @@ extern zio_t *vdev_queue_io(zio_t *zio);
>  extern void vdev_queue_io_done(zio_t *zio);
>  
>  extern int vdev_queue_length(vdev_t *vd);
> -extern uint64_t vdev_queue_lastoffset(vdev_t *vd);
> -extern void vdev_queue_register_lastoffset(vdev_t *vd, zio_t *zio);
> +extern uint64_t vdev_queue_last_offset(vdev_t *vd);
>  
>  extern void vdev_config_dirty(vdev_t *vd);
>  extern void vdev_config_clean(vdev_t *vd);
> diff --git a/zfs/include/sys/vdev_impl.h b/zfs/include/sys/vdev_impl.h
> index 7c5e54b..4c2e3cd 100644
> --- a/zfs/include/sys/vdev_impl.h
> +++ b/zfs/include/sys/vdev_impl.h
> @@ -127,7 +127,6 @@ struct vdev_queue {
>  	hrtime_t	vq_io_delta_ts;
>  	zio_t		vq_io_search; /* used as local for stack reduction */
>  	kmutex_t	vq_lock;
> -	uint64_t	vq_lastoffset;
>  };
>  
>  /*
> diff --git a/zfs/include/sys/vdev_raidz_impl.h b/zfs/include/sys/vdev_raidz_impl.h
> index 4bd15e3..0799ed1 100644
> --- a/zfs/include/sys/vdev_raidz_impl.h
> +++ b/zfs/include/sys/vdev_raidz_impl.h
> @@ -102,30 +102,30 @@ typedef struct raidz_impl_ops {
>  } raidz_impl_ops_t;
>  
>  typedef struct raidz_col {
> -	size_t rc_devidx;		/* child device index for I/O */
> -	size_t rc_offset;		/* device offset */
> -	size_t rc_size;			/* I/O size */
> +	uint64_t rc_devidx;		/* child device index for I/O */
> +	uint64_t rc_offset;		/* device offset */
> +	uint64_t rc_size;		/* I/O size */
>  	abd_t *rc_abd;			/* I/O data */
>  	void *rc_gdata;			/* used to store the "good" version */
>  	int rc_error;			/* I/O error for this device */
> -	unsigned int rc_tried;		/* Did we attempt this I/O column? */
> -	unsigned int rc_skipped;	/* Did we skip this I/O column? */
> +	uint8_t rc_tried;		/* Did we attempt this I/O column? */
> +	uint8_t rc_skipped;		/* Did we skip this I/O column? */
>  } raidz_col_t;
>  
>  typedef struct raidz_map {
> -	size_t rm_cols;			/* Regular column count */
> -	size_t rm_scols;		/* Count including skipped columns */
> -	size_t rm_bigcols;		/* Number of oversized columns */
> -	size_t rm_asize;		/* Actual total I/O size */
> -	size_t rm_missingdata;		/* Count of missing data devices */
> -	size_t rm_missingparity;	/* Count of missing parity devices */
> -	size_t rm_firstdatacol;		/* First data column/parity count */
> -	size_t rm_nskip;		/* Skipped sectors for padding */
> -	size_t rm_skipstart;		/* Column index of padding start */
> +	uint64_t rm_cols;		/* Regular column count */
> +	uint64_t rm_scols;		/* Count including skipped columns */
> +	uint64_t rm_bigcols;		/* Number of oversized columns */
> +	uint64_t rm_asize;		/* Actual total I/O size */
> +	uint64_t rm_missingdata;	/* Count of missing data devices */
> +	uint64_t rm_missingparity;	/* Count of missing parity devices */
> +	uint64_t rm_firstdatacol;	/* First data column/parity count */
> +	uint64_t rm_nskip;		/* Skipped sectors for padding */
> +	uint64_t rm_skipstart;		/* Column index of padding start */
>  	abd_t *rm_abd_copy;		/* rm_asize-buffer of copied data */
> -	size_t rm_reports;		/* # of referencing checksum reports */
> -	unsigned int rm_freed;		/* map no longer has referencing ZIO */
> -	unsigned int rm_ecksuminjected;	/* checksum error was injected */
> +	uintptr_t rm_reports;		/* # of referencing checksum reports */
> +	uint8_t	rm_freed;		/* map no longer has referencing ZIO */
> +	uint8_t	rm_ecksuminjected;	/* checksum error was injected */
>  	raidz_impl_ops_t *rm_ops;	/* RAIDZ math operations */
>  	raidz_col_t rm_col[1];		/* Flexible array of I/O columns */
>  } raidz_map_t;
> diff --git a/zfs/include/sys/zil_impl.h b/zfs/include/sys/zil_impl.h
> index 13ecca3..dd5304b 100644
> --- a/zfs/include/sys/zil_impl.h
> +++ b/zfs/include/sys/zil_impl.h
> @@ -124,7 +124,6 @@ struct zilog {
>  	list_t		zl_lwb_list;	/* in-flight log write list */
>  	kmutex_t	zl_vdev_lock;	/* protects zl_vdev_tree */
>  	avl_tree_t	zl_vdev_tree;	/* vdevs to flush in zil_commit() */
> -	taskq_t		*zl_clean_taskq; /* runs lwb and itx clean tasks */
>  	avl_tree_t	zl_bp_tree;	/* track bps during log parse */
>  	clock_t		zl_replay_time;	/* lbolt of when replay started */
>  	uint64_t	zl_replay_blks;	/* number of log blocks replayed */
> diff --git a/zfs/module/icp/asm-x86_64/aes/aes_intel.S b/zfs/module/icp/asm-x86_64/aes/aes_intel.S
> index ed0df75..a40e30f 100644
> --- a/zfs/module/icp/asm-x86_64/aes/aes_intel.S
> +++ b/zfs/module/icp/asm-x86_64/aes/aes_intel.S
> @@ -207,7 +207,7 @@ _key_expansion_256a_local:
>  	shufps	$0b10001100, %xmm0, %xmm4
>  	pxor	%xmm4, %xmm0
>  	pxor	%xmm1, %xmm0
> -	movaps	%xmm0, (%rcx)
> +	movups	%xmm0, (%rcx)
>  	add	$0x10, %rcx
>  	ret
>  	nop
> @@ -224,18 +224,18 @@ _key_expansion_192a_local:
>  	pxor	%xmm4, %xmm0
>  	pxor	%xmm1, %xmm0
>  
> -	movaps	%xmm2, %xmm5
> -	movaps	%xmm2, %xmm6
> +	movups	%xmm2, %xmm5
> +	movups	%xmm2, %xmm6
>  	pslldq	$4, %xmm5
>  	pshufd	$0b11111111, %xmm0, %xmm3
>  	pxor	%xmm3, %xmm2
>  	pxor	%xmm5, %xmm2
>  
> -	movaps	%xmm0, %xmm1
> +	movups	%xmm0, %xmm1
>  	shufps	$0b01000100, %xmm0, %xmm6
> -	movaps	%xmm6, (%rcx)
> +	movups	%xmm6, (%rcx)
>  	shufps	$0b01001110, %xmm2, %xmm1
> -	movaps	%xmm1, 0x10(%rcx)
> +	movups	%xmm1, 0x10(%rcx)
>  	add	$0x20, %rcx
>  	ret
>  SET_SIZE(_key_expansion_192a)
> @@ -250,13 +250,13 @@ _key_expansion_192b_local:
>  	pxor	%xmm4, %xmm0
>  	pxor	%xmm1, %xmm0
>  
> -	movaps	%xmm2, %xmm5
> +	movups	%xmm2, %xmm5
>  	pslldq	$4, %xmm5
>  	pshufd	$0b11111111, %xmm0, %xmm3
>  	pxor	%xmm3, %xmm2
>  	pxor	%xmm5, %xmm2
>  
> -	movaps	%xmm0, (%rcx)
> +	movups	%xmm0, (%rcx)
>  	add	$0x10, %rcx
>  	ret
>  SET_SIZE(_key_expansion_192b)
> @@ -270,7 +270,7 @@ _key_expansion_256b_local:
>  	shufps	$0b10001100, %xmm2, %xmm4
>  	pxor	%xmm4, %xmm2
>  	pxor	%xmm1, %xmm2
> -	movaps	%xmm2, (%rcx)
> +	movups	%xmm2, (%rcx)
>  	add	$0x10, %rcx
>  	ret
>  SET_SIZE(_key_expansion_256b)
> @@ -327,7 +327,7 @@ rijndael_key_setup_enc_intel_local:
>  	jz	.Lenc_key_invalid_param
>  
>  	movups	(%USERCIPHERKEY), %xmm0	// user key (first 16 bytes)
> -	movaps	%xmm0, (%AESKEY)
> +	movups	%xmm0, (%AESKEY)
>  	lea	0x10(%AESKEY), %rcx	// key addr
>  	pxor	%xmm4, %xmm4		// xmm4 is assumed 0 in _key_expansion_x
>  
> @@ -341,7 +341,7 @@ rijndael_key_setup_enc_intel_local:
>  #endif	/* OPENSSL_INTERFACE */
>  
>  	movups	0x10(%USERCIPHERKEY), %xmm2	// other user key (2nd 16 bytes)
> -	movaps	%xmm2, (%rcx)
> +	movups	%xmm2, (%rcx)
>  	add	$0x10, %rcx
>  
>  	aeskeygenassist $0x1, %xmm2, %xmm1	// expand the key
> @@ -525,10 +525,10 @@ FRAME_BEGIN
>  
>  .align 4
>  .Ldec_key_reorder_loop:
> -	movaps	(%AESKEY), %xmm0
> -	movaps	(%ROUNDS64), %xmm1
> -	movaps	%xmm0, (%ROUNDS64)
> -	movaps	%xmm1, (%AESKEY)
> +	movups	(%AESKEY), %xmm0
> +	movups	(%ROUNDS64), %xmm1
> +	movups	%xmm0, (%ROUNDS64)
> +	movups	%xmm1, (%AESKEY)
>  	lea	0x10(%AESKEY), %AESKEY
>  	lea	-0x10(%ROUNDS64), %ROUNDS64
>  	cmp	%AESKEY, %ROUNDS64
> @@ -536,11 +536,11 @@ FRAME_BEGIN
>  
>  .align 4
>  .Ldec_key_inv_loop:
> -	movaps	(%rcx), %xmm0
> +	movups	(%rcx), %xmm0
>  	// Convert an encryption round key to a form usable for decryption
>  	// with the "AES Inverse Mix Columns" instruction
>  	aesimc	%xmm0, %xmm1
> -	movaps	%xmm1, (%rcx)
> +	movups	%xmm1, (%rcx)
>  	lea	0x10(%rcx), %rcx
>  	cmp	%ENDAESKEY, %rcx
>  	jnz	.Ldec_key_inv_loop
> @@ -602,7 +602,7 @@ FRAME_BEGIN
>  ENTRY_NP(aes_encrypt_intel)
>  
>  	movups	(%INP), %STATE			// input
> -	movaps	(%KEYP), %KEY			// key
> +	movups	(%KEYP), %KEY			// key
>  #ifdef	OPENSSL_INTERFACE
>  	mov	240(%KEYP), %NROUNDS32		// round count
>  #else	/* OpenSolaris Interface */
> @@ -618,41 +618,41 @@ ENTRY_NP(aes_encrypt_intel)
>  
>  	// AES 256
>  	lea	0x20(%KEYP), %KEYP
> -	movaps	-0x60(%KEYP), %KEY
> +	movups	-0x60(%KEYP), %KEY
>  	aesenc	%KEY, %STATE
> -	movaps	-0x50(%KEYP), %KEY
> +	movups	-0x50(%KEYP), %KEY
>  	aesenc	%KEY, %STATE
>  
>  .align 4
>  .Lenc192:
>  	// AES 192 and 256
> -	movaps	-0x40(%KEYP), %KEY
> +	movups	-0x40(%KEYP), %KEY
>  	aesenc	%KEY, %STATE
> -	movaps	-0x30(%KEYP), %KEY
> +	movups	-0x30(%KEYP), %KEY
>  	aesenc	%KEY, %STATE
>  
>  .align 4
>  .Lenc128:
>  	// AES 128, 192, and 256
> -	movaps	-0x20(%KEYP), %KEY
> +	movups	-0x20(%KEYP), %KEY
>  	aesenc	%KEY, %STATE
> -	movaps	-0x10(%KEYP), %KEY
> +	movups	-0x10(%KEYP), %KEY
>  	aesenc	%KEY, %STATE
> -	movaps	(%KEYP), %KEY
> +	movups	(%KEYP), %KEY
>  	aesenc	%KEY, %STATE
> -	movaps	0x10(%KEYP), %KEY
> +	movups	0x10(%KEYP), %KEY
>  	aesenc	%KEY, %STATE
> -	movaps	0x20(%KEYP), %KEY
> +	movups	0x20(%KEYP), %KEY
>  	aesenc	%KEY, %STATE
> -	movaps	0x30(%KEYP), %KEY
> +	movups	0x30(%KEYP), %KEY
>  	aesenc	%KEY, %STATE
> -	movaps	0x40(%KEYP), %KEY
> +	movups	0x40(%KEYP), %KEY
>  	aesenc	%KEY, %STATE
> -	movaps	0x50(%KEYP), %KEY
> +	movups	0x50(%KEYP), %KEY
>  	aesenc	%KEY, %STATE
> -	movaps	0x60(%KEYP), %KEY
> +	movups	0x60(%KEYP), %KEY
>  	aesenc	%KEY, %STATE
> -	movaps	0x70(%KEYP), %KEY
> +	movups	0x70(%KEYP), %KEY
>  	aesenclast	 %KEY, %STATE		// last round
>  	movups	%STATE, (%OUTP)			// output
>  
> @@ -685,7 +685,7 @@ ENTRY_NP(aes_encrypt_intel)
>  ENTRY_NP(aes_decrypt_intel)
>  
>  	movups	(%INP), %STATE			// input
> -	movaps	(%KEYP), %KEY			// key
> +	movups	(%KEYP), %KEY			// key
>  #ifdef	OPENSSL_INTERFACE
>  	mov	240(%KEYP), %NROUNDS32		// round count
>  #else	/* OpenSolaris Interface */
> @@ -701,41 +701,41 @@ ENTRY_NP(aes_decrypt_intel)
>  
>  	// AES 256
>  	lea	0x20(%KEYP), %KEYP
> -	movaps	-0x60(%KEYP), %KEY
> +	movups	-0x60(%KEYP), %KEY
>  	aesdec	%KEY, %STATE
> -	movaps	-0x50(%KEYP), %KEY
> +	movups	-0x50(%KEYP), %KEY
>  	aesdec	%KEY, %STATE
>  
>  .align 4
>  .Ldec192:
>  	// AES 192 and 256
> -	movaps	-0x40(%KEYP), %KEY
> +	movups	-0x40(%KEYP), %KEY
>  	aesdec	%KEY, %STATE
> -	movaps	-0x30(%KEYP), %KEY
> +	movups	-0x30(%KEYP), %KEY
>  	aesdec	%KEY, %STATE
>  
>  .align 4
>  .Ldec128:
>  	// AES 128, 192, and 256
> -	movaps	-0x20(%KEYP), %KEY
> +	movups	-0x20(%KEYP), %KEY
>  	aesdec	%KEY, %STATE
> -	movaps	-0x10(%KEYP), %KEY
> +	movups	-0x10(%KEYP), %KEY
>  	aesdec	%KEY, %STATE
> -	movaps	(%KEYP), %KEY
> +	movups	(%KEYP), %KEY
>  	aesdec	%KEY, %STATE
> -	movaps	0x10(%KEYP), %KEY
> +	movups	0x10(%KEYP), %KEY
>  	aesdec	%KEY, %STATE
> -	movaps	0x20(%KEYP), %KEY
> +	movups	0x20(%KEYP), %KEY
>  	aesdec	%KEY, %STATE
> -	movaps	0x30(%KEYP), %KEY
> +	movups	0x30(%KEYP), %KEY
>  	aesdec	%KEY, %STATE
> -	movaps	0x40(%KEYP), %KEY
> +	movups	0x40(%KEYP), %KEY
>  	aesdec	%KEY, %STATE
> -	movaps	0x50(%KEYP), %KEY
> +	movups	0x50(%KEYP), %KEY
>  	aesdec	%KEY, %STATE
> -	movaps	0x60(%KEYP), %KEY
> +	movups	0x60(%KEYP), %KEY
>  	aesdec	%KEY, %STATE
> -	movaps	0x70(%KEYP), %KEY
> +	movups	0x70(%KEYP), %KEY
>  	aesdeclast	%KEY, %STATE		// last round
>  	movups	%STATE, (%OUTP)			// output
>  
> diff --git a/zfs/module/icp/asm-x86_64/modes/gcm_intel.S b/zfs/module/icp/asm-x86_64/modes/gcm_intel.S
> index a43b5eb..3aec0ee 100644
> --- a/zfs/module/icp/asm-x86_64/modes/gcm_intel.S
> +++ b/zfs/module/icp/asm-x86_64/modes/gcm_intel.S
> @@ -150,7 +150,7 @@ ENTRY_NP(gcm_mul_pclmulqdq)
>  	// Byte swap 16-byte input
>  	//
>  	lea	.Lbyte_swap16_mask(%rip), %rax
> -	movaps	(%rax), %xmm10
> +	movups	(%rax), %xmm10
>  	pshufb	%xmm10, %xmm0
>  	pshufb	%xmm10, %xmm1
>  
> diff --git a/zfs/module/icp/spi/kcf_spi.c b/zfs/module/icp/spi/kcf_spi.c
> index c2c2b54..0a6e38d 100644
> --- a/zfs/module/icp/spi/kcf_spi.c
> +++ b/zfs/module/icp/spi/kcf_spi.c
> @@ -111,7 +111,7 @@ int
>  crypto_register_provider(crypto_provider_info_t *info,
>      crypto_kcf_provider_handle_t *handle)
>  {
> -	char ks_name[KSTAT_STRLEN];
> +	char *ks_name;
>  
>  	kcf_provider_desc_t *prov_desc = NULL;
>  	int ret = CRYPTO_ARGUMENTS_BAD;
> @@ -238,12 +238,12 @@ crypto_register_provider(crypto_provider_info_t *info,
>  		 * This kstat is deleted, when the provider unregisters.
>  		 */
>  		if (prov_desc->pd_prov_type == CRYPTO_SW_PROVIDER) {
> -			(void) snprintf(ks_name, KSTAT_STRLEN, "%s_%s",
> +			ks_name = kmem_asprintf("%s_%s",
>  			    "NONAME", "provider_stats");
>  		} else {
> -			(void) snprintf(ks_name, KSTAT_STRLEN, "%s_%d_%u_%s",
> -			    "NONAME", 0,
> -			    prov_desc->pd_prov_id, "provider_stats");
> +			ks_name = kmem_asprintf("%s_%d_%u_%s",
> +			    "NONAME", 0, prov_desc->pd_prov_id,
> +			    "provider_stats");
>  		}
>  
>  		prov_desc->pd_kstat = kstat_create("kcf", 0, ks_name, "crypto",
> @@ -261,6 +261,7 @@ crypto_register_provider(crypto_provider_info_t *info,
>  			prov_desc->pd_kstat->ks_update = kcf_prov_kstat_update;
>  			kstat_install(prov_desc->pd_kstat);
>  		}
> +		strfree(ks_name);
>  	}
>  
>  	if (prov_desc->pd_prov_type == CRYPTO_HW_PROVIDER)
> diff --git a/zfs/module/nvpair/nvpair.c b/zfs/module/nvpair/nvpair.c
> index 249b7c9..abed33e 100644
> --- a/zfs/module/nvpair/nvpair.c
> +++ b/zfs/module/nvpair/nvpair.c
> @@ -21,7 +21,7 @@
>  
>  /*
>   * Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.
> - * Copyright (c) 2015, 2016 by Delphix. All rights reserved.
> + * Copyright (c) 2015, 2017 by Delphix. All rights reserved.
>   */
>  
>  #include <sys/stropts.h>
> @@ -916,6 +916,8 @@ nvlist_add_common(nvlist_t *nvl, const char *name,
>  
>  	/* calculate sizes of the nvpair elements and the nvpair itself */
>  	name_sz = strlen(name) + 1;
> +	if (name_sz >= 1ULL << (sizeof (nvp->nvp_name_sz) * NBBY - 1))
> +		return (EINVAL);
>  
>  	nvp_sz = NVP_SIZE_CALC(name_sz, value_sz);
>  
> @@ -1242,6 +1244,7 @@ nvpair_type_is_array(nvpair_t *nvp)
>  	data_type_t type = NVP_TYPE(nvp);
>  
>  	if ((type == DATA_TYPE_BYTE_ARRAY) ||
> +	    (type == DATA_TYPE_INT8_ARRAY) ||
>  	    (type == DATA_TYPE_UINT8_ARRAY) ||
>  	    (type == DATA_TYPE_INT16_ARRAY) ||
>  	    (type == DATA_TYPE_UINT16_ARRAY) ||
> @@ -2200,8 +2203,10 @@ nvs_embedded(nvstream_t *nvs, nvlist_t *embedded)
>  
>  		nvlist_init(embedded, embedded->nvl_nvflag, priv);
>  
> -		if (nvs->nvs_recursion >= nvpair_max_recursion)
> +		if (nvs->nvs_recursion >= nvpair_max_recursion) {
> +			nvlist_free(embedded);
>  			return (EINVAL);
> +		}
>  		nvs->nvs_recursion++;
>  		if ((err = nvs_operation(nvs, embedded, NULL)) != 0)
>  			nvlist_free(embedded);
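
A note on the name-length check added above: nvp_name_sz in the nvpair
header is a signed 16-bit field, so 1ULL << (sizeof (nvp->nvp_name_sz) *
NBBY - 1) evaluates to 32768, and any name whose length (NUL included)
reaches 2^15 is rejected. A minimal user-space sketch of the same guard,
with the int16_t field width written out as an assumption:

    #include <stdint.h>
    #include <string.h>
    #include <errno.h>

    #define NBBY 8  /* bits per byte */

    /*
     * Sketch only: mirrors the guard added to nvlist_add_common(),
     * assuming the on-disk name-size field is a signed int16_t.
     */
    static int
    nvpair_name_check(const char *name)
    {
        uint64_t name_sz = strlen(name) + 1;

        if (name_sz >= 1ULL << (sizeof (int16_t) * NBBY - 1))
            return (EINVAL);
        return (0);
    }
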
> diff --git a/zfs/module/zfs/abd.c b/zfs/module/zfs/abd.c
> index 765ac7f..3c7893d 100644
> --- a/zfs/module/zfs/abd.c
> +++ b/zfs/module/zfs/abd.c
> @@ -571,7 +571,7 @@ static inline void
>  abd_free_struct(abd_t *abd)
>  {
>  	kmem_cache_free(abd_cache, abd);
> -	ABDSTAT_INCR(abdstat_struct_size, -sizeof (abd_t));
> +	ABDSTAT_INCR(abdstat_struct_size, -(int)sizeof (abd_t));
>  }
>  
>  /*
> @@ -618,7 +618,7 @@ abd_free_scatter(abd_t *abd)
>  	ABDSTAT_BUMPDOWN(abdstat_scatter_cnt);
>  	ABDSTAT_INCR(abdstat_scatter_data_size, -(int)abd->abd_size);
>  	ABDSTAT_INCR(abdstat_scatter_chunk_waste,
> -	    abd->abd_size - P2ROUNDUP(abd->abd_size, PAGESIZE));
> +	    (int)abd->abd_size - (int)P2ROUNDUP(abd->abd_size, PAGESIZE));
>  
>  	abd_free_struct(abd);
>  }
> diff --git a/zfs/module/zfs/bpobj.c b/zfs/module/zfs/bpobj.c
> index 82ca94e..32459c9 100644
> --- a/zfs/module/zfs/bpobj.c
> +++ b/zfs/module/zfs/bpobj.c
> @@ -261,7 +261,7 @@ bpobj_iterate_impl(bpobj_t *bpo, bpobj_itor_t func, void *arg, dmu_tx_t *tx,
>  	}
>  	if (free) {
>  		VERIFY3U(0, ==, dmu_free_range(bpo->bpo_os, bpo->bpo_object,
> -		    (i + 1) * sizeof (blkptr_t), -1ULL, tx));
> +		    (i + 1) * sizeof (blkptr_t), DMU_OBJECT_END, tx));
>  	}
>  	if (err || !bpo->bpo_havesubobj || bpo->bpo_phys->bpo_subobjs == 0)
>  		goto out;
> @@ -339,7 +339,7 @@ bpobj_iterate_impl(bpobj_t *bpo, bpobj_itor_t func, void *arg, dmu_tx_t *tx,
>  	if (free) {
>  		VERIFY3U(0, ==, dmu_free_range(bpo->bpo_os,
>  		    bpo->bpo_phys->bpo_subobjs,
> -		    (i + 1) * sizeof (uint64_t), -1ULL, tx));
> +		    (i + 1) * sizeof (uint64_t), DMU_OBJECT_END, tx));
>  	}
>  
>  out:
> diff --git a/zfs/module/zfs/dmu.c b/zfs/module/zfs/dmu.c
> index 6f09aa2..05c9fc3 100644
> --- a/zfs/module/zfs/dmu.c
> +++ b/zfs/module/zfs/dmu.c
> @@ -887,7 +887,7 @@ dmu_free_range(objset_t *os, uint64_t object, uint64_t offset,
>  	if (err)
>  		return (err);
>  	ASSERT(offset < UINT64_MAX);
> -	ASSERT(size == -1ULL || size <= UINT64_MAX - offset);
> +	ASSERT(size == DMU_OBJECT_END || size <= UINT64_MAX - offset);
>  	dnode_free_range(dn, offset, size, tx);
>  	dnode_rele(dn, FTAG);
>  	return (0);
> diff --git a/zfs/module/zfs/dmu_objset.c b/zfs/module/zfs/dmu_objset.c
> index 9a7a696..3425d54 100644
> --- a/zfs/module/zfs/dmu_objset.c
> +++ b/zfs/module/zfs/dmu_objset.c
> @@ -1853,6 +1853,7 @@ dmu_objset_space_upgrade(objset_t *os)
>  		dmu_tx_hold_bonus(tx, obj);
>  		objerr = dmu_tx_assign(tx, TXG_WAIT);
>  		if (objerr != 0) {
> +			dmu_buf_rele(db, FTAG);
>  			dmu_tx_abort(tx);
>  			continue;
>  		}
> diff --git a/zfs/module/zfs/dmu_send.c b/zfs/module/zfs/dmu_send.c
> index 344e420..2e3d706 100644
> --- a/zfs/module/zfs/dmu_send.c
> +++ b/zfs/module/zfs/dmu_send.c
> @@ -224,9 +224,6 @@ dump_free(dmu_sendarg_t *dsp, uint64_t object, uint64_t offset,
>  	    (object == dsp->dsa_last_data_object &&
>  	    offset > dsp->dsa_last_data_offset));
>  
> -	if (length != -1ULL && offset + length < offset)
> -		length = -1ULL;
> -
>  	/*
>  	 * If there is a pending op, but it's not PENDING_FREE, push it out,
>  	 * since free block aggregation can only be done for blocks of the
> @@ -243,19 +240,22 @@ dump_free(dmu_sendarg_t *dsp, uint64_t object, uint64_t offset,
>  
>  	if (dsp->dsa_pending_op == PENDING_FREE) {
>  		/*
> -		 * There should never be a PENDING_FREE if length is -1
> -		 * (because dump_dnode is the only place where this
> -		 * function is called with a -1, and only after flushing
> -		 * any pending record).
> +		 * There should never be a PENDING_FREE if length is
> +		 * DMU_OBJECT_END (because dump_dnode is the only place where
> +		 * this function is called with a DMU_OBJECT_END, and only after
> +		 * flushing any pending record).
>  		 */
> -		ASSERT(length != -1ULL);
> +		ASSERT(length != DMU_OBJECT_END);
>  		/*
>  		 * Check to see whether this free block can be aggregated
>  		 * with pending one.
>  		 */
>  		if (drrf->drr_object == object && drrf->drr_offset +
>  		    drrf->drr_length == offset) {
> -			drrf->drr_length += length;
> +			if (offset + length < offset)
> +				drrf->drr_length = DMU_OBJECT_END;
> +			else
> +				drrf->drr_length += length;
>  			return (0);
>  		} else {
>  			/* not a continuation.  Push out pending record */
> @@ -269,9 +269,12 @@ dump_free(dmu_sendarg_t *dsp, uint64_t object, uint64_t offset,
>  	dsp->dsa_drr->drr_type = DRR_FREE;
>  	drrf->drr_object = object;
>  	drrf->drr_offset = offset;
> -	drrf->drr_length = length;
> +	if (offset + length < offset)
> +		drrf->drr_length = DMU_OBJECT_END;
> +	else
> +		drrf->drr_length = length;
>  	drrf->drr_toguid = dsp->dsa_toguid;
> -	if (length == -1ULL) {
> +	if (length == DMU_OBJECT_END) {
>  		if (dump_record(dsp, NULL, 0) != 0)
>  			return (SET_ERROR(EINTR));
>  	} else {
> @@ -530,7 +533,7 @@ dump_dnode(dmu_sendarg_t *dsp, uint64_t object, dnode_phys_t *dnp)
>  
>  	/* Free anything past the end of the file. */
>  	if (dump_free(dsp, object, (dnp->dn_maxblkid + 1) *
> -	    (dnp->dn_datablkszsec << SPA_MINBLOCKSHIFT), -1ULL) != 0)
> +	    (dnp->dn_datablkszsec << SPA_MINBLOCKSHIFT), DMU_OBJECT_END) != 0)
>  		return (SET_ERROR(EINTR));
>  	if (dsp->dsa_err != 0)
>  		return (SET_ERROR(EINTR));
> @@ -666,7 +669,9 @@ do_dump(dmu_sendarg_t *dsa, struct send_block_record *data)
>  	} else if (BP_IS_HOLE(bp)) {
>  		uint64_t span = BP_SPAN(dblkszsec, indblkshift, zb->zb_level);
>  		uint64_t offset = zb->zb_blkid * span;
> -		err = dump_free(dsa, zb->zb_object, offset, span);
> +		/* Don't dump free records for offsets > DMU_OBJECT_END */
> +		if (zb->zb_blkid == 0 || span <= DMU_OBJECT_END / zb->zb_blkid)
> +			err = dump_free(dsa, zb->zb_object, offset, span);
>  	} else if (zb->zb_level > 0 || type == DMU_OT_OBJSET) {
>  		return (0);
>  	} else if (type == DMU_OT_DNODE) {
> @@ -2498,7 +2503,7 @@ receive_free(struct receive_writer_arg *rwa, struct drr_free *drrf)
>  {
>  	int err;
>  
> -	if (drrf->drr_length != -1ULL &&
> +	if (drrf->drr_length != DMU_OBJECT_END &&
>  	    drrf->drr_offset + drrf->drr_length < drrf->drr_offset)
>  		return (SET_ERROR(EINVAL));
>  
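
The repeated "offset + length < offset" tests in this file are unsigned
overflow guards: when a hole runs past the end of an object larger than
128PiB, offset + length wraps around, and the free record is clamped to
DMU_OBJECT_END instead. The same logic as a standalone helper (the helper
name is illustrative, not from the ZFS source):

    #include <stdint.h>

    #define DMU_OBJECT_END  (-1ULL) /* sentinel: free to end of object */

    /* Clamp a free-record length so that offset + length cannot wrap. */
    static uint64_t
    clamp_free_length(uint64_t offset, uint64_t length)
    {
        if (offset + length < offset)   /* wrapped past UINT64_MAX */
            return (DMU_OBJECT_END);
        return (length);
    }
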
> diff --git a/zfs/module/zfs/dmu_traverse.c b/zfs/module/zfs/dmu_traverse.c
> index c78228d..62f770e 100644
> --- a/zfs/module/zfs/dmu_traverse.c
> +++ b/zfs/module/zfs/dmu_traverse.c
> @@ -609,9 +609,20 @@ traverse_impl(spa_t *spa, dsl_dataset_t *ds, uint64_t objset, blkptr_t *rootbp,
>  		if (err != 0)
>  			return (err);
>  
> -		osp = buf->b_data;
> -		traverse_zil(td, &osp->os_zil_header);
> -		arc_buf_destroy(buf, &buf);
> +		if (err != 0) {
> +			/*
> +			 * If both TRAVERSE_HARD and TRAVERSE_PRE are set,
> +			 * continue to visitbp so that td_func can be called
> +			 * in pre stage, and err will reset to zero.
> +			 */
> +			if (!(td->td_flags & TRAVERSE_HARD) ||
> +			    !(td->td_flags & TRAVERSE_PRE))
> +				return (err);
> +		} else {
> +			osp = buf->b_data;
> +			traverse_zil(td, &osp->os_zil_header);
> +			arc_buf_destroy(buf, &buf);
> +		}
>  	}
>  
>  	if (!(flags & TRAVERSE_PREFETCH_DATA) ||
> diff --git a/zfs/module/zfs/dmu_tx.c b/zfs/module/zfs/dmu_tx.c
> index 097fa77..c3cc03a 100644
> --- a/zfs/module/zfs/dmu_tx.c
> +++ b/zfs/module/zfs/dmu_tx.c
> @@ -1200,7 +1200,7 @@ dmu_tx_do_callbacks(list_t *cb_list, int error)
>  {
>  	dmu_tx_callback_t *dcb;
>  
> -	while ((dcb = list_head(cb_list)) != NULL) {
> +	while ((dcb = list_tail(cb_list)) != NULL) {
>  		list_remove(cb_list, dcb);
>  		dcb->dcb_func(dcb->dcb_data, error);
>  		kmem_free(dcb, sizeof (dmu_tx_callback_t));
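
This list_head() to list_tail() swap implements the reverse-order promise
documented in the dmu.h hunk earlier in the patch: callbacks registered
first now run last, which lets Lustre take the fast path in its commit
callback handling. A toy illustration of the resulting order, with a
plain array standing in for the kernel list_t:

    typedef void dmu_tx_callback_func_t(void *dcb_data, int error);

    /*
     * Sketch: drain from the tail, as dmu_tx_do_callbacks() now does,
     * so registration order A, B, C yields invocation order C, B, A.
     */
    static void
    do_callbacks_tail_first(dmu_tx_callback_func_t **cbs, void **args,
        int n, int error)
    {
        while (n-- > 0)
            cbs[n](args[n], error);
    }
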
> diff --git a/zfs/module/zfs/dmu_zfetch.c b/zfs/module/zfs/dmu_zfetch.c
> index 1bf5c4e..e72e9ef 100644
> --- a/zfs/module/zfs/dmu_zfetch.c
> +++ b/zfs/module/zfs/dmu_zfetch.c
> @@ -228,19 +228,33 @@ dmu_zfetch(zfetch_t *zf, uint64_t blkid, uint64_t nblks, boolean_t fetch_data)
>  
>  	rw_enter(&zf->zf_rwlock, RW_READER);
>  
> +	/*
> +	 * Find matching prefetch stream.  Depending on whether the accesses
> +	 * are block-aligned, first block of the new access may either follow
> +	 * the last block of the previous access, or be equal to it.
> +	 */
>  	for (zs = list_head(&zf->zf_stream); zs != NULL;
>  	    zs = list_next(&zf->zf_stream, zs)) {
> -		if (blkid == zs->zs_blkid) {
> +		if (blkid == zs->zs_blkid || blkid + 1 == zs->zs_blkid) {
>  			mutex_enter(&zs->zs_lock);
>  			/*
>  			 * zs_blkid could have changed before we
>  			 * acquired zs_lock; re-check them here.
>  			 */
> -			if (blkid != zs->zs_blkid) {
> -				mutex_exit(&zs->zs_lock);
> -				continue;
> +			if (blkid == zs->zs_blkid) {
> +				break;
> +			} else if (blkid + 1 == zs->zs_blkid) {
> +				blkid++;
> +				nblks--;
> +				if (nblks == 0) {
> +					/* Already prefetched this before. */
> +					mutex_exit(&zs->zs_lock);
> +					rw_exit(&zf->zf_rwlock);
> +					return;
> +				}
> +				break;
>  			}
> -			break;
> +			mutex_exit(&zs->zs_lock);
>  		}
>  	}
>  
> diff --git a/zfs/module/zfs/dsl_pool.c b/zfs/module/zfs/dsl_pool.c
> index c167080..0320d0e 100644
> --- a/zfs/module/zfs/dsl_pool.c
> +++ b/zfs/module/zfs/dsl_pool.c
> @@ -135,6 +135,36 @@ unsigned long zfs_delay_scale = 1000 * 1000 * 1000 / 2000;
>   */
>  int zfs_sync_taskq_batch_pct = 75;
>  
> +/*
> + * These tunables determine the behavior of how zil_itxg_clean() is
> + * called via zil_clean() in the context of spa_sync(). When an itxg
> + * list needs to be cleaned, TQ_NOSLEEP will be used when dispatching.
> + * If the dispatch fails, the call to zil_itxg_clean() will occur
> + * synchronously in the context of spa_sync(), which can negatively
> + * impact the performance of spa_sync() (e.g. in the case of the itxg
> + * list having a large number of itxs that needs to be cleaned).
> + *
> + * Thus, these tunables can be used to manipulate the behavior of the
> + * taskq used by zil_clean(); they determine the number of taskq entries
> + * that are pre-populated when the taskq is first created (via the
> + * "zfs_zil_clean_taskq_minalloc" tunable) and the maximum number of
> + * taskq entries that are cached after an on-demand allocation (via the
> + * "zfs_zil_clean_taskq_maxalloc").
> + *
> + * The idea being, we want to try reasonably hard to ensure there will
> + * already be a taskq entry pre-allocated by the time that it is needed
> + * by zil_clean(). This way, we can avoid the possibility of an
> + * on-demand allocation of a new taskq entry from failing, which would
> + * result in zil_itxg_clean() being called synchronously from zil_clean()
> + * (which can adversely affect performance of spa_sync()).
> + *
> + * Additionally, the number of threads used by the taskq can be
> + * configured via the "zfs_zil_clean_taskq_nthr_pct" tunable.
> + */
> +int zfs_zil_clean_taskq_nthr_pct = 100;
> +int zfs_zil_clean_taskq_minalloc = 1024;
> +int zfs_zil_clean_taskq_maxalloc = 1024 * 1024;
> +
>  int
>  dsl_pool_open_special_dir(dsl_pool_t *dp, const char *name, dsl_dir_t **ddp)
>  {
> @@ -176,6 +206,12 @@ dsl_pool_open_impl(spa_t *spa, uint64_t txg)
>  	    zfs_sync_taskq_batch_pct, minclsyspri, 1, INT_MAX,
>  	    TASKQ_THREADS_CPU_PCT);
>  
> +	dp->dp_zil_clean_taskq = taskq_create("dp_zil_clean_taskq",
> +	    zfs_zil_clean_taskq_nthr_pct, minclsyspri,
> +	    zfs_zil_clean_taskq_minalloc,
> +	    zfs_zil_clean_taskq_maxalloc,
> +	    TASKQ_PREPOPULATE | TASKQ_THREADS_CPU_PCT);
> +
>  	mutex_init(&dp->dp_lock, NULL, MUTEX_DEFAULT, NULL);
>  	cv_init(&dp->dp_spaceavail_cv, NULL, CV_DEFAULT, NULL);
>  
> @@ -334,6 +370,7 @@ dsl_pool_close(dsl_pool_t *dp)
>  	txg_list_destroy(&dp->dp_sync_tasks);
>  	txg_list_destroy(&dp->dp_dirty_dirs);
>  
> +	taskq_destroy(dp->dp_zil_clean_taskq);
>  	taskq_destroy(dp->dp_sync_taskq);
>  
>  	/*
> @@ -1142,5 +1179,18 @@ MODULE_PARM_DESC(zfs_delay_scale, "how quickly delay approaches infinity");
>  module_param(zfs_sync_taskq_batch_pct, int, 0644);
>  MODULE_PARM_DESC(zfs_sync_taskq_batch_pct,
>  	"max percent of CPUs that are used to sync dirty data");
> +
> +module_param(zfs_zil_clean_taskq_nthr_pct, int, 0644);
> +MODULE_PARM_DESC(zfs_zil_clean_taskq_nthr_pct,
> +	"max percent of CPUs that are used per dp_sync_taskq");
> +
> +module_param(zfs_zil_clean_taskq_minalloc, int, 0644);
> +MODULE_PARM_DESC(zfs_zil_clean_taskq_minalloc,
> +	"number of taskq entries that are pre-populated");
> +
> +module_param(zfs_zil_clean_taskq_maxalloc, int, 0644);
> +MODULE_PARM_DESC(zfs_zil_clean_taskq_maxalloc,
> +	"max number of taskq entries that are cached");
> +
>  /* END CSTYLED */
>  #endif
> diff --git a/zfs/module/zfs/metaslab.c b/zfs/module/zfs/metaslab.c
> index 5e413c0..01e5234 100644
> --- a/zfs/module/zfs/metaslab.c
> +++ b/zfs/module/zfs/metaslab.c
> @@ -1937,7 +1937,8 @@ metaslab_passivate(metaslab_t *msp, uint64_t weight)
>  	 * this metaslab again.  In that case, it had better be empty,
>  	 * or we would be leaving space on the table.
>  	 */
> -	ASSERT(size >= SPA_MINBLOCKSIZE ||
> +	ASSERT(!WEIGHT_IS_SPACEBASED(msp->ms_weight) ||
> +	    size >= SPA_MINBLOCKSIZE ||
>  	    range_tree_space(msp->ms_tree) == 0);
>  	ASSERT0(weight & METASLAB_ACTIVE_MASK);
>  
> diff --git a/zfs/module/zfs/mmp.c b/zfs/module/zfs/mmp.c
> index 6f2aa3f..e91ae62 100644
> --- a/zfs/module/zfs/mmp.c
> +++ b/zfs/module/zfs/mmp.c
> @@ -26,6 +26,7 @@
>  #include <sys/mmp.h>
>  #include <sys/spa.h>
>  #include <sys/spa_impl.h>
> +#include <sys/time.h>
>  #include <sys/vdev.h>
>  #include <sys/vdev_impl.h>
>  #include <sys/zfs_context.h>
> @@ -428,6 +429,10 @@ mmp_thread(spa_t *spa)
>  		 */
>  		if (!suspended && mmp_fail_intervals && multihost &&
>  		    (start - mmp->mmp_last_write) > max_fail_ns) {
> +			cmn_err(CE_WARN, "MMP writes to pool '%s' have not "
> +			    "succeeded in over %llus; suspending pool",
> +			    spa_name(spa),
> +			    NSEC2SEC(start - mmp->mmp_last_write));
>  			zio_suspend(spa, NULL);
>  		}
>  
> diff --git a/zfs/module/zfs/spa.c b/zfs/module/zfs/spa.c
> index a7a2f62..00587d8 100644
> --- a/zfs/module/zfs/spa.c
> +++ b/zfs/module/zfs/spa.c
> @@ -1561,7 +1561,7 @@ spa_load_spares(spa_t *spa)
>  static void
>  spa_load_l2cache(spa_t *spa)
>  {
> -	nvlist_t **l2cache;
> +	nvlist_t **l2cache = NULL;
>  	uint_t nl2cache;
>  	int i, j, oldnvdevs;
>  	uint64_t guid;
> @@ -1645,7 +1645,9 @@ spa_load_l2cache(spa_t *spa)
>  	VERIFY(nvlist_remove(sav->sav_config, ZPOOL_CONFIG_L2CACHE,
>  	    DATA_TYPE_NVLIST_ARRAY) == 0);
>  
> -	l2cache = kmem_alloc(sav->sav_count * sizeof (void *), KM_SLEEP);
> +	if (sav->sav_count > 0)
> +		l2cache = kmem_alloc(sav->sav_count * sizeof (void *),
> +		    KM_SLEEP);
>  	for (i = 0; i < sav->sav_count; i++)
>  		l2cache[i] = vdev_config_generate(spa,
>  		    sav->sav_vdevs[i], B_TRUE, VDEV_CONFIG_L2CACHE);
> diff --git a/zfs/module/zfs/spa_config.c b/zfs/module/zfs/spa_config.c
> index 5b792b8..5bbfb4a 100644
> --- a/zfs/module/zfs/spa_config.c
> +++ b/zfs/module/zfs/spa_config.c
> @@ -162,6 +162,11 @@ spa_config_write(spa_config_dirent_t *dp, nvlist_t *nvl)
>  	 */
>  	if (nvl == NULL) {
>  		err = vn_remove(dp->scd_path, UIO_SYSSPACE, RMFILE);
> +		/*
> +		 * Don't report an error when the cache file is already removed
> +		 */
> +		if (err == ENOENT)
> +			err = 0;
>  		return (err);
>  	}
>  
> diff --git a/zfs/module/zfs/spa_stats.c b/zfs/module/zfs/spa_stats.c
> index 7ca3598..8c4dba2 100644
> --- a/zfs/module/zfs/spa_stats.c
> +++ b/zfs/module/zfs/spa_stats.c
> @@ -142,7 +142,7 @@ static void
>  spa_read_history_init(spa_t *spa)
>  {
>  	spa_stats_history_t *ssh = &spa->spa_stats.read_history;
> -	char name[KSTAT_STRLEN];
> +	char *name;
>  	kstat_t *ksp;
>  
>  	mutex_init(&ssh->lock, NULL, MUTEX_DEFAULT, NULL);
> @@ -153,7 +153,7 @@ spa_read_history_init(spa_t *spa)
>  	ssh->size = 0;
>  	ssh->private = NULL;
>  
> -	(void) snprintf(name, KSTAT_STRLEN, "zfs/%s", spa_name(spa));
> +	name = kmem_asprintf("zfs/%s", spa_name(spa));
>  
>  	ksp = kstat_create(name, 0, "reads", "misc",
>  	    KSTAT_TYPE_RAW, 0, KSTAT_FLAG_VIRTUAL);
> @@ -168,6 +168,7 @@ spa_read_history_init(spa_t *spa)
>  		    spa_read_history_data, spa_read_history_addr);
>  		kstat_install(ksp);
>  	}
> +	strfree(name);
>  }
>  
>  static void
> @@ -365,7 +366,7 @@ static void
>  spa_txg_history_init(spa_t *spa)
>  {
>  	spa_stats_history_t *ssh = &spa->spa_stats.txg_history;
> -	char name[KSTAT_STRLEN];
> +	char *name;
>  	kstat_t *ksp;
>  
>  	mutex_init(&ssh->lock, NULL, MUTEX_DEFAULT, NULL);
> @@ -376,7 +377,7 @@ spa_txg_history_init(spa_t *spa)
>  	ssh->size = 0;
>  	ssh->private = NULL;
>  
> -	(void) snprintf(name, KSTAT_STRLEN, "zfs/%s", spa_name(spa));
> +	name = kmem_asprintf("zfs/%s", spa_name(spa));
>  
>  	ksp = kstat_create(name, 0, "txgs", "misc",
>  	    KSTAT_TYPE_RAW, 0, KSTAT_FLAG_VIRTUAL);
> @@ -391,6 +392,7 @@ spa_txg_history_init(spa_t *spa)
>  		    spa_txg_history_data, spa_txg_history_addr);
>  		kstat_install(ksp);
>  	}
> +	strfree(name);
>  }
>  
>  static void
> @@ -598,7 +600,7 @@ static void
>  spa_tx_assign_init(spa_t *spa)
>  {
>  	spa_stats_history_t *ssh = &spa->spa_stats.tx_assign_histogram;
> -	char name[KSTAT_STRLEN];
> +	char *name;
>  	kstat_named_t *ks;
>  	kstat_t *ksp;
>  	int i;
> @@ -609,7 +611,7 @@ spa_tx_assign_init(spa_t *spa)
>  	ssh->size = ssh->count * sizeof (kstat_named_t);
>  	ssh->private = kmem_alloc(ssh->size, KM_SLEEP);
>  
> -	(void) snprintf(name, KSTAT_STRLEN, "zfs/%s", spa_name(spa));
> +	name = kmem_asprintf("zfs/%s", spa_name(spa));
>  
>  	for (i = 0; i < ssh->count; i++) {
>  		ks = &((kstat_named_t *)ssh->private)[i];
> @@ -632,6 +634,7 @@ spa_tx_assign_init(spa_t *spa)
>  		ksp->ks_update = spa_tx_assign_update;
>  		kstat_install(ksp);
>  	}
> +	strfree(name);
>  }
>  
>  static void
> @@ -678,12 +681,12 @@ static void
>  spa_io_history_init(spa_t *spa)
>  {
>  	spa_stats_history_t *ssh = &spa->spa_stats.io_history;
> -	char name[KSTAT_STRLEN];
> +	char *name;
>  	kstat_t *ksp;
>  
>  	mutex_init(&ssh->lock, NULL, MUTEX_DEFAULT, NULL);
>  
> -	(void) snprintf(name, KSTAT_STRLEN, "zfs/%s", spa_name(spa));
> +	name = kmem_asprintf("zfs/%s", spa_name(spa));
>  
>  	ksp = kstat_create(name, 0, "io", "disk", KSTAT_TYPE_IO, 1, 0);
>  	ssh->kstat = ksp;
> @@ -694,6 +697,7 @@ spa_io_history_init(spa_t *spa)
>  		ksp->ks_update = spa_io_history_update;
>  		kstat_install(ksp);
>  	}
> +	strfree(name);
>  }
>  
>  static void
> @@ -806,7 +810,7 @@ static void
>  spa_mmp_history_init(spa_t *spa)
>  {
>  	spa_stats_history_t *ssh = &spa->spa_stats.mmp_history;
> -	char name[KSTAT_STRLEN];
> +	char *name;
>  	kstat_t *ksp;
>  
>  	mutex_init(&ssh->lock, NULL, MUTEX_DEFAULT, NULL);
> @@ -817,7 +821,7 @@ spa_mmp_history_init(spa_t *spa)
>  	ssh->size = 0;
>  	ssh->private = NULL;
>  
> -	(void) snprintf(name, KSTAT_STRLEN, "zfs/%s", spa_name(spa));
> +	name = kmem_asprintf("zfs/%s", spa_name(spa));
>  
>  	ksp = kstat_create(name, 0, "multihost", "misc",
>  	    KSTAT_TYPE_RAW, 0, KSTAT_FLAG_VIRTUAL);
> @@ -832,6 +836,7 @@ spa_mmp_history_init(spa_t *spa)
>  		    spa_mmp_history_data, spa_mmp_history_addr);
>  		kstat_install(ksp);
>  	}
> +	strfree(name);
>  }
>  
>  static void
> diff --git a/zfs/module/zfs/vdev_disk.c b/zfs/module/zfs/vdev_disk.c
> index 5ae50a3..aecc351 100644
> --- a/zfs/module/zfs/vdev_disk.c
> +++ b/zfs/module/zfs/vdev_disk.c
> @@ -98,7 +98,7 @@ static void
>  vdev_disk_error(zio_t *zio)
>  {
>  #ifdef ZFS_DEBUG
> -	printk("ZFS: zio error=%d type=%d offset=%llu size=%llu "
> +	printk(KERN_WARNING "ZFS: zio error=%d type=%d offset=%llu size=%llu "
>  	    "flags=%x\n", zio->io_error, zio->io_type,
>  	    (u_longlong_t)zio->io_offset, (u_longlong_t)zio->io_size,
>  	    zio->io_flags);
> diff --git a/zfs/module/zfs/vdev_mirror.c b/zfs/module/zfs/vdev_mirror.c
> index 0439e4b..d230b4d 100644
> --- a/zfs/module/zfs/vdev_mirror.c
> +++ b/zfs/module/zfs/vdev_mirror.c
> @@ -116,7 +116,8 @@ static const zio_vsd_ops_t vdev_mirror_vsd_ops = {
>  static int
>  vdev_mirror_load(mirror_map_t *mm, vdev_t *vd, uint64_t zio_offset)
>  {
> -	uint64_t lastoffset;
> +	uint64_t last_offset;
> +	int64_t offset_diff;
>  	int load;
>  
>  	/* All DVAs have equal weight at the root. */
> @@ -129,13 +130,17 @@ vdev_mirror_load(mirror_map_t *mm, vdev_t *vd, uint64_t zio_offset)
>  	 * worse overall when resilvering with compared to without.
>  	 */
>  
> +	/* Fix zio_offset for leaf vdevs */
> +	if (vd->vdev_ops->vdev_op_leaf)
> +		zio_offset += VDEV_LABEL_START_SIZE;
> +
>  	/* Standard load based on pending queue length. */
>  	load = vdev_queue_length(vd);
> -	lastoffset = vdev_queue_lastoffset(vd);
> +	last_offset = vdev_queue_last_offset(vd);
>  
>  	if (vd->vdev_nonrot) {
>  		/* Non-rotating media. */
> -		if (lastoffset == zio_offset)
> +		if (last_offset == zio_offset)
>  			return (load + zfs_vdev_mirror_non_rotating_inc);
>  
>  		/*
> @@ -148,16 +153,16 @@ vdev_mirror_load(mirror_map_t *mm, vdev_t *vd, uint64_t zio_offset)
>  	}
>  
>  	/* Rotating media I/O's which directly follow the last I/O. */
> -	if (lastoffset == zio_offset)
> +	if (last_offset == zio_offset)
>  		return (load + zfs_vdev_mirror_rotating_inc);
>  
>  	/*
>  	 * Apply half the seek increment to I/O's within seek offset
> -	 * of the last I/O queued to this vdev as they should incur less
> +	 * of the last I/O issued to this vdev as they should incur less
>  	 * of a seek increment.
>  	 */
> -	if (ABS(lastoffset - zio_offset) <
> -	    zfs_vdev_mirror_rotating_seek_offset)
> +	offset_diff = (int64_t)(last_offset - zio_offset);
> +	if (ABS(offset_diff) < zfs_vdev_mirror_rotating_seek_offset)
>  		return (load + (zfs_vdev_mirror_rotating_seek_inc / 2));
>  
>  	/* Apply the full seek increment to all other I/O's. */
> @@ -382,29 +387,20 @@ vdev_mirror_child_select(zio_t *zio)
>  		mm->mm_preferred_cnt++;
>  	}
>  
> -	if (mm->mm_preferred_cnt == 1) {
> -		vdev_queue_register_lastoffset(
> -		    mm->mm_child[mm->mm_preferred[0]].mc_vd, zio);
> +	if (mm->mm_preferred_cnt == 1)
>  		return (mm->mm_preferred[0]);
> -	}
>  
> -	if (mm->mm_preferred_cnt > 1) {
> -		int c = vdev_mirror_preferred_child_randomize(zio);
>  
> -		vdev_queue_register_lastoffset(mm->mm_child[c].mc_vd, zio);
> -		return (c);
> -	}
> +	if (mm->mm_preferred_cnt > 1)
> +		return (vdev_mirror_preferred_child_randomize(zio));
>  
>  	/*
>  	 * Every device is either missing or has this txg in its DTL.
>  	 * Look for any child we haven't already tried before giving up.
>  	 */
>  	for (c = 0; c < mm->mm_children; c++) {
> -		if (!mm->mm_child[c].mc_tried) {
> -			vdev_queue_register_lastoffset(mm->mm_child[c].mc_vd,
> -			    zio);
> +		if (!mm->mm_child[c].mc_tried)
>  			return (c);
> -		}
>  	}
>  
>  	/*
> diff --git a/zfs/module/zfs/vdev_queue.c b/zfs/module/zfs/vdev_queue.c
> index 6b3e872..40cba34 100644
> --- a/zfs/module/zfs/vdev_queue.c
> +++ b/zfs/module/zfs/vdev_queue.c
> @@ -393,7 +393,7 @@ vdev_queue_init(vdev_t *vd)
>  		    sizeof (zio_t), offsetof(struct zio, io_queue_node));
>  	}
>  
> -	vq->vq_lastoffset = 0;
> +	vq->vq_last_offset = 0;
>  }
>  
>  void
> @@ -699,9 +699,8 @@ vdev_queue_io_to_issue(vdev_queue_t *vq)
>  	 */
>  	tree = vdev_queue_class_tree(vq, p);
>  	vq->vq_io_search.io_timestamp = 0;
> -	vq->vq_io_search.io_offset = vq->vq_last_offset + 1;
> -	VERIFY3P(avl_find(tree, &vq->vq_io_search,
> -	    &idx), ==, NULL);
> +	vq->vq_io_search.io_offset = vq->vq_last_offset - 1;
> +	VERIFY3P(avl_find(tree, &vq->vq_io_search, &idx), ==, NULL);
>  	zio = avl_nearest(tree, idx, AVL_AFTER);
>  	if (zio == NULL)
>  		zio = avl_first(tree);
> @@ -728,7 +727,7 @@ vdev_queue_io_to_issue(vdev_queue_t *vq)
>  	}
>  
>  	vdev_queue_pending_add(vq, zio);
> -	vq->vq_last_offset = zio->io_offset;
> +	vq->vq_last_offset = zio->io_offset + zio->io_size;
>  
>  	return (zio);
>  }
> @@ -806,7 +805,7 @@ vdev_queue_io_done(zio_t *zio)
>  }
>  
>  /*
> - * As these three methods are only used for load calculations we're not
> + * As these two methods are only used for load calculations we're not
>   * concerned if we get an incorrect value on 32bit platforms due to lack of
>   * vq_lock mutex use here, instead we prefer to keep it lock free for
>   * performance.
> @@ -818,15 +817,9 @@ vdev_queue_length(vdev_t *vd)
>  }
>  
>  uint64_t
> -vdev_queue_lastoffset(vdev_t *vd)
> +vdev_queue_last_offset(vdev_t *vd)
>  {
> -	return (vd->vdev_queue.vq_lastoffset);
> -}
> -
> -void
> -vdev_queue_register_lastoffset(vdev_t *vd, zio_t *zio)
> -{
> -	vd->vdev_queue.vq_lastoffset = zio->io_offset + zio->io_size;
> +	return (vd->vdev_queue.vq_last_offset);
>  }
>  
>  #if defined(_KERNEL) && defined(HAVE_SPL)
> diff --git a/zfs/module/zfs/zfs_acl.c b/zfs/module/zfs/zfs_acl.c
> index 7ddedea..1fcfca0 100644
> --- a/zfs/module/zfs/zfs_acl.c
> +++ b/zfs/module/zfs/zfs_acl.c
> @@ -1323,6 +1323,7 @@ zfs_aclset_common(znode_t *zp, zfs_acl_t *aclp, cred_t *cr, dmu_tx_t *tx)
>  	sa_bulk_attr_t		bulk[5];
>  	uint64_t		ctime[2];
>  	int			count = 0;
> +	zfs_acl_phys_t		acl_phys;
>  
>  	mode = zp->z_mode;
>  
> @@ -1369,7 +1370,6 @@ zfs_aclset_common(znode_t *zp, zfs_acl_t *aclp, cred_t *cr, dmu_tx_t *tx)
>  	} else { /* Painful legacy way */
>  		zfs_acl_node_t *aclnode;
>  		uint64_t off = 0;
> -		zfs_acl_phys_t acl_phys;
>  		uint64_t aoid;
>  
>  		if ((error = sa_lookup(zp->z_sa_hdl, SA_ZPL_ZNODE_ACL(zfsvfs),
> diff --git a/zfs/module/zfs/zfs_dir.c b/zfs/module/zfs/zfs_dir.c
> index c6ee302..9a8bbcc 100644
> --- a/zfs/module/zfs/zfs_dir.c
> +++ b/zfs/module/zfs/zfs_dir.c
> @@ -977,11 +977,25 @@ zfs_link_destroy(zfs_dirlock_t *dl, znode_t *zp, dmu_tx_t *tx, int flag,
>   * Indicate whether the directory is empty.  Works with or without z_lock
>   * held, but can only be consider a hint in the latter case.  Returns true
>   * if only "." and ".." remain and there's no work in progress.
> + *
> + * The internal ZAP size, rather than zp->z_size, needs to be checked since
> + * some consumers (Lustre) do not strictly maintain an accurate SA_ZPL_SIZE.
>   */
>  boolean_t
>  zfs_dirempty(znode_t *dzp)
>  {
> -	return (dzp->z_size == 2 && dzp->z_dirlocks == 0);
> +	zfsvfs_t *zfsvfs = ZTOZSB(dzp);
> +	uint64_t count;
> +	int error;
> +
> +	if (dzp->z_dirlocks != NULL)
> +		return (B_FALSE);
> +
> +	error = zap_count(zfsvfs->z_os, dzp->z_id, &count);
> +	if (error != 0 || count != 0)
> +		return (B_FALSE);
> +
> +	return (B_TRUE);
>  }
>  
>  int
> diff --git a/zfs/module/zfs/zfs_fm.c b/zfs/module/zfs/zfs_fm.c
> index 3986b39..1c66ed6 100644
> --- a/zfs/module/zfs/zfs_fm.c
> +++ b/zfs/module/zfs/zfs_fm.c
> @@ -455,8 +455,8 @@ zfs_ereport_start(nvlist_t **ereport_out, nvlist_t **detector_out,
>  
>  typedef struct zfs_ecksum_info {
>  	/* histograms of set and cleared bits by bit number in a 64-bit word */
> -	uint16_t zei_histogram_set[sizeof (uint64_t) * NBBY];
> -	uint16_t zei_histogram_cleared[sizeof (uint64_t) * NBBY];
> +	uint32_t zei_histogram_set[sizeof (uint64_t) * NBBY];
> +	uint32_t zei_histogram_cleared[sizeof (uint64_t) * NBBY];
>  
>  	/* inline arrays of bits set and cleared. */
>  	uint64_t zei_bits_set[ZFM_MAX_INLINE];
> @@ -481,7 +481,7 @@ typedef struct zfs_ecksum_info {
>  } zfs_ecksum_info_t;
>  
>  static void
> -update_histogram(uint64_t value_arg, uint16_t *hist, uint32_t *count)
> +update_histogram(uint64_t value_arg, uint32_t *hist, uint32_t *count)
>  {
>  	size_t i;
>  	size_t bits = 0;
> @@ -490,8 +490,7 @@ update_histogram(uint64_t value_arg, uint16_t *hist, uint32_t *count)
>  	/* We store the bits in big-endian (largest-first) order */
>  	for (i = 0; i < 64; i++) {
>  		if (value & (1ull << i)) {
> -			if (hist[63 - i] < UINT16_MAX)
> -				hist[63 - i]++;
> +			hist[63 - i]++;
>  			++bits;
>  		}
>  	}
> @@ -649,6 +648,7 @@ annotate_ecksum(nvlist_t *ereport, zio_bad_cksum_t *info,
>  	if (badabd == NULL || goodabd == NULL)
>  		return (eip);
>  
> +	ASSERT3U(nui64s, <=, UINT32_MAX);
>  	ASSERT3U(size, ==, nui64s * sizeof (uint64_t));
>  	ASSERT3U(size, <=, SPA_MAXBLOCKSIZE);
>  	ASSERT3U(size, <=, UINT32_MAX);
> @@ -759,10 +759,10 @@ annotate_ecksum(nvlist_t *ereport, zio_bad_cksum_t *info,
>  	} else {
>  		fm_payload_set(ereport,
>  		    FM_EREPORT_PAYLOAD_ZFS_BAD_SET_HISTOGRAM,
> -		    DATA_TYPE_UINT16_ARRAY,
> +		    DATA_TYPE_UINT32_ARRAY,
>  		    NBBY * sizeof (uint64_t), eip->zei_histogram_set,
>  		    FM_EREPORT_PAYLOAD_ZFS_BAD_CLEARED_HISTOGRAM,
> -		    DATA_TYPE_UINT16_ARRAY,
> +		    DATA_TYPE_UINT32_ARRAY,
>  		    NBBY * sizeof (uint64_t), eip->zei_histogram_cleared,
>  		    NULL);
>  	}
> diff --git a/zfs/module/zfs/zfs_ioctl.c b/zfs/module/zfs/zfs_ioctl.c
> index d195ede..f41e1b9 100644
> --- a/zfs/module/zfs/zfs_ioctl.c
> +++ b/zfs/module/zfs/zfs_ioctl.c
> @@ -3738,9 +3738,12 @@ zfs_ioc_rename(zfs_cmd_t *zc)
>  	boolean_t recursive = zc->zc_cookie & 1;
>  	char *at;
>  
> +	/* "zfs rename" from and to ...%recv datasets should both fail */
> +	zc->zc_name[sizeof (zc->zc_name) - 1] = '\0';
>  	zc->zc_value[sizeof (zc->zc_value) - 1] = '\0';
> -	if (dataset_namecheck(zc->zc_value, NULL, NULL) != 0 ||
> -	    strchr(zc->zc_value, '%'))
> +	if (dataset_namecheck(zc->zc_name, NULL, NULL) != 0 ||
> +	    dataset_namecheck(zc->zc_value, NULL, NULL) != 0 ||
> +	    strchr(zc->zc_name, '%') || strchr(zc->zc_value, '%'))
>  		return (SET_ERROR(EINVAL));
>  
>  	at = strchr(zc->zc_name, '@');
> @@ -5002,6 +5005,11 @@ zfs_ioc_promote(zfs_cmd_t *zc)
>  	char *cp;
>  	int error;
>  
> +	zc->zc_name[sizeof (zc->zc_name) - 1] = '\0';
> +	if (dataset_namecheck(zc->zc_name, NULL, NULL) != 0 ||
> +	    strchr(zc->zc_name, '%'))
> +		return (SET_ERROR(EINVAL));
> +
>  	error = dsl_pool_hold(zc->zc_name, FTAG, &dp);
>  	if (error != 0)
>  		return (error);
> @@ -5901,20 +5909,26 @@ static int
>  zfs_ioc_pool_sync(const char *pool, nvlist_t *innvl, nvlist_t *onvl)
>  {
>  	int err;
> -	boolean_t force;
> +	boolean_t force = B_FALSE;
>  	spa_t *spa;
>  
>  	if ((err = spa_open(pool, &spa, FTAG)) != 0)
>  		return (err);
>  
> -	force = fnvlist_lookup_boolean_value(innvl, "force");
> +	if (innvl) {
> +		if (nvlist_lookup_boolean_value(innvl, "force", &force) != 0) {
> +			err = SET_ERROR(EINVAL);
> +			goto out;
> +		}
> +	}
> +
>  	if (force) {
>  		spa_config_enter(spa, SCL_CONFIG, FTAG, RW_WRITER);
>  		vdev_config_dirty(spa->spa_root_vdev);
>  		spa_config_exit(spa, SCL_CONFIG, FTAG);
>  	}
>  	txg_wait_synced(spa_get_dsl(spa), 0);
> -
> +out:
>  	spa_close(spa, FTAG);
>  
>  	return (err);
> diff --git a/zfs/module/zfs/zil.c b/zfs/module/zfs/zil.c
> index 4d714ce..1e3e69d 100644
> --- a/zfs/module/zfs/zil.c
> +++ b/zfs/module/zfs/zil.c
> @@ -1009,7 +1009,24 @@ zil_lwb_write_start(zilog_t *zilog, lwb_t *lwb)
>  	 * to clean up in the event of allocation failure or I/O failure.
>  	 */
>  	tx = dmu_tx_create(zilog->zl_os);
> -	VERIFY(dmu_tx_assign(tx, TXG_WAIT) == 0);
> +
> +	/*
> +	 * Since we are not going to create any new dirty data and we can even
> +	 * help with clearing the existing dirty data, we should not be subject
> +	 * to the dirty data based delays.
> +	 * We (ab)use TXG_WAITED to bypass the delay mechanism.
> +	 * One side effect from using TXG_WAITED is that dmu_tx_assign() can
> +	 * fail if the pool is suspended.  Those are dramatic circumstances,
> +	 * so we return NULL to signal that the normal ZIL processing is not
> +	 * possible and txg_wait_synced() should be used to ensure that the data
> +	 * is on disk.
> +	 */
> +	error = dmu_tx_assign(tx, TXG_WAITED);
> +	if (error != 0) {
> +		ASSERT3S(error, ==, EIO);
> +		dmu_tx_abort(tx);
> +		return (NULL);
> +	}
>  	dsl_dataset_dirty(dmu_objset_ds(zilog->zl_os), tx);
>  	txg = dmu_tx_get_txg(tx);
>  
> @@ -1435,8 +1452,7 @@ zil_clean(zilog_t *zilog, uint64_t synced_txg)
>  		return;
>  	}
>  	ASSERT3U(itxg->itxg_txg, <=, synced_txg);
> -	ASSERT(itxg->itxg_txg != 0);
> -	ASSERT(zilog->zl_clean_taskq != NULL);
> +	ASSERT3U(itxg->itxg_txg, !=, 0);
>  	clean_me = itxg->itxg_itxs;
>  	itxg->itxg_itxs = NULL;
>  	itxg->itxg_txg = 0;
> @@ -1447,8 +1463,11 @@ zil_clean(zilog_t *zilog, uint64_t synced_txg)
>  	 * free it in-line. This should be rare. Note, using TQ_SLEEP
>  	 * created a bad performance problem.
>  	 */
> -	if (taskq_dispatch(zilog->zl_clean_taskq,
> -	    (void (*)(void *))zil_itxg_clean, clean_me, TQ_NOSLEEP) == 0)
> +	ASSERT3P(zilog->zl_dmu_pool, !=, NULL);
> +	ASSERT3P(zilog->zl_dmu_pool->dp_zil_clean_taskq, !=, NULL);
> +	taskqid_t id = taskq_dispatch(zilog->zl_dmu_pool->dp_zil_clean_taskq,
> +	    (void (*)(void *))zil_itxg_clean, clean_me, TQ_NOSLEEP);
> +	if (id == TASKQID_INVALID)
>  		zil_itxg_clean(clean_me);
>  }
>  
> @@ -1921,13 +1940,10 @@ zil_open(objset_t *os, zil_get_data_t *get_data)
>  {
>  	zilog_t *zilog = dmu_objset_zil(os);
>  
> -	ASSERT(zilog->zl_clean_taskq == NULL);
>  	ASSERT(zilog->zl_get_data == NULL);
>  	ASSERT(list_is_empty(&zilog->zl_lwb_list));
>  
>  	zilog->zl_get_data = get_data;
> -	zilog->zl_clean_taskq = taskq_create("zil_clean", 1, defclsyspri,
> -	    2, 2, TASKQ_PREPOPULATE);
>  
>  	return (zilog);
>  }
> @@ -1962,8 +1978,6 @@ zil_close(zilog_t *zilog)
>  	if (txg < spa_freeze_txg(zilog->zl_spa))
>  		VERIFY(!zilog_is_dirty(zilog));
>  
> -	taskq_destroy(zilog->zl_clean_taskq);
> -	zilog->zl_clean_taskq = NULL;
>  	zilog->zl_get_data = NULL;
>  
>  	/*
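
The zil_clean() change keeps the existing dispatch-or-inline pattern but
moves it onto the shared per-pool taskq: TQ_NOSLEEP never blocks waiting
for a taskq entry, and when no entry is available the itxg list is cleaned
synchronously in the caller's (spa_sync) context. The shape of that
pattern, isolated (SPL taskq API assumed; the wrapper function is mine):

    /* Sketch of dispatch-with-synchronous-fallback, as in zil_clean(). */
    static void
    dispatch_or_inline(taskq_t *tq, task_func_t func, void *arg)
    {
        if (taskq_dispatch(tq, func, arg, TQ_NOSLEEP) == TASKQID_INVALID)
            func(arg);  /* no taskq entry available; run inline */
    }

The minalloc/maxalloc tunables added in the dsl_pool.c hunk exist exactly
to make that synchronous fallback rare.
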
> diff --git a/zfs/module/zfs/zle.c b/zfs/module/zfs/zle.c
> index 13c5673..613607f 100644
> --- a/zfs/module/zfs/zle.c
> +++ b/zfs/module/zfs/zle.c
> @@ -74,10 +74,14 @@ zle_decompress(void *s_start, void *d_start, size_t s_len, size_t d_len, int n)
>  	while (src < s_end && dst < d_end) {
>  		int len = 1 + *src++;
>  		if (len <= n) {
> +			if (src + len > s_end || dst + len > d_end)
> +				return (-1);
>  			while (len-- != 0)
>  				*dst++ = *src++;
>  		} else {
>  			len -= n;
> +			if (dst + len > d_end)
> +				return (-1);
>  			while (len-- != 0)
>  				*dst++ = 0;
>  		}
> diff --git a/zfs/module/zfs/zvol.c b/zfs/module/zfs/zvol.c
> index 5293f95..5b62bf94 100644
> --- a/zfs/module/zfs/zvol.c
> +++ b/zfs/module/zfs/zvol.c
> @@ -1347,9 +1347,9 @@ zvol_open(struct block_device *bdev, fmode_t flag)
>  {
>  	zvol_state_t *zv;
>  	int error = 0;
> -	boolean_t drop_suspend = B_FALSE;
> +	boolean_t drop_suspend = B_TRUE;
>  
> -	ASSERT(!mutex_owned(&zvol_state_lock));
> +	ASSERT(!MUTEX_HELD(&zvol_state_lock));
>  
>  	mutex_enter(&zvol_state_lock);
>  	/*
> @@ -1364,23 +1364,31 @@ zvol_open(struct block_device *bdev, fmode_t flag)
>  		return (SET_ERROR(-ENXIO));
>  	}
>  
> -	/* take zv_suspend_lock before zv_state_lock */
> -	rw_enter(&zv->zv_suspend_lock, RW_READER);
> -
>  	mutex_enter(&zv->zv_state_lock);
> -
>  	/*
>  	 * make sure zvol is not suspended during first open
> -	 * (hold zv_suspend_lock), otherwise, drop the lock
> +	 * (hold zv_suspend_lock) and respect proper lock acquisition
> +	 * ordering - zv_suspend_lock before zv_state_lock
>  	 */
>  	if (zv->zv_open_count == 0) {
> -		drop_suspend = B_TRUE;
> +		if (!rw_tryenter(&zv->zv_suspend_lock, RW_READER)) {
> +			mutex_exit(&zv->zv_state_lock);
> +			rw_enter(&zv->zv_suspend_lock, RW_READER);
> +			mutex_enter(&zv->zv_state_lock);
> +			/* check to see if zv_suspend_lock is needed */
> +			if (zv->zv_open_count != 0) {
> +				rw_exit(&zv->zv_suspend_lock);
> +				drop_suspend = B_FALSE;
> +			}
> +		}
>  	} else {
> -		rw_exit(&zv->zv_suspend_lock);
> +		drop_suspend = B_FALSE;
>  	}
> -
>  	mutex_exit(&zvol_state_lock);
>  
> +	ASSERT(MUTEX_HELD(&zv->zv_state_lock));
> +	ASSERT(zv->zv_open_count != 0 || RW_READ_HELD(&zv->zv_suspend_lock));
> +
>  	if (zv->zv_open_count == 0) {
>  		error = zvol_first_open(zv);
>  		if (error)
> @@ -1417,28 +1425,38 @@ static int
>  zvol_release(struct gendisk *disk, fmode_t mode)
>  {
>  	zvol_state_t *zv;
> -	boolean_t drop_suspend = B_FALSE;
> +	boolean_t drop_suspend = B_TRUE;
>  
> -	ASSERT(!mutex_owned(&zvol_state_lock));
> +	ASSERT(!MUTEX_HELD(&zvol_state_lock));
>  
>  	mutex_enter(&zvol_state_lock);
>  	zv = disk->private_data;
> -	ASSERT(zv && zv->zv_open_count > 0);
> -
> -	/* take zv_suspend_lock before zv_state_lock */
> -	rw_enter(&zv->zv_suspend_lock, RW_READER);
>  
>  	mutex_enter(&zv->zv_state_lock);
> -	mutex_exit(&zvol_state_lock);
> -
> +	ASSERT(zv->zv_open_count > 0);
>  	/*
>  	 * make sure zvol is not suspended during last close
> -	 * (hold zv_suspend_lock), otherwise, drop the lock
> +	 * (hold zv_suspend_lock) and respect proper lock acquisition
> +	 * ordering - zv_suspend_lock before zv_state_lock
>  	 */
> -	if (zv->zv_open_count == 1)
> -		drop_suspend = B_TRUE;
> -	else
> -		rw_exit(&zv->zv_suspend_lock);
> +	if (zv->zv_open_count == 1) {
> +		if (!rw_tryenter(&zv->zv_suspend_lock, RW_READER)) {
> +			mutex_exit(&zv->zv_state_lock);
> +			rw_enter(&zv->zv_suspend_lock, RW_READER);
> +			mutex_enter(&zv->zv_state_lock);
> +			/* check to see if zv_suspend_lock is needed */
> +			if (zv->zv_open_count != 1) {
> +				rw_exit(&zv->zv_suspend_lock);
> +				drop_suspend = B_FALSE;
> +			}
> +		}
> +	} else {
> +		drop_suspend = B_FALSE;
> +	}
> +	mutex_exit(&zvol_state_lock);
> +
> +	ASSERT(MUTEX_HELD(&zv->zv_state_lock));
> +	ASSERT(zv->zv_open_count != 1 || RW_READ_HELD(&zv->zv_suspend_lock));
>  
>  	zv->zv_open_count--;
>  	if (zv->zv_open_count == 0)
> 
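
The zvol_open()/zvol_release() hunks above are both instances of the
classic lock-ordering dance: the thread already holds zv_state_lock, but
the documented order is zv_suspend_lock before zv_state_lock, so it may
only try-acquire; on failure it backs off, takes both locks in the proper
order, and then revalidates zv_open_count before trusting the decision it
made earlier. Stripped to its skeleton (locks named A and B only for this
sketch):

    /*
     * Sketch: required order is A before B, but B is already held.
     * Try A; if that fails, drop B, take A then B, and recheck any
     * state that was observed under B before it was dropped.
     */
    if (!rw_tryenter(&A, RW_READER)) {
        mutex_exit(&B);
        rw_enter(&A, RW_READER);
        mutex_enter(&B);
        /* revalidate state here; it may have changed while B was free */
    }
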
Is it possible to get this fix into the next Bionic release?