ACK:[SRU]][Bionic][PATCH 1/1] ocxl: Fix page fault handler in case of fault on dying process

Tue Jul 31 10:49:47 UTC 2018

On 07/30/18 18:31, Joseph Salisbury wrote:
> From: Frederic Barrat <fbarrat at linux.ibm.com>
> 
> BugLink: http://bugs.launchpad.net/bugs/1781436
> 
> If a process exits without doing proper cleanup, there's a window
> where an opencapi device can try to access the memory of the dying
> process and may trigger a page fault. That's an expected scenario and
> the ocxl driver holds a reference on the mm_struct of the process
> until the opencapi device is notified of the process exiting.
> However, if mm_users is already at 0, i.e. the address space of the
> process has already been destroyed, the driver shouldn't try resolving
> the page fault, as it will fail, but it can also try accessing already
> freed data.
> 
> It is fixed by only calling the bottom half of the page fault handler
> if mm_users is greater than 0 and get a reference on mm_users instead
> of mm_count. Otherwise, we can safely return a translation fault to
> the device, as its associated memory context is being removed. The
> opencapi device will be properly cleaned up shortly after when closing
> the file descriptors.
> 
> Fixes: 5ef3166e8a32 ("ocxl: Driver code for 'generic' opencapi devices")
> Cc: stable at vger.kernel.org # v4.16+
> Signed-off-by: Frederic Barrat <fbarrat at linux.ibm.com>
> Reviewed-By: Alastair D'Silva <alastair at d-silva.org>
> Acked-by: Andrew Donnellan <andrew.donnellan at au1.ibm.com>
> Signed-off-by: Michael Ellerman <mpe at ellerman.id.au>
> (cherry picked from linux-next commit d497ebf5fb3a026c0817f8c96cde578787f24093)
> Signed-off-by: Joseph Salisbury <joseph.salisbury at canonical.com>

Given the "cherry picked" line is fixed as Stefan mentioned:

Acked-by: Kleber Sacilotto de Souza <kleber.souza at canonical.com>

> ---
>  drivers/misc/ocxl/link.c | 24 +++++++++++++++---------
>  1 file changed, 15 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
> index 88876ae8f330..a963b0a4a3c5 100644
> --- a/drivers/misc/ocxl/link.c
> +++ b/drivers/misc/ocxl/link.c
> @@ -136,7 +136,7 @@ static void xsl_fault_handler_bh(struct work_struct *fault_work)
>  	int rc;
>  
>  	/*
> -	 * We need to release a reference on the mm whenever exiting this
> +	 * We must release a reference on mm_users whenever exiting this
>  	 * function (taken in the memory fault interrupt handler)
>  	 */
>  	rc = copro_handle_mm_fault(fault->pe_data.mm, fault->dar, fault->dsisr,
> @@ -172,7 +172,7 @@ static void xsl_fault_handler_bh(struct work_struct *fault_work)
>  	}
>  	r = RESTART;
>  ack:
> -	mmdrop(fault->pe_data.mm);
> +	mmput(fault->pe_data.mm);
>  	ack_irq(spa, r);
>  }
>  
> @@ -184,6 +184,7 @@ static irqreturn_t xsl_fault_handler(int irq, void *data)
>  	struct pe_data *pe_data;
>  	struct ocxl_process_element *pe;
>  	int lpid, pid, tid;
> +	bool schedule = false;
>  
>  	read_irq(spa, &dsisr, &dar, &pe_handle);
>  	trace_ocxl_fault(spa->spa_mem, pe_handle, dsisr, dar, -1);
> @@ -226,14 +227,19 @@ static irqreturn_t xsl_fault_handler(int irq, void *data)
>  	}
>  	WARN_ON(pe_data->mm->context.id != pid);
>  
> -	spa->xsl_fault.pe = pe_handle;
> -	spa->xsl_fault.dar = dar;
> -	spa->xsl_fault.dsisr = dsisr;
> -	spa->xsl_fault.pe_data = *pe_data;
> -	mmgrab(pe_data->mm); /* mm count is released by bottom half */
> -
> +	if (mmget_not_zero(pe_data->mm)) {
> +			spa->xsl_fault.pe = pe_handle;
> +			spa->xsl_fault.dar = dar;
> +			spa->xsl_fault.dsisr = dsisr;
> +			spa->xsl_fault.pe_data = *pe_data;
> +			schedule = true;
> +			/* mm_users count released by bottom half */
> +	}
>  	rcu_read_unlock();
> -	schedule_work(&spa->xsl_fault.fault_work);
> +	if (schedule)
> +		schedule_work(&spa->xsl_fault.fault_work);
> +	else
> +		ack_irq(spa, ADDRESS_ERROR);
>  	return IRQ_HANDLED;
>  }
>  
>