[PATCH] opal: prd_info: Add resilience to service check

Deb McLemore debmc at linux.vnet.ibm.com
Wed May 2 21:28:16 UTC 2018


Hi Alex, the patch is good. The fwts_pipeio regression patch fixed the issue
that surfaced this, but I think the added resilience is worth having anyway.

https://lists.ubuntu.com/archives/fwts-devel/2018-April/010348.html
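
The behavior the patch below settles on can be sketched in plain C. This is a minimal sketch under stated assumptions, not the fwts code. `run_cmd_fn`, `service_check`, `CHECK_OK`/`CHECK_ERROR` and the `simulated_run` stand-in are hypothetical names, and exit status 0 from `systemctl status` is assumed to mean "running". What it illustrates is that a failing `systemctl stop` no longer aborts the check. The only decision that matters is whether the service was running, because that alone determines whether it must be restarted after the test.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical command runner (not an fwts function): returns the
 * command's exit status, or -1 if the command could not be run. */
typedef int (*run_cmd_fn)(const char *command);

/* Illustrative return values, not FWTS_OK/FWTS_ERROR. */
enum { CHECK_OK = 0, CHECK_ERROR = -1 };

/*
 * If `systemctl status` reports the service running (assumed: exit
 * status 0), stop it and record that it must be restarted after the
 * test. The stop's exit status is deliberately ignored: on some hosts
 * it comes back as -1 even though the service does stop.
 */
static int service_check(run_cmd_fn run_cmd, int *restart)
{
	int status;

	assert(run_cmd != NULL && restart != NULL);
	*restart = 0;

	status = run_cmd("systemctl status opal-prd.service >/dev/null 2>&1");
	if (status < 0)
		return CHECK_ERROR;	/* the query itself could not be run */

	if (status == 0) {		/* "running" */
		(void)run_cmd("systemctl stop opal-prd.service 2>&1");
		*restart = 1;		/* we stopped it, so restart it later */
	}
	return CHECK_OK;		/* any other status: leave the service alone */
}

/* Dry-run stand-in for a host where stop returns -1 although the
 * service does stop; `sim_status` sets what the status query reports. */
static int sim_status = 0;
static int sim_stop_calls = 0;

static int simulated_run(const char *command)
{
	if (strstr(command, "systemctl stop") != NULL) {
		sim_stop_calls++;
		return -1;
	}
	return sim_status;
}
```

Calling `service_check(simulated_run, &restart)` with `sim_status` set to 0 yields `CHECK_OK` with `restart` set to 1 even though the simulated stop returns -1, which is the situation reported on the affected hosts.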


On 05/02/2018 02:38 PM, Alex Hung wrote:
> On Mon, Apr 9, 2018 at 6:07 AM, Deb McLemore <debmc at linux.vnet.ibm.com> wrote:
>> Just an update on this: I have narrowed it down to the host OS (Ubuntu 16.04)
>> having different levels of the opal-prd daemon. So far it seems that some
>> changes to fwts_pipe_readwrite mean it no longer returns some socket info that
>> it used to, so different code paths may be taken. There is a fix we can make:
>> in the case where no socket data comes back from the systemctl stop command,
>> look only at the return code from the child process exit (fwts_pipe_close2)
>> rather than at the output buffer of the socket handling. But I really need to
>> look deeper to see the underlying issue more clearly; I just wanted to update
>> the mailing list.
> Hi Deb,
>
> Are we expecting an updated patch for this or do you think this patch
> is in a good shape?
>
> There was no FWTS 18.04.00, but there will be 18.05.00 in two weeks
> (hopefully). If everybody agrees, this should be included in 18.05.00.
>
>>
>> $ opal-prd --version
>> opal-prd opal-prd-5.1.13
>>
>>
>> $ opal-prd --version
>> opal-prd opal-prd-5.4.3
>>
>>
>> On 04/07/2018 01:41 PM, Deborah McLemore wrote:
>>> The case I reproduced was manually running "fwts prd_info", and all it does
>>> is a 'systemctl status', then, if the service is 'running', a 'systemctl stop'.
>>> The 'systemctl stop' fails with -1.
>>> It works OK on some levels of Ubuntu and not on others. I will do more
>>> investigation to find the root differences, but the proposed enhancement,
>>> ignoring the 'systemctl stop' exit status, is a good one, since we did get a
>>> successful status of 'running' from the 'systemctl status' query.
>>> The 'systemctl stop' works functionally (the service is stopped); it's just the
>>> exit status from 'systemctl stop' that is -1 on some OSes. We should be
>>> more resilient. We only attempt a 'systemctl start' after the test runs if we
>>> had determined that the service was 'running' and tried the 'systemctl stop',
>>> so a too-quick restart is unlikely, but possible.
>>> =====================================
>>> Deb McLemore
>>> IBM OpenPower - IBM Systems
>>> (512) 286 9980
>>>
>>> debmc at us.ibm.com
>>> debmc at linux.vnet.ibm.com - (plain text)
>>> =====================================
>>>
>>>     ----- Original message -----
>>>     From: ppaidipe <ppaidipe at linux.vnet.ibm.com>
>>>     To: Deborah McLemore/Austin/IBM at IBMUS
>>>     Cc: Vasant Hegde <hegdevasant at linux.vnet.ibm.com>, Deb McLemore
>>>     <debmc at linux.vnet.ibm.com>, fwts-devel at lists.ubuntu.com
>>>     Subject: Re: [PATCH] opal: prd_info: Add resilience to service check
>>>     Date: Sat, Apr 7, 2018 1:16 PM
>>>     On 2018-04-07 20:50, Deborah McLemore wrote:
>>>      > We are getting -1 back. What is the expected exit status from systemctl
>>>      > stop?
>>>      >
>>>
>>>     From the test execution, what I understand is that we are requesting
>>>     start/stop of the service too quickly, which made the test fail.
>>>
>>>     Apr 07 13:11:18 xxxxxxxxxxx systemd[1]: opal-prd.service: Start request repeated too quickly.
>>>     Apr 07 13:11:18 xxxxxxxxxxx systemd[1]: opal-prd.service: Failed with result 'start-limit-hit'.
>>>     Apr 07 13:11:18 xxxxxxxxxxx systemd[1]: Failed to start OPAL PRD daemon.
>>>
>>>     So we need to request start/restart only once the stop has completed,
>>>     and request stop only when the daemon is already started.
>>>
>>>
>>>     Thanks
>>>     Pridhiviraj
>>>
>>>      >> On Apr 7, 2018, at 9:23 AM, Vasant Hegde <hegdevasant at linux.vnet.ibm.com> wrote:
>>>      >>
>>>      >>> On 04/07/2018 07:40 PM, Deb McLemore wrote:
>>>      >>> When opal-prd.service is running and an attempt is made to stop it,
>>>      >>> ignore the exit status and continue.
>>>      >>
>>>      >> Deb,
>>>      >>
>>>      >> Can you please explain why you want to ignore the exit status here?
>>>      >> Are there any issues?
>>>      >>
>>>      >> -Vasant
>>>      >>
>>>      >>
>>>      >>
>>>      >>>
>>>      >>> Signed-off-by: Deb McLemore <debmc at linux.vnet.ibm.com>
>>>      >>> ---
>>>      >>> src/opal/prd_info.c | 20 ++++----------------
>>>      >>> 1 file changed, 4 insertions(+), 16 deletions(-)
>>>      >>>
>>>      >>> diff --git a/src/opal/prd_info.c b/src/opal/prd_info.c
>>>      >>> index 4082a18..2db9413 100644
>>>      >>> --- a/src/opal/prd_info.c
>>>      >>> +++ b/src/opal/prd_info.c
>>>      >>> @@ -73,7 +73,7 @@ static int prd_dev_query(fwts_framework *fw)
>>>      >>>
>>>      >>> static int prd_service_check(fwts_framework *fw, int *restart)
>>>      >>> {
>>>      >>> - int rc = FWTS_OK, status = 0, stop_status = 0;
>>>      >>> + int rc = FWTS_OK, status = 0;
>>>      >>> char *command;
>>>      >>> char *output = NULL;
>>>      >>>
>>>      >>> @@ -97,25 +97,13 @@ static int prd_service_check(fwts_framework *fw, int *restart)
>>>      >>> goto out;
>>>      >>> case 0: /* "running" */
>>>      >>> command = "systemctl stop opal-prd.service 2>&1";
>>>      >>> - stop_status = fwts_exec2(command, &output);
>>>      >>> + fwts_exec2(command, &output);
>>>      >>>
>>>      >>> if (output)
>>>      >>> free(output);
>>>      >>>
>>>      >>> - switch (stop_status) {
>>>      >>> - case 0:
>>>      >>> - *restart = 1;
>>>      >>> - break;
>>>      >>> - default:
>>>      >>> - fwts_failed(fw, LOG_LEVEL_HIGH, "OPAL PRD Info",
>>>      >>> - "Attempt was made to stop the "
>>>      >>> - "opal-prd.service but was not "
>>>      >>> - "successful. Try to "
>>>      >>> - "\"sudo systemctl stop "
>>>      >>> - "opal-prd.service\" and retry.");
>>>      >>> - rc = FWTS_ERROR;
>>>      >>> - goto out;
>>>      >>> - }
>>>      >>> + *restart = 1;
>>>      >>> + break;
>>>      >>> default:
>>>      >>> break;
>>>      >>> }
>>>      >>>
>>>      >>
>>>      >>
>>>      >> --
>>>      >> fwts-devel mailing list
>>>      >> fwts-devel at lists.ubuntu.com
>>>      >> Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/fwts-devel
>>>
>>>
>>
>
>
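
Two points in the thread, what a -1 from the stop command means and Pridhiviraj's suggestion to start the daemon again only once the stop has completed, can be sketched with plain POSIX `system()` and the `WIFEXITED`/`WEXITSTATUS` macros. This is an assumption-laden illustration, not how fwts runs commands (it uses its own helpers such as `fwts_exec2` and `fwts_pipe_close2`, whose internals are not shown here). `run_cmd` and `wait_until_inactive` are hypothetical names, and the retry count and delay are arbitrary.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run a shell command and return the child's exit
 * code, or -1 if the command could not be run or did not exit normally
 * (e.g. killed by a signal). This keeps "could not run/reap the child"
 * separate from "the child ran and reported failure".
 */
static int run_cmd(const char *command)
{
	int raw = system(command);

	if (raw == -1)
		return -1;		/* fork or wait failed */
	if (!WIFEXITED(raw))
		return -1;		/* terminated abnormally */
	return WEXITSTATUS(raw);
}

/*
 * Poll `query` until it exits nonzero (for `systemctl is-active`, that
 * means the unit is no longer active), trying at most `tries` times and
 * sleeping `delay_s` seconds between attempts.
 * Returns 1 once stopped, 0 on timeout, -1 if the query cannot be run.
 */
static int wait_until_inactive(const char *query, int tries, unsigned delay_s)
{
	assert(query != NULL && tries > 0);

	for (int i = 0; i < tries; i++) {
		int rc = run_cmd(query);

		if (rc < 0)
			return -1;
		if (rc > 0)
			return 1;	/* inactive: safe to request a start */
		if (i + 1 < tries)
			sleep(delay_s);
	}
	return 0;			/* still active after all attempts */
}
```

On a systemd host, a caller might check `wait_until_inactive("systemctl is-active --quiet opal-prd.service", 10, 1) == 1` before issuing `systemctl start opal-prd.service`; `systemctl is-active --quiet` exits 0 only while the unit is active. Waiting for the stop does not by itself reset systemd's start rate limit (the `start-limit-hit` result in the log above); it only avoids requesting a start while the stop is still in progress.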



