[PATCH] opal: prd_info: Add resilience to service check
Deb McLemore
debmc at linux.vnet.ibm.com
Wed May 2 21:28:16 UTC 2018
Hi Alex, the patch is good as is. The fwts_pipeio regression patch linked
below fixed the underlying issue that surfaced this, but I think the added
resilience is worth having anyway.
https://lists.ubuntu.com/archives/fwts-devel/2018-April/010348.html
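
For anyone skimming the thread, here is a minimal sketch of the behaviour the
patch aims for. It is not the fwts code itself: run_cmd_fn is a hypothetical
stand-in for fwts_exec2(), the status command string is an assumption (the
diff only shows the stop command), and the two fake runners exist only so the
logic can be exercised without systemd.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for fwts_exec2(): runs a command, returns its exit status. */
typedef int (*run_cmd_fn)(const char *command);

/* Number of stop requests seen by the fake runners below. */
int stop_requests = 0;

/*
 * If the service reports "running" (status 0), request a stop and flag the
 * service for restart. The exit status of the stop is deliberately ignored,
 * since the stop was seen to work even when -1 came back.
 */
int service_check_sketch(run_cmd_fn run, int *restart)
{
	assert(run != NULL && restart != NULL);
	*restart = 0;

	/* Assumed status query; the real command is not shown in the diff. */
	if (run("systemctl status opal-prd.service 2>&1") == 0) {
		(void)run("systemctl stop opal-prd.service 2>&1");
		*restart = 1;
	}
	return 0;
}

/* Fake runner: the service is running, but the stop reports -1. */
int fake_running_stop_fails(const char *command)
{
	if (strstr(command, " stop ") != NULL) {
		stop_requests++;
		return -1;
	}
	return 0;
}

/* Fake runner: the service is not running (any non-zero status). */
int fake_not_running(const char *command)
{
	if (strstr(command, " stop ") != NULL)
		stop_requests++;
	return 3;
}
```

With fake_running_stop_fails, restart ends up set to 1 even though the stop
reported -1; with fake_not_running, no stop is requested and restart stays 0.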
On 05/02/2018 02:38 PM, Alex Hung wrote:
> On Mon, Apr 9, 2018 at 6:07 AM, Deb McLemore <debmc at linux.vnet.ibm.com> wrote:
>> Just an update on this: I have narrowed it down to the host OS (Ubuntu 16.04)
>> installs having different levels of the opal-prd daemon.
>> So far it seems that some changes to fwts_pipe_readwrite mean it no longer
>> returns some socket info that it used to, so we may be taking different
>> code paths. There is a fix we can do: when no socket data comes back from
>> the systemctl stop command, look only at the return code from the child
>> exit process (fwts_pipe_close2) and not at the output buffer of the socket
>> handling. I still need to look deeper to see the underlying issue more
>> clearly, but I wanted to update the mailing list.
> Hi Deb,
>
> Are we expecting an updated patch for this or do you think this patch
> is in a good shape?
>
> There was no FWTS 18.04.00 but there will be 18.05.00 in two weeks
> (hopefully). If everybody agrees, this should be included in 18.05.00.
>
>>
>> $ opal-prd --version
>> opal-prd opal-prd-5.1.13
>>
>>
>> $ opal-prd --version
>> opal-prd opal-prd-5.4.3
>>
>>
>> On 04/07/2018 01:41 PM, Deborah McLemore wrote:
>>> The case I reproduced was manually running "fwts prd_info"; all it does
>>> is a 'systemctl status', then, if 'running', a 'systemctl stop'. The
>>> 'systemctl stop' fails with -1.
>>> It works OK on some levels of Ubuntu and not on others. I will investigate
>>> further to find the root differences, but the proposed enhancement to
>>> ignore the 'systemctl stop' exit status is a good one, since we did get a
>>> successful status of 'running' from the 'systemctl status' query.
>>> The 'systemctl stop' functionally works (the service is stopped); it's just
>>> the exit status from the 'systemctl stop' that is -1 on some OSes. We should
>>> be more resilient. We only attempt a 'systemctl start' after the test runs
>>> if we had determined that the service was 'running' and tried the
>>> 'systemctl stop', so it is not that quick, but possibly that is a factor.
>>> =====================================
>>> Deb McLemore
>>> IBM OpenPower - IBM Systems
>>> (512) 286 9980
>>>
>>> debmc at us.ibm.com
>>> debmc at linux.vnet.ibm.com - (plain text)
>>> =====================================
>>>
>>> ----- Original message -----
>>> From: ppaidipe <ppaidipe at linux.vnet.ibm.com>
>>> To: Deborah McLemore/Austin/IBM at IBMUS
>>> Cc: Vasant Hegde <hegdevasant at linux.vnet.ibm.com>, Deb McLemore
>>> <debmc at linux.vnet.ibm.com>, fwts-devel at lists.ubuntu.com
>>> Subject: Re: [PATCH] opal: prd_info: Add resilience to service check
>>> Date: Sat, Apr 7, 2018 1:16 PM
>>> On 2018-04-07 20:50, Deborah McLemore wrote:
>>> > We are getting -1 back, what is the expected exit status from systemd
>>> > stop ?
>>> >
>>>
>>> From the execution of the test, what I understand is that we are requesting
>>> start/stop of the service too quickly, which made the test fail.
>>>
>>> Apr 07 13:11:18 xxxxxxxxxxx systemd[1]: opal-prd.service: Start request repeated too quickly.
>>> Apr 07 13:11:18 xxxxxxxxxxx systemd[1]: opal-prd.service: Failed with result 'start-limit-hit'.
>>> Apr 07 13:11:18 xxxxxxxxxxx systemd[1]: Failed to start OPAL PRD daemon.
>>>
>>> So we need to request start/restart only once the stop has completed, and
>>> request stop only when the daemon is already started.
>>>
>>>
>>> Thanks
>>> Pridhiviraj
>>>
>>> > Sent from my iPhone
>>> >
>>> >> On Apr 7, 2018, at 9:23 AM, Vasant Hegde
>>> > <hegdevasant at linux.vnet.ibm.com> wrote:
>>> >>
>>> >>> On 04/07/2018 07:40 PM, Deb McLemore wrote:
>>> >>> When the opal-prd.service is running and an attempt to stop it is
>>> >>> performed, ignore the exit status and continue.
>>> >>
>>> >> Deb,
>>> >>
>>> >> Can you please explain why you want to ignore the exit status here?
>>> >> Are there any issues?
>>> >>
>>> >> -Vasant
>>> >>
>>> >>
>>> >>
>>> >>>
>>> >>> Signed-off-by: Deb McLemore <debmc at linux.vnet.ibm.com>
>>> >>> ---
>>> >>> src/opal/prd_info.c | 20 ++++----------------
>>> >>> 1 file changed, 4 insertions(+), 16 deletions(-)
>>> >>>
>>> >>> diff --git a/src/opal/prd_info.c b/src/opal/prd_info.c
>>> >>> index 4082a18..2db9413 100644
>>> >>> --- a/src/opal/prd_info.c
>>> >>> +++ b/src/opal/prd_info.c
>>> >>> @@ -73,7 +73,7 @@ static int prd_dev_query(fwts_framework *fw)
>>> >>>
>>> >>> static int prd_service_check(fwts_framework *fw, int *restart)
>>> >>> {
>>> >>> - int rc = FWTS_OK, status = 0, stop_status = 0;
>>> >>> + int rc = FWTS_OK, status = 0;
>>> >>> char *command;
>>> >>> char *output = NULL;
>>> >>>
>>> >>> @@ -97,25 +97,13 @@ static int prd_service_check(fwts_framework *fw, int *restart)
>>> >>> goto out;
>>> >>> case 0: /* "running" */
>>> >>> command = "systemctl stop opal-prd.service 2>&1";
>>> >>> - stop_status = fwts_exec2(command, &output);
>>> >>> + fwts_exec2(command, &output);
>>> >>>
>>> >>> if (output)
>>> >>> free(output);
>>> >>>
>>> >>> - switch (stop_status) {
>>> >>> - case 0:
>>> >>> - *restart = 1;
>>> >>> - break;
>>> >>> - default:
>>> >>> - fwts_failed(fw, LOG_LEVEL_HIGH, "OPAL PRD Info",
>>> >>> - "Attempt was made to stop the "
>>> >>> - "opal-prd.service but was not "
>>> >>> - "successful. Try to "
>>> >>> - "\"sudo systemctl stop "
>>> >>> - "opal-prd.service\" and retry.");
>>> >>> - rc = FWTS_ERROR;
>>> >>> - goto out;
>>> >>> - }
>>> >>> + *restart = 1;
>>> >>> + break;
>>> >>> default:
>>> >>> break;
>>> >>> }
>>> >>>
>>> >>
>>> >>
>>> >> --
>>> >> fwts-devel mailing list
>>> >> fwts-devel at lists.ubuntu.com
>>> >> Modify settings or unsubscribe at:
>>> >
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.ubuntu.com_mailman_listinfo_fwts-2Ddevel&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=V3KRDPsp3yMosW9R4elWYg&m=Sy-O20yWd_N3piZoJOEzigB1XzmLV4OUCfEyl3ENAcc&s=oPh1ACx1NGTgif-0V5BIQffXXqjymI8QC_bagI2jZsA&e=
>>>
>>>
>>
>
>