[PATCH] opal: prd_info: Add resilience to service check
Deb McLemore
debmc at linux.vnet.ibm.com
Mon Apr 9 13:07:57 UTC 2018
Just an update on this: I have narrowed it down to the host OS (Ubuntu 16.04)
having different levels of the opal-prd daemon. So far it seems that, after
some changes to fwts_pipe_readwrite, it no longer returns some socket info
that it used to, so we may be taking different code paths. There is a fix we
can make: when no socket data comes back from the systemctl stop command, look
only at the return code from the child's exit (fwts_pipe_close2) and not at
the output buffer from the socket handling. I still need to look deeper to see
the underlying issue more clearly, but I wanted to update the mailing list.
$ opal-prd --version
opal-prd opal-prd-5.1.13
$ opal-prd --version
opal-prd opal-prd-5.4.3
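
To illustrate the idea of judging success by the child's exit code rather
than by whatever comes back on the pipe, here is a minimal sketch. It uses
plain POSIX popen()/pclose() as a stand-in and does not call the fwts pipe
helpers; run_and_get_exit_code() is a hypothetical name, not an fwts function.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Hypothetical helper (not part of fwts): run a shell command, discard
 * whatever it prints, and return only the child's exit code.
 * Returns -1 if the command could not be run or did not exit normally.
 */
static int run_and_get_exit_code(const char *command)
{
	char buf[256];
	FILE *fp;
	int status;

	assert(command != NULL);

	fp = popen(command, "r");
	if (fp == NULL)
		return -1;

	/* Drain the pipe; an empty read is not treated as a failure. */
	while (fgets(buf, sizeof(buf), fp) != NULL)
		;

	/* pclose() waits for the child and returns its wait status. */
	status = pclose(fp);
	if (status == -1 || !WIFEXITED(status))
		return -1;

	return WEXITSTATUS(status);
}
```

Under these assumptions, a caller could pass
"systemctl stop opal-prd.service 2>&1" and treat a return of 0 as success,
whether or not the command printed anything.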
On 04/07/2018 01:41 PM, Deborah McLemore wrote:
> The case I reproduced was manually running "fwts prd_info". All it does
> is a 'systemctl status' and then, if the service is 'running', a
> 'systemctl stop'. The 'systemctl stop' fails with -1.
> It works OK on some levels of Ubuntu and not on others. I will do more
> investigation to find the root differences, but the proposed enhancement
> to ignore the 'systemctl stop' exit status is a good one, since we did get
> a successful status of 'running' from the 'systemctl status' query.
> The 'systemctl stop' functionally works (the service is stopped); it's just
> the exit status from the 'systemctl stop' that is -1 on some OSes. We should
> be more resilient. We only attempt 'systemctl start' after the test runs,
> and only if we had determined that the service was 'running' and tried
> 'systemctl stop', so the start does not follow the stop all that quickly,
> though it is possible.
> =====================================
> Deb McLemore
> IBM OpenPower - IBM Systems
> (512) 286 9980
>
> debmc at us.ibm.com
> debmc at linux.vnet.ibm.com - (plain text)
> =====================================
>
> ----- Original message -----
> From: ppaidipe <ppaidipe at linux.vnet.ibm.com>
> To: Deborah McLemore/Austin/IBM at IBMUS
> Cc: Vasant Hegde <hegdevasant at linux.vnet.ibm.com>, Deb McLemore
> <debmc at linux.vnet.ibm.com>, fwts-devel at lists.ubuntu.com
> Subject: Re: [PATCH] opal: prd_info: Add resilience to service check
> Date: Sat, Apr 7, 2018 1:16 PM
> On 2018-04-07 20:50, Deborah McLemore wrote:
> > We are getting -1 back. What is the expected exit status from
> > 'systemctl stop'?
> >
>
> From the execution of the test, what I understand is that we are requesting
> start/stop of the service too quickly, which made the test fail.
>
> Apr 07 13:11:18 xxxxxxxxxxx systemd[1]: opal-prd.service: Start request repeated too quickly.
> Apr 07 13:11:18 xxxxxxxxxxx systemd[1]: opal-prd.service: Failed with result 'start-limit-hit'.
> Apr 07 13:11:18 xxxxxxxxxxx systemd[1]: Failed to start OPAL PRD daemon.
>
> So we need to request start/restart only once the stop has completed, and
> request stop only when the daemon is already started.
>
>
> Thanks
> Pridhiviraj
>
> > Sent from my iPhone
> >
> >> On Apr 7, 2018, at 9:23 AM, Vasant Hegde
> > <hegdevasant at linux.vnet.ibm.com> wrote:
> >>
> >>> On 04/07/2018 07:40 PM, Deb McLemore wrote:
> >>> When the opal-prd.service is running and an attempt to stop it is
> >>> performed, ignore the exit status and continue.
> >>
> >> Deb,
> >>
> >> Can you please explain why you want to ignore the exit status here?
> >> Are there any issues?
> >>
> >> -Vasant
> >>
> >>
> >>
> >>>
> >>> Signed-off-by: Deb McLemore <debmc at linux.vnet.ibm.com>
> >>> ---
> >>> src/opal/prd_info.c | 20 ++++----------------
> >>> 1 file changed, 4 insertions(+), 16 deletions(-)
> >>>
> >>> diff --git a/src/opal/prd_info.c b/src/opal/prd_info.c
> >>> index 4082a18..2db9413 100644
> >>> --- a/src/opal/prd_info.c
> >>> +++ b/src/opal/prd_info.c
> >>> @@ -73,7 +73,7 @@ static int prd_dev_query(fwts_framework *fw)
> >>>
> >>> static int prd_service_check(fwts_framework *fw, int *restart)
> >>> {
> >>> - int rc = FWTS_OK, status = 0, stop_status = 0;
> >>> + int rc = FWTS_OK, status = 0;
> >>> char *command;
> >>> char *output = NULL;
> >>>
> >>> @@ -97,25 +97,13 @@ static int prd_service_check(fwts_framework *fw, int *restart)
> >>> goto out;
> >>> case 0: /* "running" */
> >>> command = "systemctl stop opal-prd.service 2>&1";
> >>> - stop_status = fwts_exec2(command, &output);
> >>> + fwts_exec2(command, &output);
> >>>
> >>> if (output)
> >>> free(output);
> >>>
> >>> - switch (stop_status) {
> >>> - case 0:
> >>> - *restart = 1;
> >>> - break;
> >>> - default:
> >>> - fwts_failed(fw, LOG_LEVEL_HIGH, "OPAL PRD Info",
> >>> - "Attempt was made to stop the "
> >>> - "opal-prd.service but was not "
> >>> - "successful. Try to "
> >>> - "\"sudo systemctl stop "
> >>> - "opal-prd.service\" and retry.");
> >>> - rc = FWTS_ERROR;
> >>> - goto out;
> >>> - }
> >>> + *restart = 1;
> >>> + break;
> >>> default:
> >>> break;
> >>> }
> >>>
> >>
> >>
> >> --
> >> fwts-devel mailing list
> >> fwts-devel at lists.ubuntu.com
> >> Modify settings or unsubscribe at:
> >
> https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.ubuntu.com_mailman_listinfo_fwts-2Ddevel&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=V3KRDPsp3yMosW9R4elWYg&m=Sy-O20yWd_N3piZoJOEzigB1XzmLV4OUCfEyl3ENAcc&s=oPh1ACx1NGTgif-0V5BIQffXXqjymI8QC_bagI2jZsA&e=