[Bug 1743249] Re: Failed Deployment after timeout trying to retrieve grub cfg
Andres Rodriguez
andreserl at ubuntu-pe.org
Tue Feb 6 04:19:14 UTC 2018
>
>
> > That being said, because CPU load doesn't show high we are making the
> > *assumption* that it is not impacting MAAS, but again, this is an
> > assumption. Making the requested change to have at least 4 CPUs (ideally
> > 6) would allow us to determine what the effects are and see whether
> > there's any difference in behavior, and would help identify other
> > issues.
> >
> > Without having the comparison then we are making it more difficult to
> > isolate the problem.
>
> To improve performance the typical pattern is 1) identify the
> bottleneck 2) eliminate that as the bottleneck 3) repeat.
>
> We have not identified CPU as a bottleneck. The top data says it is
> not!
>
Jason,
That doesn't change the fact that we are requesting tests to be run with
a different CPU configuration for the VMs, so we can make a *comparison* and
see whether there is any material difference from the current conditions.
While I agree with you that the data /seems/ to show that there is no issue
with CPU, we still don't have any data to compare against, and there could
still be an impact even if it is minimal.
Without the data, we cannot assert with certainty that there's no issue
caused by CPU usage, because we don't have a reference or point of
comparison. So while all fingers seem to be pointing to storage, I strongly
believe it is worth gathering the data now so we can fully discard CPU as a
factor.
If this is something that your environment is unable to do, I would
appreciate it if you would say so instead of asserting that there's no
performance impact in MAAS due to CPU usage, when we don't really know for
sure (e.g. we don't know if MAAS behaves differently with less CPU usage in
the current conditions, and that's data worth gathering to be able to
better support you in the future).
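As a rough illustration of the comparison being requested, here is a minimal sketch that summarizes two sets of response times (for example, grub.cfg retrieval times measured with the current VM CPU count and with the larger one) and reports the difference. The function names are hypothetical, and how the timings are collected is not specified in this thread; the sketch only assumes each run yields a list of durations in seconds.

```python
from statistics import median


def summarize(samples):
    """Return (median, p95) for a list of response times in seconds."""
    ordered = sorted(samples)
    # Nearest-rank 95th percentile, using integer arithmetic for ceil(0.95 * n).
    rank = (95 * len(ordered) + 99) // 100
    return median(ordered), ordered[rank - 1]


def compare(baseline, candidate):
    """Return candidate-minus-baseline deltas for the median and the p95."""
    b_med, b_p95 = summarize(baseline)
    c_med, c_p95 = summarize(candidate)
    return {"median_delta": c_med - b_med, "p95_delta": c_p95 - b_p95}
```

Calling compare() with the timings from the current configuration as the baseline and the timings from the higher-CPU run as the candidate gives deltas near zero if CPU is not a factor, and clearly negative deltas if the extra CPUs help.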
--
Andres Rodriguez (RoAkSoAx)
Ubuntu Server Developer
MSc. Telecom & Networking
Systems Engineer
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to grub2 in Ubuntu.
https://bugs.launchpad.net/bugs/1743249
Title:
Failed Deployment after timeout trying to retrieve grub cfg
Status in MAAS:
New
Status in grub2 package in Ubuntu:
In Progress
Bug description:
A node failed to deploy after it failed to retrieve a grub.cfg from
MAAS due to a timeout. In the logs, it's clear that the server tried
to retrieve the grub cfg many times, over about 30 seconds:
http://paste.ubuntu.com/26387256/
We see the same thing for other hosts around the same time:
http://paste.ubuntu.com/26387262/
It seems like MAAS is taking way too long to respond to these
requests.
This is very similar to bug 1724677, which was happening
pre-Meltdown/Spectre. The only difference is we don't see "[critical] TFTP
back-end failed" in the logs anymore.
I connected to the console on this system and it had errors about
timing out retrieving the grub-cfg, then it had an error message along
the lines of "error not an ip" and then "double free". After I
connected, but before I could get a screenshot, the system rebooted and
was directed by MAAS to power off, which it did successfully after
booting to Linux.
Full logs are available here:
https://10.245.162.101/artifacts/14a34b5a-9321-4d1a-b2fa-ed277a020e7c/cpe_cloud_395/infra-logs.tar
This is with 2.3.0-6434-gd354690-0ubuntu1~16.04.1.
To manage notifications about this bug go to:
https://bugs.launchpad.net/maas/+bug/1743249/+subscriptions