Rule #2 should die

Martin Packman martin.packman at canonical.com
Thu Jun 4 21:42:47 UTC 2015


Currently juju-reports has a rule matching on failures where our CI
harness interrupted the test because it took too long:

<http://reports.vapour.ws/releases/rules/2>

This seems too generic a symptom. Generally, if a test does not
complete within the time we've allocated for it, there's another
indication in the log, often the final `juju status` output, that
makes it clearer why juju never finished its work.

Checking over the recent matches for rule #2:

<http://reports.vapour.ws/releases/2733/job/gce-deploy-trusty-amd64/attempt/265>

  "2":
    agent-state-info: 'sending new instance request: GCE operation
      "operation-1433442378645-517b54fc8fe09-e1a9dd29-092d6cff"
      failed'
    instance-id: pending

GCE failed to give us an instance.


<http://reports.vapour.ws/releases/2732/job/canonistack-deploy-trusty-amd64/attempt/3092>

  "1":
    agent-state: pending
    dns-name: 10.55.32.175
    instance-id: ee22b864-47e9-4931-8bcb-92bbbe08f05e
    instance-state: ACTIVE

<http://data.vapour.ws/juju-ci/products/version-2732/canonistack-deploy-trusty-amd64/build-3092/machine-1/cloud-init.log.gz>

    Jun  4 14:57:24 juju-canonistack-deploy-trusty-amd64-machine-1
    [CLOUDINIT] util.py[DEBUG]: Running command ['eatmydata', 'apt-get',
    '--option=Dpkg::Options::=--force-confold',
    '--option=Dpkg::options::=--force-unsafe-io', '--assume-yes',
    '--quiet', 'install', 'curl', 'cpu-checker', 'bridge-utils',
    'rsyslog-gnutls', 'cloud-utils', 'cloud-image-utils', 'tmux'] with
    allowed return codes [0] (shell=False, capture=False)

Super slow canonistack machine, still crawling along installing
packages when we gave up.


<http://reports.vapour.ws/releases/2732/job/functional-backup-restore/attempt/2702>

error: cannot re-bootstrap environment: cannot bootstrap new instance:
waited for 10m0s without being able to connect: ssh: connect to host
10.0.0.247 port 22: Connection timed out

Not the best log, but it seems clear we never got a usable bootstrap
machine to restore into.


<http://reports.vapour.ws/releases/2732/job/joyent-deploy-precise-amd64/attempt/2145>

  "1":
    agent-state: pending
    dns-name: 165.225.128.214
    instance-id: fc67a2b4-00ab-4571-e947-ebd68fd54f9b
    instance-state: running

<http://data.vapour.ws/juju-ci/products/version-2732/joyent-deploy-precise-amd64/build-2145/machine-1/cloud-init-output.log.gz>

    Attempt 1 to download tools from
    https://10.112.2.15:17070/tools/1.25-alpha1-precise-amd64...
    + curl -sSfw tools from %{url_effective} downloaded: HTTP
    %{http_code}; time %{time_total}s; size %{size_download} bytes; speed
    %{speed_download} bytes/s  --noproxy * --insecure -o
    /var/lib/juju/tools/1.25-alpha1-precise-amd64/tools.tar.gz
    https://10.112.2.15:17070/tools/1.25-alpha1-precise-amd64
    curl: (7) couldn't connect to host

Joyent network issue, <https://bugs.launchpad.net/juju-core/+bug/1451104>


All the recent matches for the timeout rule have a more useful and
specific cause available (some unfortunately requiring a look at other
log files for the full details), which suggests we want rules for those
causes rather than this one.
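
As a rough illustration of what more specific rules could look like,
here is a minimal sketch in Python. It assumes a rule is just a regular
expression searched for in the log text; the rule names, the table
layout and the classify() helper are all made up for this example and
are not how juju-reports actually stores or applies rules. The patterns
are lifted from the log excerpts quoted above, and allow for the line
wrapping seen in them.

```python
import re

# Hypothetical rule table: (rule name, compiled pattern).
# Names are invented; patterns come from the excerpts in this mail.
SPECIFIC_RULES = [
    ("gce-instance-request-failed",
     re.compile(r"sending new instance request: GCE operation\s+\S+\s+failed")),
    ("bootstrap-ssh-timeout",
     re.compile(r"cannot bootstrap new instance:\s+waited for \S+ "
                r"without being able to connect")),
    ("tools-download-failed",
     re.compile(r"curl: \(7\) couldn't connect to host")),
]


def classify(log_text):
    """Return the name of the first specific rule that matches, else None."""
    for name, pattern in SPECIFIC_RULES:
        if pattern.search(log_text):
            return name
    return None
```

A failure that matches none of these (the slow canonistack machine, for
instance) would still need a human to look at it, which seems better
than filing it under a catch-all timeout.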

Martin


