AddRemoveSet Failure

John Meinel john at arbash-meinel.com
Sun Mar 16 11:46:07 UTC 2014


So we have an interesting result now that we have more logging in
TestAddRemoveSet:
https://code.launchpad.net/~sinzui/juju-core/1.17.6/+merge/211114

Here is what I think is the relevant part of the log:
[LOG] 93.17779 DEBUG juju.replicaset Set() changing replica set
from {Name: juju, Version: 5, Members: {Member{4 "localhost:36701"
map[key2:val2]}, Member{3 "localhost:42527" map[key1:val1]}, Member{1
"localhost:37355" map[]}, Member{5 "localhost:37868" map[key3:val3]}}}
  to {Name: juju, Version: 6, Members: {Member{4 "localhost:36701"
map[key2:val2]}, Member{3 "localhost:42527" map[key1:val1]}, Member{1
"localhost:37355" map[]}, Member{5 "localhost:37868" map[key3:val3]}}}
Set() 3 attempts in 31.487524572s
attempting session.Ping() got error: no reachable servers after 15.02601713s
attempting session.Ping() got error: no reachable servers after 30.059514198s
session.Ping() 3 attempts in 45.087590444s
replicaset_test.go:291:
    c.Assert(err, gc.IsNil)
... value *errors.errorString = &errors.errorString{s:"no reachable
servers"} ("no reachable servers")

[LOG] 38.27822 DEBUG juju.replicaset Remove() changing replica set
from {Name: juju, Version: 6, Members: {Member{4 "localhost:36701"
map[key2:val2]}, Member{3 "localhost:42527" map[key1:val1]}, Member{1
"localhost:37355" map[]}, Member{5 "localhost:37868" map[key3:val3]}}}
  to {Name: juju, Version: 7, Members: {Member{4 "localhost:36701"
map[key2:val2]}, Member{3 "localhost:42527" map[key1:val1]}, Member{1
"localhost:37355" map[]}}}
[LOG] 38.28443 DEBUG juju.replicaset got EOF while running Remove(),
calling session.Refresh()
[LOG] 38.38664 DEBUG juju.testing killing mongod pid 23355 in
/tmp/test-mgo598109560 on port 37868
[LOG] 38.40487 DEBUG juju.replicaset Remove() changing replica set
from {Name: juju, Version: 7, Members: {Member{3 "localhost:42527"
map[key1:val1]}, Member{1 "localhost:37355" map[]}, Member{1
"localhost:37355" map[]}}}
  to {Name: juju, Version: 8, Members: {Member{3 "localhost:42527"
map[key1:val1]}, Member{1 "localhost:37355" map[]}}}
[LOG] 38.41248 DEBUG juju.replicaset got EOF while running Remove(),
calling session.Refresh()
[LOG] 38.41586 DEBUG juju.testing killing mongod pid 23310 in
/tmp/test-mgo506585917 on port 36701
[LOG] 38.42962 DEBUG juju.replicaset Remove() changing replica set
from {Name: juju, Version: 8, Members: {Member{1 "localhost:37355"
map[]}, Member{1 "localhost:37355" map[]}}}
  to {Name: juju, Version: 9, Members: {Member{1 "localhost:37355" map[]}}}
[LOG] 38.43038 DEBUG juju.testing killing mongod pid 23265 in
/tmp/test-mgo820414422 on port 42527
[LOG] 38.44393 DEBUG juju.testing killing mongod pid 23221 in
/tmp/test-mgo950069651 on port 44432



Followed by:
----------------------------------------------------------------------
PANIC: replicaset_test.go:341: MongoSuite.TestCurrentStatus

[LOG] 53.49015 DEBUG juju.testing tls.Dial(127.0.0.1:37868) failed
with dial tcp 127.0.0.1:37868: connection refused
[LOG] 53.49022 DEBUG juju.testing tls.Dial(127.0.0.1:42527) failed
with dial tcp 127.0.0.1:42527: connection refused
[LOG] 53.49029 DEBUG juju.testing tls.Dial(127.0.0.1:36701) failed
with dial tcp 127.0.0.1:36701: connection refused


One thing that surprises me a lot, is that at the end of AddRemoveSet we see:
  juju.testing killing mongod ... port 37868
However, in TestCurrentStatus it says:
  juju.testing tls.Dial(127.0.0.1:37868) failed with dial tcp
127.0.0.1:37868: connection refused

Why would we be trying to Dial the machine that we just killed?

Are we accidentally reusing ports that we just stopped, which means
the machine has a harder time starting/harder time for us to connect
to it?

John
=:->



More information about the Juju-dev mailing list