<div dir="ltr"><div><div><div>Hi,<br><br>Sorry for the cross-post, but I think it's a legitimate case for one: this bug could be coming from a lot of different places.<br><br></div>TL;DR: cloud-init added "127.0.1.1 host.domain" to /etc/hosts, ceph-mon doesn't listen on 127.0.1.1, so local connections to host.domain:6789 fail.<br><br></div><div>Bug: <a href="https://bugs.launchpad.net/charms/+source/ceph/+bug/1365671">https://bugs.launchpad.net/charms/+source/ceph/+bug/1365671</a></div><div><br></div>juju, using maas as a provider, deployed ceph and ceph-osd: 4 units in total (3 ceph, 1 ceph-osd).<br><br>ceph is working fine, thanks.<br><br></div><div>On a ceph unit, /etc/hosts has this entry:<br><br>127.0.1.1 clipper.scapestack clipper<br><br></div><div><br>That fqdn, however, resolves correctly via DNS:<br><br>ubuntu@clipper:~$ dig +short clipper.scapestack<br>
10.96.5.247<br><br><br></div><div>If I try to connect to ceph-mon, which listens on port 6789, using the hostname or even the fqdn, the connection fails:<br><br>ubuntu@clipper:~$ telnet clipper.scapestack 6789<br>
Trying 127.0.1.1...<br>
telnet: Unable to connect to remote host: Connection refused<br><br><br>It fails because ceph-mon doesn't listen on 127.0.1.1:<br><br>ubuntu@clipper:~$ sudo netstat -anp|grep ceph-mon<br>
tcp 0 0 10.96.5.247:6789 0.0.0.0:* LISTEN 62208/ceph-mon<br>
tcp 0 0 10.96.5.247:6789 10.96.5.245:54965 ESTABLISHED 62208/ceph-mon<br>
tcp 0 0 10.96.5.247:6789 10.96.5.244:44555 ESTABLISHED 62208/ceph-mon<br>
tcp 0 0 10.96.5.247:6789 10.96.5.244:44565 ESTABLISHED 62208/ceph-mon<br>
unix 2 [ ACC ] STREAM LISTENING 64110 62208/ceph-mon /var/run/ceph/ceph-mon.clipper.asok<br></div><div><br><br></div><div>So we end up in a situation where I can't connect to a service running on the same machine using the hostname or the fqdn, because cloud-init was told to munge /etc/hosts. It was told to do so via "manage_etc_hosts: 'localhost'" (see <a href="https://bugs.launchpad.net/charms/+source/ceph/+bug/1365671/comments/2">https://bugs.launchpad.net/charms/+source/ceph/+bug/1365671/comments/2</a>). I don't know who set that: juju or maas (or something else).<br><br></div><div>In our use case we have a subordinate charm that relates to ceph and tries to gather storage usage information. It uses the fqdn, and is failing. We can probably work around it, but I thought this should be brought to a wider audience first.<br><br></div><div>I filed a bug against the charm for now, thinking that possibly the easiest solution would be to have ceph-mon also listen on localhost (<a href="https://bugs.launchpad.net/charms/+source/ceph/+bug/1365671">https://bugs.launchpad.net/charms/+source/ceph/+bug/1365671</a>).<br><br></div><div>Some thoughts:<br></div><div>- if you need to connect to ceph-mon, use the real IP (tricky? Try all the interfaces on the machine?)<br></div><div>- have ceph-mon listen on all addresses<br></div><div>- don't add the fqdn to /etc/hosts, just the hostname, when adding the 127.0.1.1 entry. What would this break? Maybe hostname -f on containers?<br><br></div><div><br></div></div>
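<div dir="ltr">To make the first thought above ("try the real IP, maybe try several addresses") a bit more concrete, here is a rough Python sketch of what a client-side workaround could look like. It is only an illustration, not what any charm actually does: the function names can_connect and first_reachable are made up for this mail, and how the list of candidate addresses is obtained (interfaces, relation data, DNS) is left open.<br><br></div>

```python
import socket


def can_connect(address, port, timeout=1.0):
    """Return True if a TCP connection to (address, port) succeeds.

    Hypothetical helper for illustration; name resolution failures,
    refused connections and timeouts all count as "not reachable".
    """
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        # socket.gaierror and socket.timeout are subclasses of OSError.
        return False


def first_reachable(candidates, port, timeout=1.0):
    """Return the first candidate that accepts a connection, else None.

    `candidates` is an ordered list of hostnames or IP addresses; how
    that list is built is an assumption left to the caller.
    """
    for address in candidates:
        if can_connect(address, port, timeout):
            return address
    return None
```

<div dir="ltr">On the unit from this report, something like first_reachable(["clipper.scapestack", "10.96.5.247"], 6789) would be expected to skip the fqdn (it resolves to 127.0.1.1 via /etc/hosts and is refused) and return the 10.96.5.247 address, assuming ceph-mon is still listening there.<br><br></div>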