[Bug 799711] Re: o2cb[11796]: ERROR: ocfs2_controld.pcmk did not come up

HenningMalzahn 799711 at bugs.launchpad.net
Thu Jul 7 09:35:01 UTC 2011


Hi there,

Sorry for getting back to this issue so late, but I had to work on
something else for the past few days.

I reverted both virtual machines again. Here's the exact sequence of
commands I used in my attempt to get the Pacemaker-integrated
dual-master setup to work:

- apt-get install python-software-properties && \
  add-apt-repository ppa:ubuntu-ha/lucid-cluster && \
  apt-get update

- apt-get install pacemaker libdlm3-pacemaker ocfs2-tools drbd8-utils openais

- Rebooted

- shred -n 1 -v /dev/mapper/sde1_crypt

- Created the following configuration file for the DRBD device
(/etc/drbd.d/r2.res):

resource r2 {
  
  handlers {
    pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
    pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
    local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
  }

  startup {
    degr-wfc-timeout 120;
    become-primary-on both;
  }

  disk {
    on-io-error detach;
  }

  net {
    cram-hmac-alg sha1;
    shared-secret "SECRET";

    data-integrity-alg sha1;
    allow-two-primaries;

    after-sb-0pri disconnect;
    after-sb-1pri consensus;
    after-sb-2pri disconnect;
    rr-conflict disconnect;
  }

  syncer {
    rate 60M;
  }

  on janus {
    device /dev/drbd2;
    disk /dev/mapper/sde1_crypt;
    address 10.10.1.2:7882;
    meta-disk internal;
  }

  on mimas {
    device /dev/drbd2;
    disk /dev/mapper/sde1_crypt;
    address 10.10.1.3:7882;
    meta-disk internal;
  }
}

- drbdadm create-md r2

md_offset 26836983808
al_offset 26836951040
bm_offset 26836131840

Found some data

 ==> This might destroy existing data! <==

Do you want to proceed?
[need to type 'yes' to confirm] yes

Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.
success

Both nodes:
- drbdadm create-md r2

- drbdadm attach r2

- drbdadm syncer r2

Second node:
- drbdadm -- --discard-my-data connect r2

First node:
- drbdadm -- --overwrite-data-of-peer primary r2

- drbdadm connect r2
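Before handing r2 over to Pacemaker, one way to confirm that the initial sync has finished is to poll /proc/drbd. The POSIX sh functions below are a minimal sketch, not part of drbd8-utils; the names drbd_field and drbd_synced are hypothetical, and they assume the DRBD 8.3-style status line quoted in the comment.

```sh
# Hypothetical helpers (not part of drbd8-utils). They read /proc/drbd-style
# text on stdin and assume the DRBD 8.3 status line layout, e.g.:
#  2: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----

# drbd_field MINOR KEY: print the value of KEY (cs, ro or ds) for device MINOR.
drbd_field() {
    awk -v minor="$1:" -v key="$2:" '
        $1 == minor {
            for (i = 2; i <= NF; i++)
                if (index($i, key) == 1) { print substr($i, length(key) + 1); exit }
        }'
}

# drbd_synced MINOR: succeed only if device MINOR is connected and both
# disks are UpToDate, i.e. the initial sync has completed.
drbd_synced() {
    _drbd_status=$(cat)
    [ "$(printf '%s\n' "$_drbd_status" | drbd_field "$1" cs)" = "Connected" ] &&
        [ "$(printf '%s\n' "$_drbd_status" | drbd_field "$1" ds)" = "UpToDate/UpToDate" ]
}
```

For example, `until drbd_synced 2 < /proc/drbd; do sleep 10; done` would wait until minor 2 (/dev/drbd2, resource r2) reports Connected and UpToDate/UpToDate.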

- dpkg-reconfigure ocfs2-tools

- update-rc.d o2cb disable

- Created the following CIB objects:

primitive resDrbd2 ocf:linbit:drbd \
    params drbd_resource="r2" \
    operations $id="resDrbd2-operations" \
    op monitor interval="20s" role="Master" timeout="20s" \
    op monitor interval="30s" role="Slave" timeout="20s"


ms msDrbd2 resDrbd2 \
          meta resource-stickiness="100" \
          master-max="2" master-node-max="1" \
          clone-max="2" clone-node-max="1" \
          notify="true" globally-unique="false"

location locDrbd2AllowedNodes msDrbd2 rule 200: #uname eq node1 or #uname eq node2

location locDrbd2Master msDrbd2 rule role=master inf: #uname eq node1


primitive resDlm ocf:pacemaker:controld \
          op monitor interval="120s" \
          op start interval="0" timeout="90" \
          op stop interval="0" timeout="100"


clone cloneDlm resDlm \
      meta globally-unique="false" interleave="true"


colocation colDlm-on-msDrb2dMaster inf: cloneDlm msDrbd2:Master


order ordDlm-before-msDrbdMaster2 0: msDrbd2:promote cloneDlm


location locCloneDlmAllowedNodes cloneDlm rule 200: #uname eq node1 or #uname eq node2


primitive resO2CB ocf:pacemaker:o2cb \
          op monitor interval="120s" \
          op start interval="0" timeout="90" \
          op stop interval="0" timeout="100"


clone cloneO2CB resO2CB \
      meta globally-unique="false" interleave="true"


colocation colO2CB-on-Dlm inf: cloneO2CB cloneDlm


order ordO2CB-after-Dlm 0: cloneDlm cloneO2CB


location locCloneO2CBAllowedNodes cloneO2CB rule 100: #uname eq node1 or #uname eq node2

- Rebooted both nodes

- After the reboot, the required Pacemaker services are up and running
(output of crm_mon -1f):

 Master/Slave Set: msDrbd2
     Masters: [ node1 node2 ]
 Clone Set: cloneDlm
     Started: [ node1 node2 ]
     Stopped: [ resDlm:2 ]
 Clone Set: cloneO2CB
     Started: [ node1 node2 ]
     Stopped: [ resO2CB:2 ]
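To check the same thing from a script rather than by eye, the crm_mon output above can be parsed. This is a minimal POSIX sh sketch; the function name crm_set_nodes is hypothetical, and it assumes the "Clone Set: NAME" / "Started: [ ... ]" layout shown above (other Pacemaker versions format crm_mon output differently).

```sh
# crm_set_nodes SET STATE: read `crm_mon -1` output on stdin and print, one
# per line, the entries listed under STATE (e.g. Masters, Started, Stopped)
# for the clone or master/slave set named SET.
# Assumes the layout shown in this report, e.g.:
#  Clone Set: cloneDlm
#      Started: [ node1 node2 ]
crm_set_nodes() {
    awk -v set="$1" -v state="$2:" '
        # A "... Set: NAME" header starts a new set; remember whether it is ours.
        /Set: / { inset = ($NF == set); next }
        # Print the entries between "[" and "]" on the matching state line.
        inset && $1 == state {
            for (i = 3; i < NF; i++) print $i
        }'
}
```

For example, `crm_mon -1 | crm_set_nodes cloneFs2 Started` would list the nodes on which the file system clone is currently running.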

- Afterwards, created the file system with: mkfs.ocfs2 -L r2 /dev/drbd2
mkfs.ocfs2 1.4.3
Cluster stack: pcmk
Cluster name: pacemaker
NOTE: Selecting extended slot map for userspace cluster stack
Filesystem label=r2
Block size=4096 (bits=12)
Cluster size=4096 (bits=12)
Volume size=26836131840 (6551790 clusters) (6551790 blocks)
204 cluster groups (tail covers 3822 clusters, rest cover 32256 clusters)
Journal size=167723008
Initial number of node slots: 8
Creating bitmaps: done
Initializing superblock: done
Writing system files: done
Writing superblock: done
Writing backup superblock: 3 block(s)
Formatting Journals: done
Formatting slot map: done
Writing lost+found: done
mkfs.ocfs2 successful

- Created the following CIB objects for the filesystem:

primitive resFs2 ocf:heartbeat:Filesystem \
          params device="/dev/drbd2" fstype="ocfs2" directory="/var/www" \
          op monitor interval="120s" \
          op start interval="0" timeout="90" \
          op stop interval="0" timeout="100" \
          meta target-role="stopped"


clone cloneFs2 resFs2 \
      meta globally-unique="false" interleave="true"


colocation colFs2-on-CloneO2CB inf: cloneFs2 cloneO2CB


order ordFs2-after-cloneO2CB inf: cloneO2CB cloneFs2


location locCloneFs0AllowedNodes cloneFs2 rule 100: #uname eq node1 or #uname eq node2

- and started the file system by executing the command

crm resource start cloneFs2

The file system comes up fine on the first node (output of crm_mon -1f):

 Clone Set: cloneFs2
     Started: [ node1 ]
     Stopped: [ resFs2:0 ]

but fails on the second node with the following message in the system
log:

ocfs2_controld[3483]: Unable to open checkpoint "ocfs2:controld": Object does not exist

As requested, here are the contents of /etc/corosync/service.d/:
root@node1:[~] # la /etc/corosync/service.d/
total 12
drwxr-xr-x 2 root root 4096 2011-07-05 10:56 .
drwxr-xr-x 4 root root 4096 2011-05-31 16:28 ..
-rw-r--r-- 1 root root   59 2010-02-18 11:09 ckpt-service
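Since ocfs2_controld reports that the "ocfs2:controld" checkpoint cannot be opened, it may be worth confirming that corosync is actually configured to load the openais checkpoint service. The sketch below is hypothetical (the function name is made up) and assumes the service is declared with a `name: openais_ckpt` line, the usual openais service name; the actual contents of the 59-byte ckpt-service file are not shown in this report.

```sh
# ckpt_service_declared [FILE...]: succeed if any FILE declares the openais
# checkpoint service with a "name: openais_ckpt" line. With no arguments it
# checks the main corosync config plus everything in /etc/corosync/service.d/.
# Assumption: the service is declared in the usual form
#   service {
#       name: openais_ckpt
#       ver: 0
#   }
ckpt_service_declared() {
    if [ "$#" -eq 0 ]; then
        set -- /etc/corosync/corosync.conf /etc/corosync/service.d/*
    fi
    # -q: report via exit status only; -s: ignore missing/unreadable files
    grep -Eqs '^[[:space:]]*name:[[:space:]]*openais_ckpt[[:space:]]*$' "$@"
}
```

Running `ckpt_service_declared && echo "checkpoint service declared"` on each node would show whether the declaration is present; it does not prove that corosync loaded the service at runtime.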

-- 
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to ocfs2-tools in Ubuntu.
https://bugs.launchpad.net/bugs/799711

Title:
  o2cb[11796]: ERROR: ocfs2_controld.pcmk did not come up
