[Bug 799711] Re: o2cb[11796]: ERROR: ocfs2_controld.pcmk did not come up
HenningMalzahn
799711 at bugs.launchpad.net
Thu Jul 7 09:35:01 UTC 2011
Hi there,
sorry for getting back to this issue so late, but I had to work on
something else for the past few days.
I reverted both virtual machines again, and here is the exact sequence
of commands I used to try to get the Pacemaker-integrated dual-master
setup to work:
- apt-get install python-software-properties && \
add-apt-repository ppa:ubuntu-ha/lucid-cluster && \
apt-get update
- apt-get install pacemaker libdlm3-pacemaker ocfs2-tools drbd8-utils openais
- Rebooted
- shred -n 1 -v /dev/mapper/sde1_crypt
- Created the following configuration file for the DRBD device
(/etc/drbd.d/r2.res)
resource r2 {
handlers {
pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
}
startup {
degr-wfc-timeout 120;
become-primary-on both;
}
disk {
on-io-error detach;
}
net {
cram-hmac-alg sha1;
shared-secret "SECRET";
data-integrity-alg sha1;
allow-two-primaries;
after-sb-0pri disconnect;
after-sb-1pri consensus;
after-sb-2pri disconnect;
rr-conflict disconnect;
}
syncer {
rate 60M;
}
on janus {
device /dev/drbd2;
disk /dev/mapper/sde1_crypt;
address 10.10.1.2:7882;
meta-disk internal;
}
on mimas {
device /dev/drbd2;
disk /dev/mapper/sde1_crypt;
address 10.10.1.3:7882;
meta-disk internal;
}
}
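DRBD selects the `on <name>` section whose name equals the local output of `uname -n`, so it can be worth confirming that the names used in r2.res are the ones the nodes actually report. A minimal sketch, assuming a POSIX shell with awk and grep; the helper names `drbd_on_hosts` and `drbd_knows_host` are made up for illustration and are not part of drbd8-utils:

```shell
#!/bin/sh
# Hypothetical helper: print the host names used in "on <host> {" sections
# of a DRBD resource file, one per line.
# $1 = path to the resource file, e.g. /etc/drbd.d/r2.res
drbd_on_hosts() {
    # Match only lines whose first word is exactly "on", so options such as
    # "on-io-error detach;" are ignored.
    awk '$1 == "on" && $3 == "{" { print $2 }' "$1"
}

# Hypothetical helper: succeed if host name $2 has an "on" section in file $1.
drbd_knows_host() {
    drbd_on_hosts "$1" | grep -qxF -- "$2"
}
```

On a node this could be used as `drbd_knows_host /etc/drbd.d/r2.res "$(uname -n)" || echo "no on-section for this host"`.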
- drbdadm create-md r2
md_offset 26836983808
al_offset 26836951040
bm_offset 26836131840
Found some data
==> This might destroy existing data! <==
Do you want to proceed?
[need to type 'yes' to confirm] yes
Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.
success
Both nodes:
- drbdadm create-md r2
- drbdadm attach r2
- drbdadm syncer r2
Second node:
- drbdadm -- --discard-my-data connect r2
First node:
- drbdadm -- --overwrite-data-of-peer primary r2
- drbdadm connect r2
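Before moving on to the OCFS2 steps it helps to confirm that the initial sync has finished and both sides are UpToDate. A rough sketch, assuming the DRBD 8.3 layout of /proc/drbd, in which each device line starts with the minor number followed by a colon and carries a `ds:` field; `drbd_disk_state` and `drbd_both_uptodate` are hypothetical helper names:

```shell
#!/bin/sh
# Hypothetical helper: print the ds: (disk state) field for a DRBD minor.
# $1 = minor number (2 for /dev/drbd2)
# $2 = status file to read (defaults to /proc/drbd)
drbd_disk_state() {
    awk -v minor="$1:" '
        $1 == minor {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^ds:/) { sub(/^ds:/, "", $i); print $i }
        }' "${2:-/proc/drbd}"
}

# Succeed only when both the local and the peer disk are UpToDate.
drbd_both_uptodate() {
    [ "$(drbd_disk_state "$1" "${2:-/proc/drbd}")" = "UpToDate/UpToDate" ]
}
```

For the setup above this would be called as `drbd_both_uptodate 2 && echo "r2 is in sync"` on either node.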
- dpkg-reconfigure ocfs2-tools
- update-rc.d o2cb disable
- Created the following cib objects
primitive resDrbd2 ocf:linbit:drbd \
params drbd_resource="r2" \
operations $id="resDrbd2-operations" \
op monitor interval="20s" role="Master" timeout="20s" \
op monitor interval="30s" role="Slave" timeout="20s"
ms msDrbd2 resDrbd2 \
meta resource-stickiness="100" \
master-max="2" master-node-max="1" \
clone-max="2" clone-node-max="1" \
notify="true" globally-unique="false"
location locDrbd2AllowedNodes msDrbd2 rule 200: #uname eq node1 or #uname eq node2
location locDrbd2Master msDrbd2 rule role=master inf: #uname eq node1
primitive resDlm ocf:pacemaker:controld \
op monitor interval="120s" \
op start interval="0" timeout="90" \
op stop interval="0" timeout="100"
clone cloneDlm resDlm \
meta globally-unique="false" interleave="true"
colocation colDlm-on-msDrb2dMaster inf: cloneDlm msDrbd2:Master
order ordDlm-before-msDrbdMaster2 0: msDrbd2:promote cloneDlm
location locCloneDlmAllowedNodes cloneDlm rule 200: #uname eq node1 or #uname eq node2
primitive resO2CB ocf:pacemaker:o2cb \
op monitor interval="120s" \
op start interval="0" timeout="90" \
op stop interval="0" timeout="100"
clone cloneO2CB resO2CB \
meta globally-unique="false" interleave="true"
colocation colO2CB-on-Dlm inf: cloneO2CB cloneDlm
order ordO2CB-after-Dlm 0: cloneDlm cloneO2CB
location locCloneO2CBAllowedNodes cloneO2CB rule 100: #uname eq node1 or #uname eq node2
- Rebooted both nodes
- After the reboot the required Pacemaker services are up and running
(output of crm_mon -1f):
Master/Slave Set: msDrbd2
Masters: [ node1 node2 ]
Clone Set: cloneDlm
Started: [ node1 node2 ]
Stopped: [ resDlm:2 ]
Clone Set: cloneO2CB
Started: [ node1 node2 ]
Stopped: [ resO2CB:2 ]
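To compare the state of a clone set across runs without reading the whole report, the output can be filtered. A small sketch, assuming the crm_mon output format shown above (set name as the last word of the "Set:" line, nodes listed between square brackets); `clone_started_nodes` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: read `crm_mon -1f` output on stdin and print the
# nodes listed as Started for the clone set named in $1, one per line.
clone_started_nodes() {
    awk -v set="$1" '
        /Set:/ { current = $NF }               # e.g. "Clone Set: cloneDlm"
        current == set && $1 == "Started:" {
            # Fields are: "Started:" "[" node... "]"
            for (i = 3; i < NF; i++) print $i
        }'
}
```

Typical use would be `crm_mon -1f | clone_started_nodes cloneO2CB`, which for the output above would list node1 and node2.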
- Created the filesystem afterwards using the command: mkfs.ocfs2 -L r2 /dev/drbd2
mkfs.ocfs2 1.4.3
Cluster stack: pcmk
Cluster name: pacemaker
NOTE: Selecting extended slot map for userspace cluster stack
Filesystem label=r2
Block size=4096 (bits=12)
Cluster size=4096 (bits=12)
Volume size=26836131840 (6551790 clusters) (6551790 blocks)
204 cluster groups (tail covers 3822 clusters, rest cover 32256 clusters)
Journal size=167723008
Initial number of node slots: 8
Creating bitmaps: done
Initializing superblock: done
Writing system files: done
Writing superblock: done
Writing backup superblock: 3 block(s)
Formatting Journals: done
Formatting slot map: done
Writing lost+found: done
mkfs.ocfs2 successful
- Created the following CIB objects for the filesystem:
primitive resFs2 ocf:heartbeat:Filesystem \
params device="/dev/drbd2" fstype="ocfs2" directory="/var/www" \
op monitor interval="120s" \
op start interval="0" timeout="90" \
op stop interval="0" timeout="100" \
meta target-role="stopped"
clone cloneFs2 resFs2 \
meta globally-unique="false" interleave="true"
colocation colFs2-on-CloneO2CB inf: cloneFs2 cloneO2CB
order ordFs2-after-cloneO2CB inf: cloneO2CB cloneFs2
location locCloneFs0AllowedNodes cloneFs2 rule 100: #uname eq node1 or #uname eq node2
- and started the file system by executing the command
crm resource start cloneFs2
The file system comes up fine on the first node (crm_mon -1f)
Clone Set: cloneFs2
Started: [ node1 ]
Stopped: [ resFs2:0 ]
but fails on the second node with the following messages in the system
log:
ocfs2_controld[3483]: Unable to open checkpoint "ocfs2:controld": Object does not exist
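To see on which node and how often this message shows up, the log can be counted per host. A minimal sketch, assuming the message lands in /var/log/syslog (the usual Ubuntu location); `count_ckpt_errors` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: count ocfs2_controld checkpoint failures in a log file.
# $1 = log file (defaults to /var/log/syslog)
# Prints the number of matching lines; prints 0 when there are none.
count_ckpt_errors() {
    grep -c 'ocfs2_controld.*Unable to open checkpoint' "${1:-/var/log/syslog}"
}
```

Running `count_ckpt_errors` on both nodes after `crm resource start cloneFs2` would show whether only the second node is affected.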
As requested, here's the content of /etc/corosync/service.d/:
root@node1:[~] # la /etc/corosync/service.d/
total 12
drwxr-xr-x 2 root root 4096 2011-07-05 10:56 .
drwxr-xr-x 4 root root 4096 2011-05-31 16:28 ..
-rw-r--r-- 1 root root 59 2010-02-18 11:09 ckpt-service
--
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to ocfs2-tools in Ubuntu.
https://bugs.launchpad.net/bugs/799711