[Bug 1865523] Re: [bionic] fence_scsi not working properly with 1.1.18-2ubuntu1.1
Rafael David Tinoco
rafaeldtinoco at ubuntu.com
Wed Mar 4 19:25:44 UTC 2020
** Description changed:
+ #### SRU: fence-agents
+
+ [Impact]
+
+ * fence_scsi is not currently working in a shared disk environment
+
+ * all clusters relying on fence_scsi and/or fence_scsi + watchdog won't
+ be able to start the fencing agents OR, in the worst case, the
+ fence_scsi agent might start but won't make SCSI reservations on the
+ shared SCSI disk.
+
+ [Test Case]
+
+ * having a 3-node setup, with nodes called "clubionic01, clubionic02,
+ clubionic03" and a shared SCSI disk /dev/sda (fully supporting persistent
+ reservations), one might try the following command:
+
+ sudo fence_scsi --verbose -n clubionic01 -d /dev/sda -k 3abe0000 -o off
+
+ from node clubionic02 or clubionic03, and then check whether the
+ reservation worked:
+
+ (k)rafaeldtinoco at clubionic02:~$ sudo sg_persist --in --read-keys --device=/dev/sda
+ LIO-ORG cluster.bionic. 4.0
+ Peripheral device type: disk
+ PR generation=0x0, there are NO registered reservation keys
+
+ (k)rafaeldtinoco at clubionic02:~$ sudo sg_persist -r /dev/sda
+ LIO-ORG cluster.bionic. 4.0
+ Peripheral device type: disk
+ PR generation=0x0, there is NO reservation held
+
+ * having a 3-node setup, with nodes called "clubionic01, clubionic02,
+ clubionic03", a shared SCSI disk /dev/sda (fully supporting persistent
+ reservations), and corosync and pacemaker up and running, one might
+ try:
+
+ rafaeldtinoco at clubionic01:~$ crm configure
+ crm(live)configure# property stonith-enabled=on
+ crm(live)configure# property stonith-action=off
+ crm(live)configure# property no-quorum-policy=stop
+ crm(live)configure# property have-watchdog=true
+ crm(live)configure# property symmetric-cluster=true
+ crm(live)configure# commit
+ crm(live)configure# end
+ crm(live)# end
+
+ rafaeldtinoco at clubionic01:~$ crm configure primitive fence_clubionic \
+ stonith:fence_scsi params \
+ pcmk_host_list="clubionic01 clubionic02 clubionic03" \
+ devices="/dev/sda" \
+ meta provides=unfencing
+
+ Then observe that crm_mon does not show the fence_clubionic resource as operational.
+
+ [Regression Potential]
+
+ * Judging by this issue, it is very likely that any Ubuntu user who
+ has tried using fence_scsi has already migrated to a newer version,
+ because the fence_scsi agent has been broken since its release.
+
+ * The way I fixed fence_scsi was this:
+
+ I packaged the latest pacemaker 1.1.X version and kept it "vanilla" so I
+ could bisect fence-agents. At that point I realized that bisecting was
+ going to be hard because there were multiple issues, not just one. I
+ backported the latest fence-agents together with Pacemaker 1.1.19-0 and
+ saw that it worked.
+
+ From then on, I bisected the following intervals:
+
+ 4.3.0 .. 4.4.0 (eoan - working)
+ 4.2.0 .. 4.3.0
+ 4.1.0 .. 4.2.0
+ 4.0.25 .. 4.1.0 (bionic - not working)
+
+ In each of those intervals I discovered issues. For example, using 4.3.0
+ I faced problems, so I had to backport fixes from between 4.3.0 and
+ 4.4.0. Then, backporting 4.2.0, I faced issues, so I had to backport
+ fixes from the 4.2.0 <-> 4.3.0 interval. I repeated this until I reached
+ 4.0.25, the current Bionic fence-agents version.
+
+ [Other Info]
+
+ * Original Description:
+
Trying to set up a cluster with an iSCSI shared disk, using fence_scsi as
the fencing mechanism, I realized that fence_scsi is not working in
Ubuntu Bionic. I first thought it was related to the Azure environment (LP:
#1864419), where I was originally testing this setup, but then, trying
locally, I figured out that pacemaker 1.1.18 is somehow not fencing the
shared SCSI disk properly.
Note: I was able to "backport" vanilla 1.1.19 from upstream and
fence_scsi worked. I then tried 1.1.18 without any of the quilt patches
and it didn't work either. I think that bisecting 1.1.18 <-> 1.1.19
might tell us which commit fixed the behaviour needed by the
fence_scsi agent.
(k)rafaeldtinoco at clubionic01:~$ crm conf show
node 1: clubionic01.private
node 2: clubionic02.private
node 3: clubionic03.private
primitive fence_clubionic stonith:fence_scsi \
    params pcmk_host_list="10.250.3.10 10.250.3.11 10.250.3.12" devices="/dev/sda" \
    meta provides=unfencing
property cib-bootstrap-options: \
    have-watchdog=false \
    dc-version=1.1.18-2b07d5c5a9 \
    cluster-infrastructure=corosync \
    cluster-name=clubionic \
    stonith-enabled=on \
    stonith-action=off \
    no-quorum-policy=stop \
    symmetric-cluster=true
----
(k)rafaeldtinoco at clubionic02:~$ sudo crm_mon -1
Stack: corosync
Current DC: clubionic01.private (version 1.1.18-2b07d5c5a9) - partition with quorum
Last updated: Mon Mar 2 15:55:30 2020
Last change: Mon Mar 2 15:45:33 2020 by root via cibadmin on clubionic01.private
3 nodes configured
1 resource configured
Online: [ clubionic01.private clubionic02.private clubionic03.private ]
Active resources:
 fence_clubionic (stonith:fence_scsi): Started clubionic01.private
----
(k)rafaeldtinoco at clubionic02:~$ sudo sg_persist --in --read-keys --device=/dev/sda
LIO-ORG cluster.bionic. 4.0
Peripheral device type: disk
PR generation=0x0, there are NO registered reservation keys
(k)rafaeldtinoco at clubionic02:~$ sudo sg_persist -r /dev/sda
LIO-ORG cluster.bionic. 4.0
Peripheral device type: disk
PR generation=0x0, there is NO reservation held
** Description changed:
#### SRU: fence-agents
[Impact]
* fence_scsi is not currently working in a shared disk environment
* all clusters relying on fence_scsi and/or fence_scsi + watchdog won't
be able to start the fencing agents OR, in the worst case, the
fence_scsi agent might start but won't make SCSI reservations on the
shared SCSI disk.
[Test Case]
* having a 3-node setup, with nodes called "clubionic01, clubionic02,
clubionic03" and a shared SCSI disk /dev/sda (fully supporting persistent
reservations), one might try the following command:
sudo fence_scsi --verbose -n clubionic01 -d /dev/sda -k 3abe0000 -o off
from node clubionic02 or clubionic03, and then check whether the
reservation worked:
(k)rafaeldtinoco at clubionic02:~$ sudo sg_persist --in --read-keys --device=/dev/sda
LIO-ORG cluster.bionic. 4.0
Peripheral device type: disk
PR generation=0x0, there are NO registered reservation keys
(k)rafaeldtinoco at clubionic02:~$ sudo sg_persist -r /dev/sda
LIO-ORG cluster.bionic. 4.0
Peripheral device type: disk
PR generation=0x0, there is NO reservation held
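The test case above boils down to one question: does `sg_persist --in --read-keys` list any reservation keys? A minimal Python sketch of that check follows, assuming the output layout shown in this report. The populated sample (keys `0x3abe0000` to `0x3abe0002` under a `... registered reservation keys follow:` header) is hypothetical and only illustrates the layout assumed for a disk with registered keys; the helper names are illustrative and not part of sg3_utils or fence-agents.

```python
import re
import subprocess
from typing import List

# Output copied from this report: no keys were registered on /dev/sda.
FAILING_OUTPUT = """\
LIO-ORG cluster.bionic. 4.0
Peripheral device type: disk
PR generation=0x0, there are NO registered reservation keys
"""

# Hypothetical output for a disk with registered keys (illustrative values only).
WORKING_OUTPUT = """\
LIO-ORG cluster.bionic. 4.0
Peripheral device type: disk
PR generation=0x3, 3 registered reservation keys follow:
    0x3abe0000
    0x3abe0001
    0x3abe0002
"""

# Assumption: a key line contains nothing but one hex value, e.g. "    0x3abe0000".
KEY_LINE = re.compile(r"^\s*(0x[0-9a-fA-F]+)\s*$")


def read_keys(device: str) -> str:
    """Run the same sg_persist command as the test case (needs root and a real disk)."""
    result = subprocess.run(
        ["sg_persist", "--in", "--read-keys", f"--device={device}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def registered_keys(output: str) -> List[str]:
    """Return the reservation keys listed in sg_persist --read-keys output."""
    keys = []
    for line in output.splitlines():
        match = KEY_LINE.match(line)
        if match:
            keys.append(match.group(1).lower())
    return keys


if __name__ == "__main__":
    # With the bug described here, the first list is empty.
    print(registered_keys(FAILING_OUTPUT))
    print(registered_keys(WORKING_OUTPUT))
```

On a real node, `registered_keys(read_keys("/dev/sda"))` would apply the same check to live output.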
* having a 3-node setup, with nodes called "clubionic01, clubionic02,
clubionic03", a shared SCSI disk /dev/sda (fully supporting persistent
reservations), and corosync and pacemaker up and running, one might
try:
rafaeldtinoco at clubionic01:~$ crm configure
crm(live)configure# property stonith-enabled=on
crm(live)configure# property stonith-action=off
crm(live)configure# property no-quorum-policy=stop
crm(live)configure# property have-watchdog=true
crm(live)configure# property symmetric-cluster=true
crm(live)configure# commit
crm(live)configure# end
crm(live)# end
rafaeldtinoco at clubionic01:~$ crm configure primitive fence_clubionic \
    stonith:fence_scsi params \
    pcmk_host_list="clubionic01 clubionic02 clubionic03" \
    devices="/dev/sda" \
    meta provides=unfencing
Then observe that crm_mon does not show the fence_clubionic resource as operational.
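For the Pacemaker test case, one way to automate the check is to parse the resource line printed by `crm_mon -1`. The sketch below assumes the layout `name (class:agent): State node` seen in this report's transcript, with the wrapped line re-joined; `resource_state` is a hypothetical helper. Note that the original description shows fence_clubionic as Started while sg_persist still lists no keys, so a Started state alone does not prove that fencing works and should be paired with an sg_persist check.

```python
import re
from typing import Optional, Tuple

# Excerpt of `crm_mon -1` output from this report, with the wrapped resource line re-joined.
CRM_MON_OUTPUT = """\
Online: [ clubionic01.private clubionic02.private clubionic03.private ]
Active resources:
 fence_clubionic\t(stonith:fence_scsi):\tStarted clubionic01.private
"""

# Assumed layout: "<name> (<class>:<agent>): <State> [<node>]".
RESOURCE_LINE = re.compile(
    r"^\s*(?P<name>\S+)\s+\((?P<agent>[^)]+)\):\s+(?P<state>\S+)(?:\s+(?P<node>\S+))?\s*$"
)


def resource_state(output: str, name: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (state, node) for the named resource, or None if it is not listed."""
    for line in output.splitlines():
        match = RESOURCE_LINE.match(line)
        if match and match.group("name") == name:
            return match.group("state"), match.group("node")
    return None


if __name__ == "__main__":
    state = resource_state(CRM_MON_OUTPUT, "fence_clubionic")
    if state is None or state[0] != "Started":
        print("fence_clubionic is not operational:", state)
    else:
        print("fence_clubionic reports", state)
```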
[Regression Potential]
+ * Comments #3 and #4 show this new version fully working.
+
+ * This fix has the potential to break other fencing agents that currently work. If that happens, I suggest that those affected revert to the previous package AND open a bug against pacemaker and/or fence-agents.
+
* Judging by this issue, it is very likely that any Ubuntu user who
has tried using fence_scsi has already migrated to a newer version,
because the fence_scsi agent has been broken since its release.
* The way I fixed fence_scsi was this:
I packaged the latest pacemaker 1.1.X version and kept it "vanilla" so I
could bisect fence-agents. At that point I realized that bisecting was
going to be hard because there were multiple issues, not just one. I
backported the latest fence-agents together with Pacemaker 1.1.19-0 and
saw that it worked.
From then on, I bisected the following intervals:
4.3.0 .. 4.4.0 (eoan - working)
4.2.0 .. 4.3.0
4.1.0 .. 4.2.0
4.0.25 .. 4.1.0 (bionic - not working)
In each of those intervals I discovered issues. For example, using 4.3.0
I faced problems, so I had to backport fixes from between 4.3.0 and
4.4.0. Then, backporting 4.2.0, I faced issues, so I had to backport
fixes from the 4.2.0 <-> 4.3.0 interval. I repeated this until I reached
4.0.25, the current Bionic fence-agents version.
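Bisecting one of these intervals means repeatedly testing the midpoint between a known-broken and a known-working version; `git bisect` automates the same search. The generic binary search below sketches that step, assuming a hypothetical `works(commit)` predicate (for example: build the package, install it, and run the fence_scsi test case) whose answer flips exactly once within the interval. When several independent bugs overlap, as described above, that single-flip assumption breaks down, which is consistent with the interval-by-interval approach with backported fixes.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_working(commits: Sequence[T], works: Callable[[T], bool]) -> T:
    """Binary search for the first commit in which `works` returns True.

    Assumes commits are ordered oldest to newest, commits[0] is broken,
    commits[-1] works, and the result flips exactly once in between.
    """
    if len(commits) < 2:
        raise ValueError("need at least a broken and a working endpoint")
    if works(commits[0]) or not works(commits[-1]):
        raise ValueError("endpoints must be broken (first) and working (last)")

    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] broken, commits[hi] works
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if works(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


if __name__ == "__main__":
    # Toy stand-in for a real build-and-test predicate: the fix lands at commit 6.
    tested = []

    def toy_works(commit: int) -> bool:
        tested.append(commit)
        return commit >= 6

    print(first_working(list(range(10)), toy_works))
    print("commits tested:", tested)
```

Each probe halves the remaining range, so an interval of N commits needs only about log2(N) builds once its endpoints are known.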
[Other Info]
* Original Description:
Trying to set up a cluster with an iSCSI shared disk, using fence_scsi as
the fencing mechanism, I realized that fence_scsi is not working in
Ubuntu Bionic. I first thought it was related to the Azure environment (LP:
#1864419), where I was originally testing this setup, but then, trying
locally, I figured out that pacemaker 1.1.18 is somehow not fencing the
shared SCSI disk properly.
Note: I was able to "backport" vanilla 1.1.19 from upstream and
fence_scsi worked. I then tried 1.1.18 without any of the quilt patches
and it didn't work either. I think that bisecting 1.1.18 <-> 1.1.19
might tell us which commit fixed the behaviour needed by the
fence_scsi agent.
(k)rafaeldtinoco at clubionic01:~$ crm conf show
node 1: clubionic01.private
node 2: clubionic02.private
node 3: clubionic03.private
primitive fence_clubionic stonith:fence_scsi \
    params pcmk_host_list="10.250.3.10 10.250.3.11 10.250.3.12" devices="/dev/sda" \
    meta provides=unfencing
property cib-bootstrap-options: \
    have-watchdog=false \
    dc-version=1.1.18-2b07d5c5a9 \
    cluster-infrastructure=corosync \
    cluster-name=clubionic \
    stonith-enabled=on \
    stonith-action=off \
    no-quorum-policy=stop \
    symmetric-cluster=true
----
(k)rafaeldtinoco at clubionic02:~$ sudo crm_mon -1
Stack: corosync
Current DC: clubionic01.private (version 1.1.18-2b07d5c5a9) - partition with quorum
Last updated: Mon Mar 2 15:55:30 2020
Last change: Mon Mar 2 15:45:33 2020 by root via cibadmin on clubionic01.private
3 nodes configured
1 resource configured
Online: [ clubionic01.private clubionic02.private clubionic03.private ]
Active resources:
 fence_clubionic (stonith:fence_scsi): Started clubionic01.private
----
(k)rafaeldtinoco at clubionic02:~$ sudo sg_persist --in --read-keys --device=/dev/sda
LIO-ORG cluster.bionic. 4.0
Peripheral device type: disk
PR generation=0x0, there are NO registered reservation keys
(k)rafaeldtinoco at clubionic02:~$ sudo sg_persist -r /dev/sda
LIO-ORG cluster.bionic. 4.0
Peripheral device type: disk
PR generation=0x0, there is NO reservation held
** No longer affects: fence-agents (Ubuntu Focal)
** Changed in: fence-agents (Ubuntu Bionic)
Status: Confirmed => In Progress
--
You received this bug notification because you are a member of Ubuntu
Server, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1865523
Title:
[bionic] fence_scsi not working properly with 1.1.18-2ubuntu1.1