[Bug 1866119] Re: [bionic] fence_scsi not working properly with 1.1.18-2ubuntu1.1

Rafael David Tinoco rafaeldtinoco at ubuntu.com
Wed Mar 4 20:01:39 UTC 2020


** Description changed:

+ OBS: This bug was originally part of LP: #1865523 but was split out.
+ 
+ #### SRU: pacemaker
+ 
+ [Impact]
+ 
+  * fence_scsi is not currently working in a shared-disk environment
+ 
+  * all clusters relying on fence_scsi and/or fence_scsi + watchdog won't
+ be able to start the fencing agents OR, in the worst case, the
+ fence_scsi agent might start but won't make SCSI reservations on the
+ shared SCSI disk.
+ 
+  * this bug covers the pacemaker 1.1.18 issues with fence_scsi,
+ since the latter was fixed in LP: #1865523.
+ 
+ [Test Case]
+ 
+  * with a 3-node setup (nodes called "clubionic01, clubionic02,
+ clubionic03"), a shared SCSI disk /dev/sda that fully supports
+ persistent reservations, and corosync and pacemaker operational and
+ running, one can try:
+ 
+ rafaeldtinoco@clubionic01:~$ crm configure
+ crm(live)configure# property stonith-enabled=on
+ crm(live)configure# property stonith-action=off
+ crm(live)configure# property no-quorum-policy=stop
+ crm(live)configure# property have-watchdog=true
+ crm(live)configure# commit
+ crm(live)configure# end
+ crm(live)# end
+ 
+ rafaeldtinoco@clubionic01:~$ crm configure primitive fence_clubionic \
+     stonith:fence_scsi params \
+     pcmk_host_list="clubionic01 clubionic02 clubionic03" \
+     devices="/dev/sda" \
+     meta provides=unfencing
+ 
+ And see the following errors:
+ 
+ Failed Actions:
+ * fence_clubionic_start_0 on clubionic02 'unknown error' (1): call=6, status=Error, exitreason='',
+     last-rc-change='Wed Mar  4 19:53:12 2020', queued=0ms, exec=1105ms
+ * fence_clubionic_start_0 on clubionic03 'unknown error' (1): call=6, status=Error, exitreason='',
+     last-rc-change='Wed Mar  4 19:53:13 2020', queued=0ms, exec=1109ms
+ * fence_clubionic_start_0 on clubionic01 'unknown error' (1): call=6, status=Error, exitreason='',
+     last-rc-change='Wed Mar  4 19:53:11 2020', queued=0ms, exec=1108ms
+ 
+ and corosync.log will show:
+ 
+ warning: unpack_rsc_op_failure: Processing failed op start for
+ fence_clubionic on clubionic01: unknown error (1)
+ 
+ [Regression Potential]
+ 
+  * LP: #1865523 shows fence_scsi fully operational once the SRU for
+ that bug is done.
+ 
+  * LP: #1865523 used pacemaker 1.1.19 (vanilla) in order to fix
+ fence_scsi.
+ 
+  * TODO
+ 
+ [Other Info]
+ 
+  * Original Description:
+ 
  Trying to set up a cluster with an iSCSI shared disk, using fence_scsi as
  the fencing mechanism, I realized that fence_scsi is not working in
  Ubuntu Bionic. I first thought it was related to the Azure environment
  (LP: #1864419), where I was trying this setup, but then, trying
  locally, I figured out that somehow pacemaker 1.1.18 is not fencing the
  shared SCSI disk properly.
  
  Note: I was able to "backport" vanilla 1.1.19 from upstream and
  fence_scsi worked. I then tried 1.1.18 without any of the quilt patches
  and it didn't work either. I think that bisecting 1.1.18 <-> 1.1.19
  might tell us which commit fixed the behaviour needed by the
  fence_scsi agent.
  
  (k)rafaeldtinoco@clubionic01:~$ crm conf show
  node 1: clubionic01.private
  node 2: clubionic02.private
  node 3: clubionic03.private
  primitive fence_clubionic stonith:fence_scsi \
-         params pcmk_host_list="10.250.3.10 10.250.3.11 10.250.3.12" devices="/dev/sda" \
-         meta provides=unfencing
+         params pcmk_host_list="10.250.3.10 10.250.3.11 10.250.3.12" devices="/dev/sda" \
+         meta provides=unfencing
  property cib-bootstrap-options: \
-         have-watchdog=false \
-         dc-version=1.1.18-2b07d5c5a9 \
-         cluster-infrastructure=corosync \
-         cluster-name=clubionic \
-         stonith-enabled=on \
-         stonith-action=off \
-         no-quorum-policy=stop \
-         symmetric-cluster=true
+         have-watchdog=false \
+         dc-version=1.1.18-2b07d5c5a9 \
+         cluster-infrastructure=corosync \
+         cluster-name=clubionic \
+         stonith-enabled=on \
+         stonith-action=off \
+         no-quorum-policy=stop \
+         symmetric-cluster=true
  
  ----
  
  (k)rafaeldtinoco@clubionic02:~$ sudo crm_mon -1
  Stack: corosync
  Current DC: clubionic01.private (version 1.1.18-2b07d5c5a9) - partition with quorum
  Last updated: Mon Mar 2 15:55:30 2020
  Last change: Mon Mar 2 15:45:33 2020 by root via cibadmin on clubionic01.private
  
  3 nodes configured
  1 resource configured
  
  Online: [ clubionic01.private clubionic02.private clubionic03.private ]
  
  Active resources:
  
-  fence_clubionic (stonith:fence_scsi): Started clubionic01.private
+  fence_clubionic (stonith:fence_scsi): Started clubionic01.private
  
  ----
  
  (k)rafaeldtinoco@clubionic02:~$ sudo sg_persist --in --read-keys --device=/dev/sda
-   LIO-ORG cluster.bionic. 4.0
-   Peripheral device type: disk
-   PR generation=0x0, there are NO registered reservation keys
+   LIO-ORG cluster.bionic. 4.0
+   Peripheral device type: disk
+   PR generation=0x0, there are NO registered reservation keys
  
  (k)rafaeldtinoco@clubionic02:~$ sudo sg_persist -r /dev/sda
-   LIO-ORG cluster.bionic. 4.0
-   Peripheral device type: disk
-   PR generation=0x0, there is NO reservation held
+   LIO-ORG cluster.bionic. 4.0
+   Peripheral device type: disk
+   PR generation=0x0, there is NO reservation held
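
The sg_persist output above is the visible symptom: the agent is reported as started, yet no keys are registered on the shared disk. A minimal sketch for checking this from a script follows. The helper name count_pr_keys is hypothetical (it is not part of sg3-utils or pacemaker), and it assumes that registered keys are printed one per line as bare hex values such as 0x3abe0000, which may differ between sg3-utils versions.

```shell
# Hypothetical helper: count registered persistent-reservation keys in
# `sg_persist --in --read-keys` output read from stdin.
# Assumption: each registered key is printed alone on its own line as a
# hex value (for example "    0x3abe0000"). The "PR generation=0x0, ..."
# header line is not counted because it carries other text.
count_pr_keys() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are none, so "|| true" keeps the function usable under `set -e`.
    grep -c -E '^[[:space:]]*0x[0-9a-fA-F]+[[:space:]]*$' || true
}
```

It could be used as `sudo sg_persist --in --read-keys --device=/dev/sda | count_pr_keys`. For the output shown above this prints 0; on a cluster where fence_scsi works, one would expect one key per registered node path.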

** Changed in: pacemaker (Ubuntu Bionic)
       Status: Confirmed => In Progress

-- 
You received this bug notification because you are a member of Ubuntu
Server, which is subscribed to pacemaker in Ubuntu.
https://bugs.launchpad.net/bugs/1866119

Title:
  [bionic] fence_scsi not working properly with 1.1.18-2ubuntu1.1

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1866119/+subscriptions


