RFC: Ubuntu HA resource-agents supportability

Christian Ehrhardt christian.ehrhardt at canonical.com
Fri Apr 3 08:21:53 UTC 2020


On Tue, Mar 31, 2020 at 7:09 AM Rafael David Tinoco
<rafaeldtinoco at ubuntu.com> wrote:
>
> Hello,
>
> As many of you know, I'm currently revamping the Ubuntu High Availability
> packages.
>
> For 20.04, considered HA (or HA related) packages are:
>
> - Core packages:
>
>   - libqb
>   - kronosnet
>   - corosync
>   - pacemaker
>   - resource-agents
>   - fence-agents
>   - crmsh
>   - cluster-glue
>   - drbd-utils
>   - dlm
>   - gfs2-utils
>
> - "Deprecated" packages:
>
>   - heartbeat
>   - keepalived
>   - ocfs2-tools
>
> - Not in "main" packages:
>
>   - pcs (will likely replace crmsh in near future)
>   - csync2
>   - corosync-qdevice
>   - fence-virt
>   - sbd
>   - booth
>
> - Related packages:
>
>   - multipath-tools
>   - open-iscsi
>   - sg3-utils
>   - targetcli-fb
>   - tgt (we're trying to deprecate in favor of LIO)
>   - lvm2
>
> For now, until Beta Freeze, we've been trying to catch up with the latest
> upstream releases. From now on, I'm going through the existing open bugs and
> addressing them with the latest upstream fixes plus any other fix needed
> (done for kronosnet, with an FFE opened, and for corosync, where I'm about to
> merge fixes).
>
> The next step is to document in the Server Guide all supported scenarios for
> HA-related packages. The intention is to describe the exact set of scenarios
> that we know are good for correct behavior of the clustering software AND
> the scenarios we cannot support.
>
> OBS: This includes whether or not an odd number of nodes/votes is needed,
> whether or not proper fencing mechanisms are required (and which fencing
> mechanisms to support) AND, finally, which *resource agents* to support.
>
> I'll probably ask for other feedback soon but, for the moment, I'm asking for
> comments on the list of resource agents below. I tried to split them up and
> explain what the resources are used for and whether they are supported in
> Ubuntu or not (or whether the related managed service is in [main] or in
> [universe]).

Hi,
I added a few comments below but I must admit that I didn't completely
get what you want.
The list includes so many things. What are you asking about:
- whether they are important overall?
- whether they are important for HA cases?
- whether they should be promoted/demoted in their support level?
- the packages/features themselves, or only their HA support?
  (e.g. does LVM being listed mean "is LVM important" or "do we need
  an HA resource agent for LVM")

To be clear I love and appreciate all the things you already fixed up
in the HA space for Ubuntu 20.04.
And I'd like to help going forward, but I, and probably others,
might be lost here.

Could you break these community questions into smaller chunks
and clearly outline the questions that you want people to answer?

Maybe this is worth an entire Discourse sub-category, where multiple
entries would help to split things up?
If you'd like that, you could prepare it there and do a test run with a
few people commenting.
Then edit/improve the topics/questions based on that and send a mail
to the list here again with a TL;DR and a list of links, one per topic.

What do you think?

> So please, take some time to provide feedback about this list, whether we
> should move resources from one category to another. *NOTE* that I'm not
> giving the "fence agents" list yet. That will be another list.
>
> I'm particularly interested in feedback from @jamespage and @ddstreet as they
> probably have good intel about resource usage BUT anyone is welcome to
> provide comments!
>
> Thank you very much in advance!
>
> #### RFC: Ubuntu HA resource-agents supportability
>
> #
> ## FULLY SUPPORTED (managed service is likely in main or is important enough)
> #
>
>     # trivial agents
>
> Delay                   - test resource for introducing delay
> MailTo                  - sends email to a sysadmin whenever a takeover occurs
> ClusterMon              - runs crm_mon periodically, writing to an HTML page
> HealthCPU               - measures CPU idling and updates #health-cpu attr
> HealthIOWait            - measures CPU iowait and updates #health-iowait attr
> HealthSMART             - checks drive SMART status, updates #health-smart attr
>
>     # services
>
> apache                  - apache web server instance
> dovecot                 - dovecot IMAP/POP3 server instance
> dhcpd                   - chrooted ISC dhcp server instance
> mysql                   - MySQL database instance
> mysql-proxy             - MySQL proxy instance
> named                   - bind/named server instance
> nfsnotify               - nfs sm-notify reboot notifications daemon
> nfsserver               - nfs server resource
> nginx                   - Nginx web/proxy server instance
> postfix                 - postfix mail server instance
> rabbitmq-cluster        - cloned rabbitmq cluster instance
> remote                  - pacemaker remote resource agent
> rsyncd                  - rsyncd instance
> rsyslog                 - rsyslogd instance
> slapd                   - stand-alone LDAP daemon instance
> Squid                   - squid proxy server instance
> vsftpd                  - vsftpd server instance
>
>     # storage
>
> Raid1                   - software RAID (MD) devices on shared storage
> iscsi                   - local iscsi initiator and its conns to targets
> iSCSILogicalUnit        - iSCSI logical units
> iSCSITarget             - iSCSI target export agent (implementation: tgt, lio)
> LVM                     - LVM volume as an HA resource
> LVM-activate            - LVM activation/deact work for a given VG
>                           (lvmlockd+LVM-activate OR clvm+LVM-activate)
> Filesystem              - filesystem on a shared storage medium
> symlink                 - symbolic link
> ZFS                     - ZFS pools import/export
>
>     # locking & reservations
>
> controld                - distributed lock manager for clustered FSs
> clvm                    - clvmd daemon (cluster logical vol manager)
> lvmlockd                - agent manages the lvmlockd daemon.
> mpathpersist            - SCSI persistent reservations on mpath devs
> sg_persist              - master/slave resource for SCSI3 reservations
>
>     # networking
>
> Route                   - network routes
> iface-bridge            - bridge network interfaces
> iface-vlan              - vlan network interfaces
> IPaddr2                 - virtual IPv4 and IPv6 addresses
> ipsec                   - ipsec tunnels for VIPs
> IPsrcaddr               - preferred source address modification
> IPv6addr                - IPv6 aliases
> conntrackd              - conntrackd instance
> SendArp                 - send gratuitous ARP for IP address
> VIPArip                 - virtual IP address through RIP2
> ifspeed                 - monitor action runs -> updates CIB with if speed
>
>     # virtualization
>
> VirtualDomain           - manages virtual domains through libvirt
>                           (virtual machine only)
>
>     # containers
>
> docker                  - docker container resource agent

Umm, this one is a bit more complex.
There is runc/containerd, which provide the same functionality but are
supported.
You should punt docker down to community support and replace this
entry here with runc/containerd.

> lxc                     - allows LXC containers to be managed by the cluster
>
> #
> ## BEST EFFORT SUPPORT (managed service is likely in universe or is interesting)
> #
>
>   # trivial agents
>
> anything                - generic agent to manage virtually *anything*
> Dummy                   - testing dummy resource agent (template for RA writers)
> AudibleAlarm            - audible beeps at interval
> Stateful                - example agent that supports two states
> WinPopup                - sends a SMB notification msg (popup) to a host
>
>   # services
>
> asterisk                - asterisk PBX
> CTDB                    - clustered samba (for needed clustered underlying)
> dnsupdate               - ip take-over via dynamic dns updates
> exportfs                - nfs exports (not the nfs server)
> fio                     - fio instance
> galera                  - galera instance
> garbd                   - galera arbitrator instance
> jboss                   - JBoss application server instance
> jira                    - JIRA server instance
> kamailio                - kamailio SIP proxy/registrar instance
> mariadb                 - MariaDB master/slave instance
> nagios                  - nagios instance
> ovsmonitor              - clone resource to monitor network bonds on diff nodes
> pgagent                 - pgagent instance
> pgsql                   - pgsql database instance
> pound                   - pound reverse proxy load-bal server instance
> proftpd                 - proftpd instance
> Pure-FTPd               - pure-ftpd instance
> redis                   - redis server instance (supports master/slave replicas)
> syslog-ng               - syslog-ng instance
> tomcat                  - tomcat servlet environment instance
> varnish                 - varnish instance
>
>     # storage
>
> AoEtarget               - ata over ethernet
>
>     # networking
>
> IPaddr                  - virtual IPv4 addresses
> ocf:pacemaker:ping      - records in CIB number of nodes host can connect to
> portblock               - temporarily block/unblock access to tcp/udp ports
>
>     # openstack
>
> openstack-cinder-volume - attach cinder vol to an instance (os-info <->)
> openstack-floating-ip   - move a floating IP from an instance to another
>
>     # registration (CIB)
>
> lxd-info                - nr of lxd containers running in CIB
> machine-info            - records various node attributes in CIB
> NodeUtilization         - cpu, host mem, hypervisor mem etc... into CIB
> openstack-info          - records attributes of a node into CIB
> SysInfo                 - records various node attributes into CIB
> SystemHealth            - monitors health of system using IPMI
> attribute               - sets node attr one way when started and vice-versa
>
> #
> ## COMMUNITY SUPPORT (bugs opened here will be forwarded to upstream directly)
> #
>
>     # services
>
> SphinxSearchDaemon      - sphinx search daemon
> Xinetd                  - start/stop services managed by xinetd
> zabbixserver            - zabbix server instance
>
>     # storage
>
> o2cb                    - oracle cluster filesystem userspace daemon (oracle)
> sfex                    - excl access to shared storage using SF-EX
>
>     # virtualization
>
> aliyun-vpc-move-ip      - move ip within a vpc of the aliyun ecs (alibaba)
> awseip                  - manages aws elastic IP address (aws)
> awsvip                  - manages aws secondary private ip addresses (aws)
> aws-vpc-move-ip         - move ip within a vpc of the aws ec2 (aws)
> aws-vpc-route53         - update route53 vpc record for aws ec2 (aws)
> azure-events            - monitor for scheduled events for azure vm (azure)
> azure-lb                - answers azure load balancer health probe req (azure)
> gcp-vpc-move-ip         - floating ip address within a GCP VPC (google)
> ManageVE                - openVZ virtual environment (virtuozzo)
> minio                   - minio server instance
> podman                  - creates/launches podman containers
> rkt                     - creates/launches container based on supplied image
>
> #
> ## UNSUPPORTED (Ubuntu does not support it)
> #
>
> db2                     - manages IBM DB2 LUW databases (IBM)
> eDir88                  - Novell eDirectory directory server instance (novell)
> ICP                     - ICP vortex clustered host drive (intel)
> ids                     - IBM informix dynamic server (IDS) (IBM)
> SAPDatabase             - SAP database (of any type) instance agent (SAP)
> SAPInstance             - SAP application server instances agent (SAP)
> ServeRAID               - enables/disables shared serveRAID merge groups (IBM)
> ManageRAID              - raid devices (/etc/conf.d/HB-ManageRAID)
> oraasm                  - oracle asm agent, uses ohasd for asm disk grp (oracle)
> oracle                  - oracle database instance (oracle)
> oralsnr                 - oracle TNS listener (oracle)
> sybaseASE               - sybase ASE failover instance (Sybase)
> vdo-vol                 - https://bugs.launchpad.net/ubuntu/+bug/1869825
> WAS                     - websphere application server instance (IBM)
> WAS6                    - websphere application server instance (IBM)
> Xen                     - xen unprivileged domains

Xen is at community support level, so you likely want to move this
one up a category.

> #
> ## DEPRECATED (do not use)
> #
>
> Evmsd                   - clustered evms vol mgmt (evms is not maintained)
> EvmsSCC                 - clustered evms vol mgmt (evms is not maintained)
> LinuxSCSI               - enables/disables scsi devs through kernel scsi hotplug
> scsi2reservation        - SCSI-2 reservation agent (depends on "scsi_reserve")
> ocf:heartbeat:pingd     - monitors connectivity to specific hosts
> ocf:pacemaker:pingd     - replaced by pacemaker:ping (this is broken)
> vmware                  - control vmware server 2.0 virtual machines (2009)
>
>


-- 
Christian Ehrhardt
Staff Engineer, Ubuntu Server
Canonical Ltd


