[Bug 1940976] Re: Race condition in zone serial generation on concurrent changes to recordsets

Tue Oct 12 00:10:46 UTC 2021

Yes, that is what I am asking about.

The configuration setting:

[coordination]
backend_url = <DLM URL>

With this set, the threads use the distributed lock manager.
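
A quick way to check whether that option is set is to parse the config file. This is only a sketch: it assumes the file is close enough to plain INI for Python's configparser to read (oslo.config files usually are), and the helper name coordination_backend_url is made up for illustration. The sample text stands in for the real file, typically /etc/designate/designate.conf.

```python
import configparser


def coordination_backend_url(conf_text):
    """Return the [coordination] backend_url value, or None if unset or empty."""
    # strict=False tolerates repeated options; interpolation=None keeps '%' literal.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    url = parser.get("coordination", "backend_url", fallback=None)
    if url is None or not url.strip():
        return None
    return url.strip()


# Sample content standing in for the real configuration file.
sample = """
[DEFAULT]
debug = false

[coordination]
backend_url = <DLM URL>
"""

print(coordination_backend_url(sample))
```

A result of None would mean no coordination backend is configured in the text that was parsed.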

You can also look for this warning message:
https://github.com/openstack/designate/blob/05343d4226822da8b9776201ea18e000d366573d/designate/coordination.py#L72

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1940976

Title:
  Race condition in zone serial generation on concurrent changes to
  recordsets

Status in Ubuntu Cloud Archive:
  New
Status in Designate:
  New
Status in designate package in Ubuntu:
  New

Bug description:
  I discovered a reproducible race condition when updating multiple
  recordsets of a single zone at the same time. There was an issue
  https://bugs.launchpad.net/bugs/1871332 about multiple designate
  instances and their coordination / distributed locking, but I also
  observe the issue with just a single instance and its multiple worker
  threads targeting the same zone. This happens quite easily with IaC
  tooling such as Terraform, which uses multiple threads and multiple
  connections when talking to a cloud API.

  To trigger the race condition I used this piece of Terraform to create three recordsets:

  --- cut ---

  resource "openstack_dns_recordset_v2" "testrecords" {
    count = 3

    zone_id     = data.openstack_dns_zone_v2.myzone.id
    name        = "record-${count.index}.${data.openstack_dns_zone_v2.myzone.name}"
    description = "test-${count.index}"
    ttl         = 60
    type        = "A"
    records     = ["127.0.0.1"]
  }

  --- cut ---

  Those three records are created independently and concurrently, and in the end the zone on the nameserver does not contain all of them. Creating just one more record afterwards causes all the records to be written to the zonefile properly, so this is due to the serial being updated inconsistently.

  Looking at the code that creates the serial:
  https://opendev.org/openstack/designate/src/branch/master/designate/utils.py#L137
  it is clearly subject to race conditions when multiple threads update
  the zone concurrently: each uses the zone serial (a timestamp) it
  previously read from the database and increments it "in code".
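
The read-then-increment pattern described here can be reproduced in a few lines. This is a simplified model, not Designate's code: next_serial, Zone and the worker functions are hypothetical stand-ins, and a barrier is used to force the unlucky interleaving so the result is deterministic. Three workers all read serial 1000 and all compute the same new value, while the locked variant produces three distinct serials.

```python
import threading


def next_serial(current, now):
    """Hypothetical stand-in: use the timestamp, or current + 1 if it is not newer."""
    return now if now > current else current + 1


class Zone:
    def __init__(self, serial):
        self.serial = serial


def racy_update(zone, now, barrier, results):
    seen = zone.serial                 # read the current serial
    barrier.wait()                     # force every worker to read before any writes
    new = next_serial(seen, now)       # increment "in code" from the stale value
    zone.serial = new
    results.append(new)


def locked_update(zone, now, lock, results):
    with lock:                         # read and write happen as one step
        new = next_serial(zone.serial, now)
        zone.serial = new
        results.append(new)


def run(worker, sync, workers=3):
    zone = Zone(serial=1000)
    results = []
    threads = [
        threading.Thread(target=worker, args=(zone, 1000, sync, results))
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)


racy = run(racy_update, threading.Barrier(3))
locked = run(locked_update, threading.Lock())
print(racy)    # [1001, 1001, 1001]
print(locked)  # [1001, 1002, 1003]
```

Since a secondary nameserver normally only picks up a zone when its serial has increased, a repeated serial is one plausible way for a change to go unpublished until the next update.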

  There is a not yet merged patchset by Nicolas Bock which does not refer to a bug, but apparently changes the way the serial is created and uses an update statement in the database to increase the serial: https://review.opendev.org/c/openstack/designate/+/776173
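
To illustrate the general idea of incrementing inside the database, here is a minimal sketch using SQLite. It is not taken from the patchset and has not been checked against it: the zones table, its columns and the bump_serial helper are hypothetical. The point is only that a single UPDATE computes the new serial from the stored value, so no caller works from a stale read.

```python
import sqlite3


def bump_serial(conn, zone_id, now):
    """Raise the serial in one statement: the timestamp, or serial + 1 if larger."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "UPDATE zones SET serial = MAX(serial + 1, ?) WHERE id = ?",
            (now, zone_id),
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE zones (id TEXT PRIMARY KEY, serial INTEGER NOT NULL)")
conn.execute("INSERT INTO zones VALUES ('myzone', 1000)")

# Three changes arriving within the same second (timestamp 1000).
for _ in range(3):
    bump_serial(conn, "myzone", 1000)

final_serial = conn.execute(
    "SELECT serial FROM zones WHERE id = 'myzone'"
).fetchone()[0]
print(final_serial)  # 1003
```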
