[Bug 2012047] [NEW] Client should not blindly resend/retry POST/PUT (create/update) requests

Marcin Wilk 2012047 at bugs.launchpad.net
Fri Mar 17 08:53:59 UTC 2023


Public bug reported:

Env: Focal/Ussuri
Client: python3-openstackclient 5.2.0-0ubuntu1.20.04.1

When user submits create request (ie, create subnet) to openstack api, and the request is processed synchronously (unlike server create for example), the client waits for the API response. If that request takes extensive time to complete, longer than any client idle time value on the API backend, the server closes the TCP connection with the openstackclient. In such cases the client automatically repeats the POST request, which by default is a wrong behaviour.
Following are logs from the 'openstack sbunet create example' where execution of the initial request took over 100s (keystone auth logs omitted for clarity):
# the command:
os subnet create \
  --network cd275f37-deec-4f0f-9ad3-4816ca52678e \
  --gateway 172.16.0.1 \
  --allocation-pool start=172.16.0.100,end=172.16.0.200 \
  --subnet-range 172.16.0.0/24 \
  --dns-nameserver 8.8.8.8 \
  --no-dhcp external-subnet-2 \
  --debug

# the command output:
Network client initialized using OpenStack SDK: <openstack.network.v2._proxy.Proxy object at 0x7f7ec8f34490>
REQ: curl -g -i -X GET http://10.5.100.3:9696/v2.0/networks/cd275f37-deec-4f0f-9ad3-4816ca52678e -H "User-Agent: openstacksdk/0.46.0 keystoneauth1/4.0.0 python-requests/2.22.0 CPython/3.8.10" -H "X-Auth-Token: {SHA256}6d3a82e66bdeb642bcfef551c78dbc0063e1bad49431683e6a0023f28dbc1d7d"
Starting new HTTP connection (1): 10.5.100.3:9696
http://10.5.100.3:9696 "GET /v2.0/networks/cd275f37-deec-4f0f-9ad3-4816ca52678e HTTP/1.1" 200 726
RESP: [200] Connection: keep-alive Content-Length: 726 Content-Type: application/json Date: Thu, 16 Mar 2023 07:23:59 GMT X-Openstack-Request-Id: req-3abe5525-cde2-436d-bf39-34746d76f07a
RESP BODY: {"network":{"id":"cd275f37-deec-4f0f-9ad3-4816ca52678e","name":"ext_net","tenant_id":"d6761dacbb0649189a07a4a1a191a8c0","admin_state_up":true,"mtu":1500,"status":"ACTIVE","subnets":["040a6d57-b6fc-436a-b531-03baf4abb609"],"shared":false,"availability_zone_hints":[],"availability_zones":["nova"],"ipv4_address_scope":null,"ipv6_address_scope":null,"router:external":true,"description":"","port_security_enabled":true,"dns_domain":"","l2_adjacency":true,"is_default":false,"tags":[],"created_at":"2022-05-06T10:18:25Z","updated_at":"2023-03-16T06:35:33Z","revision_number":16,"project_id":"d6761dacbb0649189a07a4a1a191a8c0","provider:network_type":"flat","provider:physical_network":"physnet1","provider:segmentation_id":null}}
GET call to network for http://10.5.100.3:9696/v2.0/networks/cd275f37-deec-4f0f-9ad3-4816ca52678e used request id req-3abe5525-cde2-436d-bf39-34746d76f07a
REQ: curl -g -i -X POST http://10.5.100.3:9696/v2.0/subnets -H "Content-Type: application/json" -H "User-Agent: openstacksdk/0.46.0 keystoneauth1/4.0.0 python-requests/2.22.0 CPython/3.8.10" -H "X-Auth-Token: {SHA256}6d3a82e66bdeb642bcfef551c78dbc0063e1bad49431683e6a0023f28dbc1d7d" -d '{"subnet": {"allocation_pools": [{"start": "172.16.0.100", "end": "172.16.0.200"}], "ip_version": 4, "gateway_ip": "172.16.0.1", "name": "external-subnet-2", "dns_nameservers": ["8.8.8.8"], "enable_dhcp": false, "cidr": "172.16.0.0/24", "network_id": "cd275f37-deec-4f0f-9ad3-4816ca52678e"}}'
Failure: Unable to establish connection to http://10.5.100.3:9696/v2.0/subnets: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')). Retrying in 0.5s.
Starting new HTTP connection (2): 10.5.100.3:9696
http://10.5.100.3:9696 "POST /v2.0/subnets HTTP/1.1" 400 221
RESP: [400] Connection: keep-alive Content-Length: 221 Content-Type: application/json Date: Thu, 16 Mar 2023 07:25:30 GMT X-Openstack-Request-Id: req-d0632669-e1ce-4157-b5a5-e52be6e51873
RESP BODY: {"NeutronError": {"type": "InvalidInput", "message": "Invalid input for operation: Requested subnet with cidr: 172.16.0.0/24 for network: cd275f37-deec-4f0f-9ad3-4816ca52678e overlaps with another subnet.", "detail": ""}}
POST call to network for http://10.5.100.3:9696/v2.0/subnets used request id req-d0632669-e1ce-4157-b5a5-e52be6e51873
BadRequestException: 400: Client Error for url: http://10.5.100.3:9696/v2.0/subnets, Invalid input for operation: Requested subnet with cidr: 172.16.0.0/24 for network: cd275f37-deec-4f0f-9ad3-4816ca52678e overlaps with another subnet.
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cliff/app.py", line 400, in run_subcommand
    result = cmd.run(parsed_args)
  File "/usr/lib/python3/dist-packages/osc_lib/command/command.py", line 41, in run
    return super(Command, self).run(parsed_args)
  File "/usr/lib/python3/dist-packages/cliff/display.py", line 117, in run
    column_names, data = self.take_action(parsed_args)
  File "/usr/lib/python3/dist-packages/openstackclient/network/v2/subnet.py", line 377, in take_action
    obj = client.create_subnet(**attrs)
  File "/usr/lib/python3/dist-packages/openstack/network/v2/_proxy.py", line 3491, in create_subnet
    return self._create(_subnet.Subnet, **attrs)
  File "/usr/lib/python3/dist-packages/openstack/proxy.py", line 417, in _create
    return res.create(self, base_path=base_path)
  File "/usr/lib/python3/dist-packages/openstack/resource.py", line 1291, in create
    self._translate_response(response, has_body=has_body)
  File "/usr/lib/python3/dist-packages/openstack/resource.py", line 1113, in _translate_response
    exceptions.raise_from_response(response, error_message=error_message)
  File "/usr/lib/python3/dist-packages/openstack/exceptions.py", line 234, in raise_from_response
    raise cls(
openstack.exceptions.BadRequestException: BadRequestException: 400: Client Error for url: http://10.5.100.3:9696/v2.0/subnets, Invalid input for operation: Requested subnet with cidr: 172.16.0.0/24 for network: cd275f37-deec-4f0f-9ad3-4816ca52678e overlaps with another subnet.
clean_up CreateSubnet: BadRequestException: 400: Client Error for url: http://10.5.100.3:9696/v2.0/subnets, Invalid input for operation: Requested subnet with cidr: 172.16.0.0/24 for network: cd275f37-deec-4f0f-9ad3-4816ca52678e overlaps with another subnet.
END return value: 1

>From the neutron-server.log the first request from client came in:
2023-03-16 07:23:59.608 2384978 DEBUG neutron.api.v2.base [req-66884669-2253-4faf-95a4-ebbe353896a0 93eeeebc2920450faaaf2395443505fd d6761dacbb0649189a07a4a1a191a8c0 - 6a4439622e71431c8b96073e33c3b7e1 6a4439622e71431c8b96073e33c3b7e1] Request body: {'subnet': {'allocation_pools': [{'start': '172.16.0.100', 'end': '172.16.0.200'}], 'ip_version': 4, 'gateway_ip': '172.16.0.1', 'name': 'external-subnet-2', 'dns_nameservers': ['8.8.8.8'], 'enable_dhcp': False, 'cidr': '172.16.0.0/24', 'network_id': 'cd275f37-deec-4f0f-9ad3-4816ca52678e'}} prepare_request_body /usr/lib/python3/dist-packages/neutron/api/v2/base.py:719

and ended successfully after 131s:
2023-03-16 07:26:10.768 2384978 INFO neutron.wsgi [req-66884669-2253-4faf-95a4-ebbe353896a0 93eeeebc2920450faaaf2395443505fd d6761dacbb0649189a07a4a1a191a8c0 - 6a4439622e71431c8b96073e33c3b7e1 6a4439622e71431c8b96073e33c3b7e1] 10.5.2.174 "POST /v2.0/subnets HTTP/1.1" status: 201  len: 0 time: 131.1619370

But in the meaintime, the second request came as a result of the client retry action:
2023-03-16 07:25:30.129 2384976 DEBUG neutron.api.v2.base [req-d0632669-e1ce-4157-b5a5-e52be6e51873 93eeeebc2920450faaaf2395443505fd d6761dacbb0649189a07a4a1a191a8c0 - 6a4439622e71431c8b96073e33c3b7e1 6a4439622e71431c8b96073e33c3b7e1] Request body: {'subnet': {'allocation_pools': [{'start': '172.16.0.100', 'end': '172.16.0.200'}], 'ip_version': 4, 'gateway_ip': '172.16.0.1', 'name': 'external-subnet-2', 'dns_nameservers': ['8.8.8.8'], 'enable_dhcp': False, 'cidr': '172.16.0.0/24', 'network_id': 'cd275f37-deec-4f0f-9ad3-4816ca52678e'}} prepare_request_body /usr/lib/python3/dist-packages/neutron/api/v2/base.py:719

And of course it failed validation:
2023-03-16 07:25:30.409 2384976 INFO neutron.wsgi [req-d0632669-e1ce-4157-b5a5-e52be6e51873 93eeeebc2920450faaaf2395443505fd d6761dacbb0649189a07a4a1a191a8c0 - 6a4439622e71431c8b96073e33c3b7e1 6a4439622e71431c8b96073e33c3b7e1] 10.5.2.174 "POST /v2.0/subnets HTTP/1.1" status: 400  len: 429 time: 0.2894270

The first request comlete sucessfully but the user wouldn't know about it. Instad he got measleading error as above.
This problem comes from the keystoneauth1 library, used by the python-openstackclient:
https://github.com/openstack/keystoneauth/blob/stable/ussuri/keystoneauth1/session.py#L1019
https://github.com/openstack/keystoneauth/blob/stable/ussuri/keystoneauth1/session.py#L1031

In order to reproduce the problem I set python debugger on the neutron server code, here:
https://opendev.org/openstack/neutron/src/branch/master/neutron/api/rpc/handlers/resources_rpc.py#L237

But this problem originally comes from the production environment with
60+ compute nodes (DVR) and notification about changes due to the subnet
creations takes over 100s.

Kind regards,
Marcin

** Affects: python-openstackclient (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to python-openstackclient in Ubuntu.
https://bugs.launchpad.net/bugs/2012047

Title:
  Client should not blindly resend/retry POST/PUT (create/update)
  requests

Status in python-openstackclient package in Ubuntu:
  New

Bug description:
  Env: Focal/Ussuri
  Client: python3-openstackclient 5.2.0-0ubuntu1.20.04.1

  When user submits create request (ie, create subnet) to openstack api, and the request is processed synchronously (unlike server create for example), the client waits for the API response. If that request takes extensive time to complete, longer than any client idle time value on the API backend, the server closes the TCP connection with the openstackclient. In such cases the client automatically repeats the POST request, which by default is a wrong behaviour.
  Following are logs from the 'openstack sbunet create example' where execution of the initial request took over 100s (keystone auth logs omitted for clarity):
  # the command:
  os subnet create \
    --network cd275f37-deec-4f0f-9ad3-4816ca52678e \
    --gateway 172.16.0.1 \
    --allocation-pool start=172.16.0.100,end=172.16.0.200 \
    --subnet-range 172.16.0.0/24 \
    --dns-nameserver 8.8.8.8 \
    --no-dhcp external-subnet-2 \
    --debug

  # the command output:
  Network client initialized using OpenStack SDK: <openstack.network.v2._proxy.Proxy object at 0x7f7ec8f34490>
  REQ: curl -g -i -X GET http://10.5.100.3:9696/v2.0/networks/cd275f37-deec-4f0f-9ad3-4816ca52678e -H "User-Agent: openstacksdk/0.46.0 keystoneauth1/4.0.0 python-requests/2.22.0 CPython/3.8.10" -H "X-Auth-Token: {SHA256}6d3a82e66bdeb642bcfef551c78dbc0063e1bad49431683e6a0023f28dbc1d7d"
  Starting new HTTP connection (1): 10.5.100.3:9696
  http://10.5.100.3:9696 "GET /v2.0/networks/cd275f37-deec-4f0f-9ad3-4816ca52678e HTTP/1.1" 200 726
  RESP: [200] Connection: keep-alive Content-Length: 726 Content-Type: application/json Date: Thu, 16 Mar 2023 07:23:59 GMT X-Openstack-Request-Id: req-3abe5525-cde2-436d-bf39-34746d76f07a
  RESP BODY: {"network":{"id":"cd275f37-deec-4f0f-9ad3-4816ca52678e","name":"ext_net","tenant_id":"d6761dacbb0649189a07a4a1a191a8c0","admin_state_up":true,"mtu":1500,"status":"ACTIVE","subnets":["040a6d57-b6fc-436a-b531-03baf4abb609"],"shared":false,"availability_zone_hints":[],"availability_zones":["nova"],"ipv4_address_scope":null,"ipv6_address_scope":null,"router:external":true,"description":"","port_security_enabled":true,"dns_domain":"","l2_adjacency":true,"is_default":false,"tags":[],"created_at":"2022-05-06T10:18:25Z","updated_at":"2023-03-16T06:35:33Z","revision_number":16,"project_id":"d6761dacbb0649189a07a4a1a191a8c0","provider:network_type":"flat","provider:physical_network":"physnet1","provider:segmentation_id":null}}
  GET call to network for http://10.5.100.3:9696/v2.0/networks/cd275f37-deec-4f0f-9ad3-4816ca52678e used request id req-3abe5525-cde2-436d-bf39-34746d76f07a
  REQ: curl -g -i -X POST http://10.5.100.3:9696/v2.0/subnets -H "Content-Type: application/json" -H "User-Agent: openstacksdk/0.46.0 keystoneauth1/4.0.0 python-requests/2.22.0 CPython/3.8.10" -H "X-Auth-Token: {SHA256}6d3a82e66bdeb642bcfef551c78dbc0063e1bad49431683e6a0023f28dbc1d7d" -d '{"subnet": {"allocation_pools": [{"start": "172.16.0.100", "end": "172.16.0.200"}], "ip_version": 4, "gateway_ip": "172.16.0.1", "name": "external-subnet-2", "dns_nameservers": ["8.8.8.8"], "enable_dhcp": false, "cidr": "172.16.0.0/24", "network_id": "cd275f37-deec-4f0f-9ad3-4816ca52678e"}}'
  Failure: Unable to establish connection to http://10.5.100.3:9696/v2.0/subnets: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')). Retrying in 0.5s.
  Starting new HTTP connection (2): 10.5.100.3:9696
  http://10.5.100.3:9696 "POST /v2.0/subnets HTTP/1.1" 400 221
  RESP: [400] Connection: keep-alive Content-Length: 221 Content-Type: application/json Date: Thu, 16 Mar 2023 07:25:30 GMT X-Openstack-Request-Id: req-d0632669-e1ce-4157-b5a5-e52be6e51873
  RESP BODY: {"NeutronError": {"type": "InvalidInput", "message": "Invalid input for operation: Requested subnet with cidr: 172.16.0.0/24 for network: cd275f37-deec-4f0f-9ad3-4816ca52678e overlaps with another subnet.", "detail": ""}}
  POST call to network for http://10.5.100.3:9696/v2.0/subnets used request id req-d0632669-e1ce-4157-b5a5-e52be6e51873
  BadRequestException: 400: Client Error for url: http://10.5.100.3:9696/v2.0/subnets, Invalid input for operation: Requested subnet with cidr: 172.16.0.0/24 for network: cd275f37-deec-4f0f-9ad3-4816ca52678e overlaps with another subnet.
  Traceback (most recent call last):
    File "/usr/lib/python3/dist-packages/cliff/app.py", line 400, in run_subcommand
      result = cmd.run(parsed_args)
    File "/usr/lib/python3/dist-packages/osc_lib/command/command.py", line 41, in run
      return super(Command, self).run(parsed_args)
    File "/usr/lib/python3/dist-packages/cliff/display.py", line 117, in run
      column_names, data = self.take_action(parsed_args)
    File "/usr/lib/python3/dist-packages/openstackclient/network/v2/subnet.py", line 377, in take_action
      obj = client.create_subnet(**attrs)
    File "/usr/lib/python3/dist-packages/openstack/network/v2/_proxy.py", line 3491, in create_subnet
      return self._create(_subnet.Subnet, **attrs)
    File "/usr/lib/python3/dist-packages/openstack/proxy.py", line 417, in _create
      return res.create(self, base_path=base_path)
    File "/usr/lib/python3/dist-packages/openstack/resource.py", line 1291, in create
      self._translate_response(response, has_body=has_body)
    File "/usr/lib/python3/dist-packages/openstack/resource.py", line 1113, in _translate_response
      exceptions.raise_from_response(response, error_message=error_message)
    File "/usr/lib/python3/dist-packages/openstack/exceptions.py", line 234, in raise_from_response
      raise cls(
  openstack.exceptions.BadRequestException: BadRequestException: 400: Client Error for url: http://10.5.100.3:9696/v2.0/subnets, Invalid input for operation: Requested subnet with cidr: 172.16.0.0/24 for network: cd275f37-deec-4f0f-9ad3-4816ca52678e overlaps with another subnet.
  clean_up CreateSubnet: BadRequestException: 400: Client Error for url: http://10.5.100.3:9696/v2.0/subnets, Invalid input for operation: Requested subnet with cidr: 172.16.0.0/24 for network: cd275f37-deec-4f0f-9ad3-4816ca52678e overlaps with another subnet.
  END return value: 1

  From the neutron-server.log the first request from client came in:
  2023-03-16 07:23:59.608 2384978 DEBUG neutron.api.v2.base [req-66884669-2253-4faf-95a4-ebbe353896a0 93eeeebc2920450faaaf2395443505fd d6761dacbb0649189a07a4a1a191a8c0 - 6a4439622e71431c8b96073e33c3b7e1 6a4439622e71431c8b96073e33c3b7e1] Request body: {'subnet': {'allocation_pools': [{'start': '172.16.0.100', 'end': '172.16.0.200'}], 'ip_version': 4, 'gateway_ip': '172.16.0.1', 'name': 'external-subnet-2', 'dns_nameservers': ['8.8.8.8'], 'enable_dhcp': False, 'cidr': '172.16.0.0/24', 'network_id': 'cd275f37-deec-4f0f-9ad3-4816ca52678e'}} prepare_request_body /usr/lib/python3/dist-packages/neutron/api/v2/base.py:719

  and ended successfully after 131s:
  2023-03-16 07:26:10.768 2384978 INFO neutron.wsgi [req-66884669-2253-4faf-95a4-ebbe353896a0 93eeeebc2920450faaaf2395443505fd d6761dacbb0649189a07a4a1a191a8c0 - 6a4439622e71431c8b96073e33c3b7e1 6a4439622e71431c8b96073e33c3b7e1] 10.5.2.174 "POST /v2.0/subnets HTTP/1.1" status: 201  len: 0 time: 131.1619370

  But in the meaintime, the second request came as a result of the client retry action:
  2023-03-16 07:25:30.129 2384976 DEBUG neutron.api.v2.base [req-d0632669-e1ce-4157-b5a5-e52be6e51873 93eeeebc2920450faaaf2395443505fd d6761dacbb0649189a07a4a1a191a8c0 - 6a4439622e71431c8b96073e33c3b7e1 6a4439622e71431c8b96073e33c3b7e1] Request body: {'subnet': {'allocation_pools': [{'start': '172.16.0.100', 'end': '172.16.0.200'}], 'ip_version': 4, 'gateway_ip': '172.16.0.1', 'name': 'external-subnet-2', 'dns_nameservers': ['8.8.8.8'], 'enable_dhcp': False, 'cidr': '172.16.0.0/24', 'network_id': 'cd275f37-deec-4f0f-9ad3-4816ca52678e'}} prepare_request_body /usr/lib/python3/dist-packages/neutron/api/v2/base.py:719

  And of course it failed validation:
  2023-03-16 07:25:30.409 2384976 INFO neutron.wsgi [req-d0632669-e1ce-4157-b5a5-e52be6e51873 93eeeebc2920450faaaf2395443505fd d6761dacbb0649189a07a4a1a191a8c0 - 6a4439622e71431c8b96073e33c3b7e1 6a4439622e71431c8b96073e33c3b7e1] 10.5.2.174 "POST /v2.0/subnets HTTP/1.1" status: 400  len: 429 time: 0.2894270

  The first request comlete sucessfully but the user wouldn't know about it. Instad he got measleading error as above.
  This problem comes from the keystoneauth1 library, used by the python-openstackclient:
  https://github.com/openstack/keystoneauth/blob/stable/ussuri/keystoneauth1/session.py#L1019
  https://github.com/openstack/keystoneauth/blob/stable/ussuri/keystoneauth1/session.py#L1031

  In order to reproduce the problem I set python debugger on the neutron server code, here:
  https://opendev.org/openstack/neutron/src/branch/master/neutron/api/rpc/handlers/resources_rpc.py#L237

  But this problem originally comes from the production environment with
  60+ compute nodes (DVR) and notification about changes due to the
  subnet creations takes over 100s.

  Kind regards,
  Marcin

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/python-openstackclient/+bug/2012047/+subscriptions




More information about the Ubuntu-openstack-bugs mailing list