[Bug 2093880] Re: libcephfs: flush the caps release in filesystem sync

Kinson Chan 2093880 at bugs.launchpad.net
Mon Jan 13 16:03:12 UTC 2025


Thank you for the explanation.  I see that caps can be flushed at any
time, such as during a filesystem sync.

As explained in the linked articles, the caps obtained by
`ceph_try_get_caps` can get lost.  To my limited knowledge, the backend
functions might acquire some caps even though they eventually end in
an error.  In that situation, the value of `got` becomes non-zero while
`ret` is negative.

The fix found in Linux kernel 6.12 is that, when the code reaches the
`out` label, the value of `got` is examined and the caps are put back if
necessary.  It is a change of only 3 or 4 lines, although I understand
it will need some time for QA.
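
To illustrate the idea, here is a minimal user-space sketch in C.  It is
not the kernel code: every `toy_*` name, the `fail_late` flag, the
`TOY_CAP_*` bits and the reference counter are hypothetical and exist
only to model "caps were taken, then an error happened".  Only the
pattern at the `out` label reflects the fix as described above.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical capability bits, for illustration only. */
#define TOY_CAP_RD 0x1
#define TOY_CAP_WR 0x2

/* Toy inode that only counts outstanding cap references. */
struct toy_inode {
    int cap_refs;
};

/*
 * Hypothetical stand-in for a backend that can take a reference and
 * still fail afterwards: *got is non-zero while the result is negative.
 */
static int toy_try_get_caps(struct toy_inode *in, int need,
                            int fail_late, int *got)
{
    in->cap_refs++;
    *got = need;
    if (fail_late)
        return -EIO;    /* error reported after the caps were taken */
    return 1;
}

/* Drop the reference that a successful or failed "get" left behind. */
static void toy_put_caps(struct toy_inode *in, int caps)
{
    if (!caps)
        return;
    assert(in->cap_refs > 0);
    in->cap_refs--;
}

/* Caller that cleans up at the `out` label instead of forgetting `held`. */
static int toy_get_caps(struct toy_inode *in, int need,
                        int fail_late, int *got)
{
    int held = 0;
    int ret;

    ret = toy_try_get_caps(in, need, fail_late, &held);
    if (ret < 0)
        goto out;
    ret = 0;
out:
    /* On error, put back anything that was acquired along the way. */
    if (ret < 0 && held) {
        toy_put_caps(in, held);
        held = 0;
    }
    *got = held;
    return ret;
}
```

Without the check at `out`, the failing call in this model would return
an error while leaving `cap_refs` at 1, which corresponds to the "lost"
caps described above.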

So my question is whether the fix will be backported to the kernel /
libcephfs of Ubuntu 22.04 LTS.  Thanks in advance for your help.

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to ceph in Ubuntu.
https://bugs.launchpad.net/bugs/2093880

Title:
  libcephfs: flush the caps release in filesystem sync

Status in ceph package in Ubuntu:
  New

Bug description:
  Hi,

  The bug is described in the Ceph upstream tracker: https://tracker.ceph.com/issues/67221
  and in the Linux upstream commit: https://github.com/torvalds/linux/commit/ccda9910d8490f4fb067131598e4b2e986faa5a0

  In some situations, libcephfs forgets the capabilities it has acquired
  and is therefore evicted by the Ceph MDS.  I would like to know whether
  a backport will be available for the Ubuntu 22.04 LTS (Jammy) Ceph
  Quincy client (17.2.x).

  Logs on the client:
    Dec 19 04:32:45 *** kernel: libceph: mds0 (1)***:6801 socket closed (con state OPEN)
    Dec 19 04:32:46 *** kernel: libceph: mds0 (1)***:6801 session reset
    Dec 19 04:32:46 *** kernel: ceph: mds0 closed our session
    Dec 19 04:32:46 *** kernel: ceph: mds0 reconnect start
    Dec 19 04:32:46 *** kernel: ceph: mds0 reconnect denied
    Dec 19 04:32:46 *** kernel: libceph: mds0 (1)***:6801 socket closed (con state V1_CONNECT_MSG)
    Dec 19 04:32:47 *** kernel: ceph: mds0 rejected session

  Logs on the server: 
    Dec 19 04:30:17 *** ceph-mds[1408372]: log_channel(cluster) log [WRN] : client.911386 isn't responding to mclientcaps(revoke), ino 0x1004e6bede5 pending pAsLsXsFs issued pAsLsXsFs, sent 240.313055 seconds ago
    Dec 19 04:32:44 *** ceph-mds[1408372]: log_channel(cluster) log [INF] : Evicting (and blocklisting) client session 911386 (v1:***:0/362122962)

  The Ubuntu client version:
    Description:	Ubuntu 22.04.5 LTS
    Release:	22.04

  Package: 
    libcephfs2/jammy-updates,jammy-security,now 17.2.7-0ubuntu0.22.04.2 amd64 [installed,automatic]

  What was expected to happen:
  * There should be no 'client isn't responding to ... pending pAsLsXsFs ...' messages, and no eviction.

  What happened instead:
  * The error appeared and the client was evicted.

  Thanks,
  Kinson

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/2093880/+subscriptions



