[Bug 2065856] Re: SEGFAULTs in v4.3 with GlusterFS

Christian Ehrhardt  2065856 at bugs.launchpad.net
Fri Jun 21 08:15:33 UTC 2024


Patch review

First of all it is not as complex as it seems, it does the very same multiple times.
But I still found some things I need to ask.

Overall, once you had a look at those 4 points, addressed or explained
them the next sponsor coming by (or myself if I do not miss the ping due
to your reply) could sponsor that to Oracular and for the Noble SRU.


#1
I saw the context changed around
  PTHREAD_RWLOCK_destroy(&my_fd->fdlock);
which formerly was in many *_free_state functions in the version that is in noble.

That is part of the removing of all custom free functions used as function pointer in the ops struct.
What is left is that instead of through the function pointer like
  op_ctx->fsal_export->exp_ops.free_state(
it now calls directly
  free_state(pfid->state);
And in these calls the right state is selected that needs to be processed.

The new free_state now checks for bad  state.
Then only if a custom state free function is registered calls that or to gsh_free as final fallback.
Those are all set to NULL as seen in `grep -Hrn init_state src/* -A1`.
So gsh_free it is which is in src/include/abstract_mem.h 

But that is just:
254 static inline void                                                               
255 gsh_free(void *p)                                                                
256 {                                                                                
257 »···free(p);                                                                     
258 }


Maybe I just do not see it yet, but could you explain why the proposed changes would not break the locking which seems to be no more done?
I'm sure the SRU team would want to know as well before accepting.



#2
Further there is a comment lost in the struct of *_state_fd
Upstream added:
  /** state MUST be first to use default free_state */
The comment is not functional and therefore not a huge problem.

But I also see no reason why it is not added, it would only apply other
backports harder right?

If there is a reason why that isn't added let explain it.
And if there is none the next revision could have that change to be more similar to the upstream patch right?


#3 the aforementioned versioning and changelog
I think this should be
3.1 oracular
nfs-ganesha (4.3-8ubuntu2) oracular; urgency=medium                               
                                                                                 
  * Fix null pointer ref in free_state (LP: #2065856)                            
    - d/p/20-fix-null-ptr-ref-in-free-state.diff 
3.2 noble
nfs-ganesha (4.3-8ubuntu1.1) noble; urgency=medium                               
                                                                                 
  * Fix null pointer ref in free_state (LP: #2065856)                            
    - d/p/20-fix-null-ptr-ref-in-free-state.diff 


#4 dep-3 headers
As mentioned I'm happy you have them.
But since I came to the point to ask you to refresh your proposal anyway.
On one hand since it is adapted we'd change origin->backport
And on the other I see no reason why we should not use exactly the text from the upstream patch.
Was there a reason to reformat it?
I'd suggest starting it with

Description: free_state may not always be able to be called with a valid export
 Instead of trying to use the export attached to a state to call
 the FSAL free_state, use a function pointer which if non-NULL
 is used, otherwise we just gsh_free the state (which works for all
 in-tree FSALs).
Origin: backport, https://github.com/nfs-ganesha/nfs-ganesha/commit/336abcba0b9e0dae0aadb4657c311d04862f2028
Bug: https://github.com/nfs-ganesha/nfs-ganesha/issues/904
Bug-Ubuntu: https://launchpad.net/bugs/2065856
Author: Frank S. Filz <ffilzlnx at mindspring.com>
Reviewed-By: Frank S. Filz <ffilzlnx at mindspring.com>
Last-Update: 2024-05-16

---


** Bug watch added: github.com/nfs-ganesha/nfs-ganesha/issues #904
   https://github.com/nfs-ganesha/nfs-ganesha/issues/904

** Changed in: nfs-ganesha (Ubuntu Mantic)
       Status: New => Won't Fix

** Changed in: nfs-ganesha (Ubuntu Noble)
       Status: New => Confirmed

** Changed in: nfs-ganesha (Ubuntu Oracular)
       Status: In Progress => Confirmed

** Changed in: nfs-ganesha (Ubuntu Noble)
       Status: Confirmed => Incomplete

** Changed in: nfs-ganesha (Ubuntu Oracular)
       Status: Confirmed => Incomplete

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to nfs-ganesha in Ubuntu.
https://bugs.launchpad.net/bugs/2065856

Title:
  SEGFAULTs in v4.3 with GlusterFS

Status in nfs-ganesha package in Ubuntu:
  Incomplete
Status in nfs-ganesha source package in Mantic:
  Won't Fix
Status in nfs-ganesha source package in Noble:
  Incomplete
Status in nfs-ganesha source package in Oracular:
  Incomplete

Bug description:
  [ Impact ]

   * Users experience SEGFAULTs when Ganesha fails to communicate with
  Gluster, causing the NFS share to disappear.

   * The patch refactors how free_space() is called to avoid the null
  pointer deference.

  [ Test Plan ]

   * I experienced crashes about once per day on volumes with heavy
  large file random access. I've been running the proposed patch since
  2024-05-16 and haven't experienced a single crash.

   * The test plan is to run the packages from proposed for at least 5
  days, under the same load as when the bug happened, and confirm that
  the crashes reported in this bug no longer happen.

  [ Other Info ]

   * Commit which added this patch: https://github.com/nfs-ganesha/nfs-ganesha/commit/336abcba0b9e0dae0aadb4657c311d04862f2028
   * Issue discussion: https://github.com/nfs-ganesha/nfs-ganesha/issues/904

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/nfs-ganesha/+bug/2065856/+subscriptions




More information about the Ubuntu-openstack-bugs mailing list