[Bug 1863162]
Louis Benazet
1863162 at bugs.launchpad.net
Tue Mar 28 20:25:27 UTC 2023
Hello, it seems this bugfix has reached Ubuntu 22.04, which is nice.
However some of my users use earlier Ubuntu versions (18.04 and 20.04)
where it is not available. How can I apply the fix on those versions? I
tried checking out the glibc version available in Ubuntu 18.04 (2.27)
and cherry-picking the fixes. But I get a conflict I don't know how to
solve because the base files are very different.
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to glibc in Ubuntu.
https://bugs.launchpad.net/bugs/1863162
Title:
Inconsistency detected by ld.so: dl-tls.c: 493: _dl_allocate_tls_init:
Assertion `listp->slotinfo[cnt].gen <= GL(dl_tls_generation)' failed!
Status in GLibC:
Fix Released
Status in glibc package in Ubuntu:
Confirmed
Bug description:
When using glibc as part of our NSX product, we are running into the
above mentioned glibc assert case sometimes.
Here's relevant revision information :
Ubuntu - 16.04
glibc version - 2.23.
This is a known issue with resolution identified as per thread link below :
https://sourceware.org/ml/libc-alpha/2016-01/msg00480.htm and in
addition see Comment 9 in
https://sourceware.org/bugzilla/show_bug.cgi?id=19329.
We have applied this patch in our product and it seems to be working
fine.
Is there a way to upstream these changes and make those available in
standard glibc upstream?
Please let us know.
Here are the two patches:
PATCH1
=============================
diff --git a/elf/dl-open.c b/elf/dl-open.c
index 6f178b3..2b97605 100644
--- a/elf/dl-open.c
+++ b/elf/dl-open.c
@@ -524,9 +524,16 @@ dl_open_worker (void *a)
}
/* Bump the generation number if necessary. */
- if (any_tls && __builtin_expect (++GL(dl_tls_generation) == 0, 0))
- _dl_fatal_printf (N_("\
+ if (any_tls)
+ {
+ size_t newgen = GL(dl_tls_generation) + 1;
+ if (__builtin_expect (newgen == 0, 0))
+ _dl_fatal_printf (N_("\
TLS generation counter wrapped! Please report this."));
+ /* Synchronize with the load acquire in _dl_allocate_tls_init.
+ See the CONCURRENCY NOTES there in dl-tls.c. */
+ atomic_store_release (&GL(dl_tls_generation), newgen);
+ }
/* We need a second pass for static tls data, because _dl_update_slotinfo
must not be run while calls to _dl_add_to_slotinfo are still pending. */
diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index ed13fd9..7184a54 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -443,6 +443,48 @@ _dl_resize_dtv (dtv_t *dtv)
}
+/* CONCURRENCY NOTES:
+
+ During dynamic TLS and DTV allocation and setup various objects may be
+ accessed concurrently:
+
+ GL(dl_tls_max_dtv_idx)
+ GL(dl_tls_generation)
+ listp->slotinfo[i].map
+ listp->slotinfo[i].gen
+ listp->next
+
+ where listp is a node in the GL(dl_tls_dtv_slotinfo_list) list. The public
+ APIs that may access them are
+
+ Writers: dlopen, dlclose and dynamic linker start up code.
+ Readers only: pthread_create and __tls_get_addr (TLS access).
+
+ The writers hold the GL(dl_load_lock), but the readers don't, so atomics
+ should be used when accessing these globals.
+
+ dl_open_worker (called from dlopen) for each loaded module increases
+ GL(dl_tls_max_dtv_idx), sets the link_map of the module up, adds a new
+ slotinfo entry to GL(dl_tls_dtv_slotinfo_list) with the new link_map and
+ the next generation number GL(dl_tls_generation)+1. Then it increases
+ GL(dl_tls_generation) which sinals that the new slotinfo entries are ready.
+ This last write is release mo so previous writes can be synchronized.
+
+ GL(dl_tls_max_dtv_idx) is always an upper bound of the modids of the ready
+ entries. The slotinfo list might be shorter than that during dlopen.
+ Entries in the slotinfo list might have gen > GL(dl_tls_generation) and
+ map == NULL.
+
+ _dl_allocate_tls_init is called from pthread_create and it looks through
+ the slotinfo list to do the dynamic TLS and DTV setup for the new thread.
+ It first loads the current GL(dl_tls_generation) with acquire mo and only
+ considers modules up to that generation ignoring any later change to the
+ slotinfo list.
+
+ TODO: Entries might get changed and freed in dlclose without sync.
+ TODO: __tls_get_addr is not yet synchronized with dlopen and dlclose.
+*/
+
void *
internal_function
_dl_allocate_tls_init (void *result)
@@ -455,9 +497,18 @@ _dl_allocate_tls_init (void *result)
struct dtv_slotinfo_list *listp;
size_t total = 0;
size_t maxgen = 0;
-
- /* Check if the current dtv is big enough. */
- if (dtv[-1].counter < GL(dl_tls_max_dtv_idx))
+ size_t gen_count;
+ size_t dtv_slots;
+
+ /* Synchronize with the release mo store in dl_open_worker, modules with
+ larger generation number are ignored. */
+ gen_count = atomic_load_acquire (&GL(dl_tls_generation));
+ /* Check if the current dtv is big enough. GL(dl_tls_max_dtv_idx) is
+ concurrently modified, but after the release mo store to
+ GL(dl_tls_generation) it always remains a modid upper bound for
+ previously loaded modules so relaxed access is enough. */
+ dtv_slots = atomic_load_relaxed (&GL(dl_tls_max_dtv_idx));
+ if (dtv[-1].counter < dtv_slots)
{
/* Resize the dtv. */
dtv = _dl_resize_dtv (dtv);
@@ -480,18 +531,25 @@ _dl_allocate_tls_init (void *result)
void *dest;
/* Check for the total number of used slots. */
- if (total + cnt > GL(dl_tls_max_dtv_idx))
+ if (total + cnt > dtv_slots)
break;
- map = listp->slotinfo[cnt].map;
+ /* Synchronize with the release mo store in _dl_add_to_slotinfo in
+ dlopen, so the generation number read below is for a valid entry.
+ TODO: remove_slotinfo in dlclose is not synchronized. */
+ map = atomic_load_acquire (&listp->slotinfo[cnt].map);
if (map == NULL)
/* Unused entry. */
continue;
+ size_t gen = listp->slotinfo[cnt].gen;
+ if (gen > gen_count)
+ /* New, concurrently loaded entry. */
+ continue;
+
/* Keep track of the maximum generation number. This might
not be the generation counter. */
- assert (listp->slotinfo[cnt].gen <= GL(dl_tls_generation));
- maxgen = MAX (maxgen, listp->slotinfo[cnt].gen);
+ maxgen = MAX (maxgen, gen);
dtv[map->l_tls_modid].pointer.val = TLS_DTV_UNALLOCATED;
dtv[map->l_tls_modid].pointer.is_static = false;
@@ -518,11 +576,14 @@ _dl_allocate_tls_init (void *result)
}
total += cnt;
- if (total >= GL(dl_tls_max_dtv_idx))
+ if (total >= dtv_slots)
break;
- listp = listp->next;
- assert (listp != NULL);
+ /* Synchronize with the release mo store in _dl_add_to_slotinfo
+ so only initialized slotinfo nodes are looked at. */
+ listp = atomic_load_acquire (&listp->next);
+ if (listp == NULL)
+ break;
}
/* The DTV version is up-to-date now. */
@@ -916,7 +977,7 @@ _dl_add_to_slotinfo (struct link_map *l)
the first slot. */
assert (idx == 0);
- listp = prevp->next = (struct dtv_slotinfo_list *)
+ listp = (struct dtv_slotinfo_list *)
malloc (sizeof (struct dtv_slotinfo_list)
+ TLS_SLOTINFO_SURPLUS * sizeof (struct dtv_slotinfo));
if (listp == NULL)
@@ -939,9 +1000,15 @@ cannot create TLS data structures"));
listp->next = NULL;
memset (listp->slotinfo, '\0',
TLS_SLOTINFO_SURPLUS * sizeof (struct dtv_slotinfo));
+ /* _dl_allocate_tls_init concurrently walks this list at thread
+ creation, it must only observe initialized nodes in the list.
+ See the CONCURRENCY NOTES there. */
+ atomic_store_release (&prevp->next, listp);
}
/* Add the information into the slotinfo data structure. */
- listp->slotinfo[idx].map = l;
listp->slotinfo[idx].gen = GL(dl_tls_generation) + 1;
+ /* Synchronize with the acquire load in _dl_allocate_tls_init.
+ See the CONCURRENCY NOTES there. */
+ atomic_store_release (&listp->slotinfo[idx].map, l);
}
PATCH 2
========
diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index 073321c..2c9ad2a 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -571,7 +571,7 @@ _dl_allocate_tls_init (void *result)
}
total += cnt;
- if (total >= dtv_slots)
+ if (total > dtv_slots)
break;
/* Synchronize with dl_add_to_slotinfo. */
To manage notifications about this bug go to:
https://bugs.launchpad.net/glibc/+bug/1863162/+subscriptions
More information about the foundations-bugs
mailing list