[3.8.y.z extended stable] Patch "net: use __GFP_NORETRY for high order allocations" has been added to staging queue

Kamal Mostafa kamal at canonical.com
Wed Mar 26 19:48:41 UTC 2014


This is a note to let you know that I have just added a patch titled

    net: use __GFP_NORETRY for high order allocations

to the linux-3.8.y-queue branch of the 3.8.y.z extended stable tree 
which can be found at:

 http://kernel.ubuntu.com/git?p=ubuntu/linux.git;a=shortlog;h=refs/heads/linux-3.8.y-queue

This patch is scheduled to be released in version 3.8.13.21.

If you, or anyone else, feel it should not be added to this tree, please 
reply to this email.

For more information about the 3.8.y.z tree, see
https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable

Thanks.
-Kamal

------

From 876a652d04103c7c76a8a0a5ab44b11b3d6585cd Mon Sep 17 00:00:00 2001
From: Eric Dumazet <edumazet at google.com>
Date: Thu, 6 Feb 2014 10:42:42 -0800
Subject: net: use __GFP_NORETRY for high order allocations

[ Upstream commit ed98df3361f059db42786c830ea96e2d18b8d4db ]

sock_alloc_send_pskb() & sk_page_frag_refill()
have a loop trying high order allocations to prepare
skb with low number of fragments as this increases performance.

The problem is that under memory pressure/fragmentation, this can
trigger OOM, while the intent was only to try the high order
allocations and then fall back to order-0 allocations.

We had various reports of unexpected regressions.

According to David, setting __GFP_NORETRY should be fine,
as the asynchronous compaction is still enabled, and this
will prevent OOM from kicking in, as in:

CFSClientEventm invoked oom-killer: gfp_mask=0x42d0, order=3, oom_adj=0,
oom_score_adj=0, oom_score_badness=2 (enabled),memcg_scoring=disabled
CFSClientEventm

Call Trace:
 [<ffffffff8043766c>] dump_header+0xe1/0x23e
 [<ffffffff80437a02>] oom_kill_process+0x6a/0x323
 [<ffffffff80438443>] out_of_memory+0x4b3/0x50d
 [<ffffffff8043a4a6>] __alloc_pages_may_oom+0xa2/0xc7
 [<ffffffff80236f42>] __alloc_pages_nodemask+0x1002/0x17f0
 [<ffffffff8024bd23>] alloc_pages_current+0x103/0x2b0
 [<ffffffff8028567f>] sk_page_frag_refill+0x8f/0x160
 [<ffffffff80295fa0>] tcp_sendmsg+0x560/0xee0
 [<ffffffff802a5037>] inet_sendmsg+0x67/0x100
 [<ffffffff80283c9c>] __sock_sendmsg_nosec+0x6c/0x90
 [<ffffffff80283e85>] sock_sendmsg+0xc5/0xf0
 [<ffffffff802847b6>] __sys_sendmsg+0x136/0x430
 [<ffffffff80284ec8>] sys_sendmsg+0x88/0x110
 [<ffffffff80711472>] system_call_fastpath+0x16/0x1b
Out of Memory: Kill process 2856 (bash) score 9999 or sacrifice child
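
For readers decoding the gfp_mask=0x42d0 value in the report above, here is a
minimal userspace sketch. The bit values in the table are assumptions taken
from the 3.x-era include/linux/gfp.h and should be checked against the tree
in use; decode_gfp() is a hypothetical helper, not a kernel function. Under
those assumptions the mask reads as GFP_KERNEL (__GFP_WAIT | __GFP_IO |
__GFP_FS) plus __GFP_NOWARN and __GFP_COMP, with __GFP_NORETRY absent.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed 3.x-era bit values; verify against include/linux/gfp.h. */
struct gfp_bit {
	unsigned int bit;
	const char *name;
};

static const struct gfp_bit gfp_bits[] = {
	{ 0x10u,   "__GFP_WAIT" },
	{ 0x40u,   "__GFP_IO" },
	{ 0x80u,   "__GFP_FS" },
	{ 0x200u,  "__GFP_NOWARN" },
	{ 0x1000u, "__GFP_NORETRY" },
	{ 0x4000u, "__GFP_COMP" },
};

/*
 * Hypothetical helper: write the names of the known bits set in mask
 * into buf, separated by '|', and return the bits it did not recognize.
 */
static unsigned int decode_gfp(unsigned int mask, char *buf, size_t len)
{
	unsigned int left = mask;
	size_t i;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(gfp_bits) / sizeof(gfp_bits[0]); i++) {
		if (!(mask & gfp_bits[i].bit))
			continue;
		if (buf[0] != '\0')
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, gfp_bits[i].name, len - strlen(buf) - 1);
		left &= ~gfp_bits[i].bit;
	}
	return left;
}
```

Calling decode_gfp(0x42d0, buf, sizeof(buf)) with a 128-byte buffer leaves no
unrecognized bits under the assumed table, and the resulting string does not
contain __GFP_NORETRY.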

Signed-off-by: Eric Dumazet <edumazet at google.com>
Acked-by: David Rientjes <rientjes at google.com>
Acked-by: "Eric W. Biederman" <ebiederm at xmission.com>
Signed-off-by: David S. Miller <davem at davemloft.net>
Signed-off-by: Kamal Mostafa <kamal at canonical.com>
---
 net/core/sock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 7367c05..68aad38 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1801,7 +1801,7 @@ bool sk_page_frag_refill(struct sock *sk, struct page_frag *pfrag)
 		gfp_t gfp = sk->sk_allocation;

 		if (order)
-			gfp |= __GFP_COMP | __GFP_NOWARN;
+			gfp |= __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY;
 		pfrag->page = alloc_pages(gfp, order);
 		if (likely(pfrag->page)) {
 			pfrag->offset = 0;
--
1.8.3.2
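
The fallback behaviour described in the commit message can be sketched in
userspace as follows. This is an illustration under stated assumptions, not
the kernel implementation: the SK_GFP_* values, START_ORDER, struct
fake_page, fake_alloc_pages() and refill() are all hypothetical stand-ins.
The point it shows is that the extra flags, including the no-retry one, are
added only to the high order attempts, while the final order-0 attempt uses
the caller's flags unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values, for this sketch only. */
#define SK_GFP_BASE    0x01u
#define SK_GFP_COMP    0x02u
#define SK_GFP_NOWARN  0x04u
#define SK_GFP_NORETRY 0x08u

/* Hypothetical highest order to try first. */
#define START_ORDER 3

struct fake_page {
	int order;
	unsigned int flags;
};

/* Stand-in allocator: fails for any order above max_ok_order. */
static bool fake_alloc_pages(unsigned int flags, int order,
			     int max_ok_order, struct fake_page *out)
{
	if (order > max_ok_order)
		return false;
	out->order = order;
	out->flags = flags;
	return true;
}

/* Try the highest order first, then step down to order 0. */
static bool refill(unsigned int base, int max_ok_order,
		   struct fake_page *out)
{
	int order;

	assert(out != NULL);
	for (order = START_ORDER; order >= 0; order--) {
		unsigned int gfp = base;

		/* Only the opportunistic high order tries get the extras. */
		if (order)
			gfp |= SK_GFP_COMP | SK_GFP_NOWARN | SK_GFP_NORETRY;
		if (fake_alloc_pages(gfp, order, max_ok_order, out))
			return true;
	}
	return false;
}
```

With max_ok_order set to 1, refill() succeeds at order 1 with the no-retry
flag set; with max_ok_order set to 0, it succeeds at order 0 with only the
base flags.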




