[SRU][B][PATCH 1/1] ipv4: Fix device used for dst_alloc with local routes

Luke Nowakowski-Krijger luke.nowakowskikrijger at canonical.com
Mon Oct 4 18:37:08 UTC 2021


From: David Ahern <dsahern at kernel.org>

Oliver reported a use case where deleting a VRF device can hang
waiting for the refcnt to drop to 0. The root cause is that the dst
is allocated against the VRF device but cached on the loopback
device.

The use case (added to the selftests) has an implicit VRF crossing
due to the ordering of the FIB rules: lookup local comes before the
l3mdev rule. The problem occurs even if the FIB rules are re-ordered
with local after l3mdev, because the VRF table does not have a
default route to terminate the lookup. The end result is that the
FIB lookup returns the loopback device as the nexthop, but the
ingress device is in a VRF. Because of this mismatch, the dst is
allocated against the VRF device but then cached on the loopback.

The fix is to bring over the trick used for IPv6 (see
ip6_rt_get_dev_rcu): pick the dst alloc device based on the FIB lookup
result, but with checks that the result has a nexthop device (e.g.,
not an unreachable or prohibit entry).
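
To illustrate the difference in device selection, here is a minimal
userspace sketch. It is not kernel code: the structs are simplified
stand-ins that only borrow the kernel names, and model_l3mdev_master(),
old_dst_dev() and new_dst_dev() are hypothetical helpers written for
this example. The NULL-device check in model_l3mdev_master() is an
assumption of the model.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel types; not the real definitions. */
struct net_device {
	const char *name;
	int is_l3_master;		/* 1 if this is a VRF (L3 master) device */
	struct net_device *master;	/* VRF this device is enslaved to, or NULL */
};

struct fib_nh { struct net_device *nh_dev; };
struct fib_info { struct fib_nh nh; };	/* single nexthop, for simplicity */
struct fib_result { struct fib_info *fi; };
struct net { struct net_device *loopback_dev; };

/* Hypothetical model of l3mdev_master_dev_rcu(): an L3 master maps to
 * itself, an enslaved device maps to its master, anything else to NULL.
 */
struct net_device *model_l3mdev_master(struct net_device *dev)
{
	if (!dev)
		return NULL;
	if (dev->is_l3_master)
		return dev;
	return dev->master;
}

/* Pre-patch choice: derived from the ingress device. */
struct net_device *old_dst_dev(struct net *net, struct net_device *ingress)
{
	struct net_device *dev = model_l3mdev_master(ingress);

	return dev ? dev : net->loopback_dev;
}

/* Post-patch choice: derived from the nexthop of the FIB lookup result. */
struct net_device *new_dst_dev(struct net *net, const struct fib_result *res)
{
	struct net_device *dev = NULL;

	if (res->fi)
		dev = model_l3mdev_master(res->fi->nh.nh_dev);

	return dev ? dev : net->loopback_dev;
}
```

With an ingress device enslaved to a VRF and a lookup result whose
nexthop is the loopback device, old_dst_dev() in this model returns the
VRF device while new_dst_dev() returns the loopback device, so the
device used for the allocation matches the device the lookup resolved
to.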

Fixes: f5a0aab84b74 ("net: ipv4: dst for local input routes should use l3mdev if relevant")
Reported-by: Oliver Herms <oliver.peter.herms at gmail.com>
Signed-off-by: David Ahern <dsahern at kernel.org>
Signed-off-by: David S. Miller <davem at davemloft.net>
(cherry picked from commit b87b04f5019e821c8c6c7761f258402e43500a1f)
Signed-off-by: Nicolas Dichtel <nicolas.dichtel at 6wind.com>
[lukenow: removed updates to tools/testing/selftests/net/fib_tests.sh as
they do not exist for bionic]
[lukenow: changed fib_nh_common to fib_nh and changed the associated
accesses to it]
Signed-off-by: Luke Nowakowski-Krijger <luke.nowakowskikrijger at canonical.com>
---
 net/ipv4/route.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 511f01c8c42f..e3323656a87d 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1932,6 +1932,19 @@ static int ip_mkroute_input(struct sk_buff *skb,
 	return __mkroute_input(skb, res, in_dev, daddr, saddr, tos);
 }
 
+/* get device for dst_alloc with local routes */
+static struct net_device *ip_rt_get_dev(struct net *net,
+                                       const struct fib_result *res)
+{
+       struct fib_nh *nh = res->fi ? &FIB_RES_NH(*res) : NULL;
+       struct net_device *dev = NULL;
+
+       if (nh)
+               dev = l3mdev_master_dev_rcu(nh->nh_dev);
+
+       return dev ? : net->loopback_dev;
+}
+
 /*
  *	NOTE. We drop all the packets that has local source
  *	addresses, because every properly looped back packet
@@ -2069,7 +2082,7 @@ out:	return err;
 		}
 	}
 
-	rth = rt_dst_alloc(l3mdev_master_dev_rcu(dev) ? : net->loopback_dev,
+	rth = rt_dst_alloc(ip_rt_get_dev(net, res),
 			   flags | RTCF_LOCAL, res->type,
 			   IN_DEV_CONF_GET(in_dev, NOPOLICY), false, do_cache);
 	if (!rth)
-- 
2.30.2



