[SRU][D][E][F][PATCH 0/1] cifs: DFS Caching feature causing problems traversing multi-tier DFS setups
Matthew Ruffell
matthew.ruffell at canonical.com
Sun Dec 15 23:20:32 UTC 2019
BugLink: https://bugs.launchpad.net/bugs/1854887
[Impact]
Kernels from 5.0-rc1 onwards cannot mount a multi-tier cifs DFS setup, while
kernels 4.20 and below mount the share fine.
The DFS tiering structure looks like this:
Domain virtual DFS (i.e. \\company.com\folders\share)
|-- Domain controller DFS (i.e. \\regional-dc.company.com\folders\share)
    |-- Regional DFS Server (i.e. \\regional-dfs.company.com\folders\share)
        |-- Actual file server (i.e. \\regional-svr.company.com\share)
On the 5.x series kernels, the client follows the DFS referrals down to the
Regional DFS Server, which responds with the correct server/share. Instead of
then connecting to the Actual file server, the kernel backtracks from the
Regional DFS Server to the Domain controller and requests the share there.
That share does not exist on the Domain controller, as it only exists on the
Actual file server, so the connection dies.
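To make the difference concrete, here is a toy model of the referral chase. It
is only a sketch and not the kernel code: the REFERRALS and SHARES tables, the
chase() helper and its follow_newest_root flag are all made up for
illustration, and the host names are the example ones from the tree above.

```python
# Toy model of DFS referral chasing. Not the kernel implementation; every
# name and table here is a hypothetical stand-in for illustration.

# Referrals each server is able to answer (mirrors the example tree).
REFERRALS = {
    "company.com": {
        r"\\company.com\folders\share":
            r"\\regional-dc.company.com\folders\share",
    },
    "regional-dc.company.com": {
        r"\\regional-dc.company.com\folders\share":
            r"\\regional-dfs.company.com\folders\share",
    },
    "regional-dfs.company.com": {
        r"\\regional-dfs.company.com\folders\share":
            r"\\regional-svr.company.com\share",
    },
}
# Paths that are real shares rather than DFS links.
SHARES = {r"\\regional-svr.company.com\share"}
START = r"\\company.com\folders\share"


def server_for(unc_path):
    """Return the host part of a UNC path such as \\\\host\\share."""
    return unc_path.split("\\")[2]


def chase(start, follow_newest_root, max_hops=8):
    """Follow referrals from start; return (mounted, trace).

    follow_newest_root=True sends each referral request to the server that
    was resolved most recently. False pins the root after the first hop,
    which mimics the backtracking seen in the packet capture.
    """
    path = start
    root = server_for(path)
    first_hop = True
    trace = []
    for _ in range(max_hops):
        if path in SHARES:
            trace.append(f"{server_for(path)}: mounted {path}")
            return True, trace
        trace.append(f"{root}: referral request for {path}")
        target = REFERRALS.get(root, {}).get(path)
        if target is None:
            # The server we asked does not know this path.
            return False, trace
        path = target
        if follow_newest_root or first_hop:
            root = server_for(path)
            first_hop = False
    return False, trace


if __name__ == "__main__":
    for flag in (True, False):
        mounted, trace = chase(START, follow_newest_root=flag)
        print("follow_newest_root =", flag, "-> mounted:", mounted)
        for line in trace:
            print("   ", line)
```

With follow_newest_root=True the chase ends at the file server. With False,
the last request for the Regional DFS Server path goes to the Domain
controller, which cannot answer it, and the chase fails.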
We have collected a packet capture, and the flow looks like this:
Legend:
--------------------------------------------------
DC = Domain Controller / Domain DFS Root
RDC = Regional Domain Controller / Domain DFS Root
RDS = Regional DFS Server
AFS = Actual File Server
4.18.0-21-generic Ubuntu kernel - Good
Host: request/response
--------------------------------------------------------------------
DC: company.com\folders
DC: Referral List
RDC: start convo
RDC: <Regional Domain Controller>\Folders\Country\<Share> referral
RDC: <Regional Domain Controller>\Folders\Country\<Share> referral
RDS: start convo
RDS: <Regional DFS Server>\Root\Country\<Share>
RDS: STATUS_PATH_NOT_COVERED
RDS: request referrals
RDS: Referral List
AFS: convo started
AFS: <Actual File Server>\<Share>
AFS: Good response
5.0.0-26-generic Ubuntu kernel - Bad
Host: request/response
------------------------------------------------------------
DC: company.com\folders
RDC: start convo
RDC: <Regional Domain Controller>\Folders\Country\<Share>
RDC: STATUS_PATH_NOT_COVERED
RDS: start convo
RDS: <Regional DFS Server>\Root\Country\<Share>
RDS: STATUS_PATH_NOT_COVERED
RDC: <Regional DFS Server>\Root\Country\<Share>
RDC: STATUS_PATH_NOT_COVERED
From there the debugging output was more or less the same between the two kernel
versions, until the problematic area:
Linux 4.18:
Full log: https://paste.ubuntu.com/p/D9XwBbvTXc/
Status code returned 0xc0000257 STATUS_PATH_NOT_COVERED
fs/cifs/smb2maperror.c: Mapping SMB2 status code 0xc0000257 to POSIX err -66
fs/cifs/connect.c: build_unc_path_to_root: full_path=\\<Regional DFS Server>\Root\Country\<Share>
fs/cifs/smb2ops.c: smb2_get_dfs_refer path <\<Regional DFS Server>\Root\Country\<Share>>
fs/cifs/misc.c: num_referrals: 1 dfs flags: 0x2 ...
fs/cifs/dns_resolve.c: dns_resolve_server_name_to_ip: resolved: <Actual File Server> to <IPV4 Address>
fs/cifs/connect.c: Username: XXX
// mounts the share successfully
Linux 5.0:
Full log: https://paste.ubuntu.com/p/9sXPj7WMQv/
Status code returned 0xc0000257 STATUS_PATH_NOT_COVERED
fs/cifs/smb2maperror.c: Mapping SMB2 status code 0xc0000257 to POSIX err -66
fs/cifs/connect.c: build_unc_path_to_root: full_path=\\<Regional DFS Server>\Root\Country\<Share>
fs/cifs/connect.c: build_unc_path_to_root: full_path=\\<Regional DFS Server>\Root\Country\<Share>
fs/cifs/dfs_cache.c: do_dfs_cache_find: search path: \<Regional DFS Server>\Root\Country\<Share>
fs/cifs/dfs_cache.c: do_dfs_cache_find: cache miss
fs/cifs/dfs_cache.c: do_dfs_cache_find: DFS referral request for \<Regional DFS Server>\Root\Country\<Share>
fs/cifs/smb2ops.c: smb2_get_dfs_refer path <\<Regional DFS Server>\Root\Country\<Share>>
fs/cifs/smb2pdu.c: SMB2 IOCTL
Status code returned 0xc0000225 STATUS_NOT_FOUND
fs/cifs/smb2maperror.c: Mapping SMB2 status code 0xc0000225 to POSIX err -2
// mounting the share fails shortly after
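For readers not familiar with the two status codes in these logs, the mapping
can be summarised with a small lookup. This is a sketch built only from the
values printed above; the table and helper names are hypothetical, and the
errno names assume Linux numbering (66 is EREMOTE, 2 is ENOENT).

```python
# Status codes and POSIX errors exactly as they appear in the debug logs.
# The errno names assume Linux numbering; the helpers are illustrative only.
STATUS_PATH_NOT_COVERED = 0xC0000257
STATUS_NOT_FOUND = 0xC0000225

STATUS_MAP = {
    STATUS_PATH_NOT_COVERED: ("STATUS_PATH_NOT_COVERED", -66, "EREMOTE"),
    STATUS_NOT_FOUND: ("STATUS_NOT_FOUND", -2, "ENOENT"),
}


def map_status(status):
    """Return (status name, POSIX error, errno name) for a known status."""
    return STATUS_MAP[status]


def should_chase_referral(status):
    """Only the 'path not covered' result means 'ask for a DFS referral'."""
    return STATUS_MAP[status][1] == -66


if __name__ == "__main__":
    for code in (STATUS_PATH_NOT_COVERED, STATUS_NOT_FOUND):
        name, err, errname = map_status(code)
        print(f"{code:#010x} {name} -> {err} ({errname}), "
              f"chase referral: {should_chase_referral(code)}")
```

In the 5.0 log the referral request itself comes back as STATUS_NOT_FOUND, so
there is nothing left to chase and the mount fails.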
This has quite a big impact on customers who need to mount multi-tier DFS
shares, as they have to remain on the 4.15 bionic kernel and cannot use the HWE
kernel for their machines.
[Fix]
After some debugging, I narrowed the cause down to a new DFS caching feature
introduced in 5.0-rc1. I started a discussion with the upstream maintainer of
cifs, which you can read here:
https://lore.kernel.org/linux-cifs/05aa2995-e85e-0ff4-d003-5bb08bd17a22@canonical.com/T/#u
This discussion resulted in the below upstream commit, which was merged in the
5.5 development window:
commit 5bb30a4dd60e2a10a4de9932daff23e503f1dd2b
Author: Paulo Alcantara (SUSE) <pc at cjr.nz>
Date: Fri Nov 22 12:30:56 2019 -0300
Subject: cifs: Fix retrieval of DFS referrals in cifs_mount()
You can read it here:
https://github.com/torvalds/linux/commit/5bb30a4dd60e2a10a4de9932daff23e503f1dd2b
This commit makes sure DFS referral requests are sent to the most recently
resolved root server, rather than to an older one further up the tree. This
ensures that we keep descending the tree instead of backtracking, which is what
was happening.
This commit has been submitted for upstream -stable, and is still being
processed. The commit is needed on kernels 5.0 and up. I will update this
section if it is accepted for -stable.
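As a rough first-pass triage aid, a machine's kernel release string can be
checked against that range. This is a sketch under assumptions: the helper
names are made up, and the upper bound of 5.5 is my reading of "merged in the
5.5 development window". Distro kernels may carry the fix as a backport, so a
match here only means the kernel is worth testing, not that it is broken.

```python
import re


def kernel_version(release):
    """Parse (major, minor) from a release string like '5.0.0-26-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def in_affected_range(release):
    """True for 5.0 <= version < 5.5 (assumed range, ignores backports)."""
    return (5, 0) <= kernel_version(release) < (5, 5)


if __name__ == "__main__":
    # The two kernels from the packet captures.
    for rel in ("4.18.0-21-generic", "5.0.0-26-generic"):
        print(rel, "in affected range:", in_affected_range(rel))
```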
[Testcase]
To test this commit you need a multi-tier cifs DFS with a structure similar to
the tree shown in the Impact section. From there, you simply try to mount
a cifs share.
On patched kernels, the mount will succeed. On broken kernels, the mount will
fail.
I have prepared a test kernel for Bionic HWE, based on 5.0.0-37.40~18.04 which
you can find here:
https://launchpad.net/~mruffell/+archive/ubuntu/sf245466-test
This test kernel has been tested by the customer and mounts the cifs DFS
correctly.
[Regression Potential]
I believe the risk of regression for this commit is low. All changes are limited
to DFS within cifs, and only change which root server DFS referral requests are
sent to.
The commit is a clean cherry pick for disco, eoan and focal. The maintainer has
submitted the commit for upstream -stable, and we have tested the commit with
the customer, and things are now working as intended.
Paulo Alcantara (SUSE) (1):
cifs: Fix retrieval of DFS referrals in cifs_mount()
fs/cifs/connect.c | 32 ++++++++++++++++++++++----------
1 file changed, 22 insertions(+), 10 deletions(-)
--
2.20.1