ACK / APPLIED[F]: [SRU][D][E][F][PATCH 0/1] cifs: DFS Caching feature causing problems traversing multi-tier DFS setups
Seth Forshee
seth.forshee at canonical.com
Mon Dec 16 19:58:00 UTC 2019
On Mon, Dec 16, 2019 at 12:20:32PM +1300, Matthew Ruffell wrote:
> BugLink: https://bugs.launchpad.net/bugs/1854887
>
> [Impact]
>
> Kernels 5.0-rc1 and onwards cannot mount a multi-tier cifs DFS setup, while
> kernels 4.20 and below mount the same share without problems.
>
> The DFS tiering structure looks like this:
>
> Domain virtual DFS (i.e. \\company.com\folders\share)
>  |-- Domain controller DFS (i.e. \\regional-dc.company.com\folders\share)
>       |-- Regional DFS Server (i.e. \\regional-dfs.company.com\folders\share)
>            |-- Actual file server (i.e. \\regional-svr.company.com\share)
>
> On the 5.x series kernels, after getting the DFS referrals list through to the
> Regional DFS Server, which responds with the correct server/share, instead of
> going to the Actual file server, the kernel backtracks from the Regional DFS
> Server back to the Domain controller and requests the share there. Of course,
> this share does not exist on the Domain controller, as it only exists on the
> Actual file server, and the connection dies.
>
> We have collected a packet capture, and the flow looks like this:
>
> Legend:
> --------------------------------------------------
> DC = Domain Controller / Domain DFS Root
> RDC = Regional Domain Controller / Domain DFS Root
> RDS = Regional DFS Server
> AFS = Actual File Server
>
> 4.18.0-21-generic Ubuntu kernel - Good
>
> Host: request/response
> --------------------------------------------------------------------
> DC: company.com\folders
> DC: Referral List
> RDC: start convo
> RDC: <Regional Domain Controller>\Folders\Country\<Share> referral
> RDC: <Regional Domain Controller>\Folders\Country\<Share> referral
> RDS: start convo
> RDS: <Regional DFS Server>\Root\Country\<Share>
> RDS: STATUS_PATH_NOT_COVERED
> RDS: request referrals
> RDS: Referral List
> AFS: convo started
> AFS: <Actual File Server>\<Share>
> AFS: Good response
>
> 5.0.0-26-generic Ubuntu kernel - Bad
>
> Host: request/response
> ------------------------------------------------------------
> DC: company.com\folders
> RDC: start convo
> RDC: <Regional Domain Controller>\Folders\Country\<Share>
> RDC: STATUS_PATH_NOT_COVERED
> RDS: start convo
> RDS: <Regional DFS Server>\Root\Country\<Share>
> RDS: STATUS_PATH_NOT_COVERED
> RDC: <Regional DFS Server>\Root\Country\<Share>
> RDC: STATUS_PATH_NOT_COVERED
>
> From there the debugging output was more or less the same between the two kernel
> versions, until the problematic area:
>
> Linux 4.18:
>
> Full log: https://paste.ubuntu.com/p/D9XwBbvTXc/
>
> Status code returned 0xc0000257 STATUS_PATH_NOT_COVERED
> fs/cifs/smb2maperror.c: Mapping SMB2 status code 0xc0000257 to POSIX err -66
> fs/cifs/connect.c: build_unc_path_to_root: full_path=\\<Regional DFS Server>\Root\Country\<Share>
> fs/cifs/smb2ops.c: smb2_get_dfs_refer path <\<Regional DFS Server>\Root\Country\<Share>>
> fs/cifs/misc.c: num_referrals: 1 dfs flags: 0x2 ...
> fs/cifs/dns_resolve.c: dns_resolve_server_name_to_ip: resolved: <Actual File Server> to <IPV4 Address>
> fs/cifs/connect.c: Username: XXX
> // mounts the share successfully
>
> Linux 5.0:
>
> Full log: https://paste.ubuntu.com/p/9sXPj7WMQv/
>
> Status code returned 0xc0000257 STATUS_PATH_NOT_COVERED
> fs/cifs/smb2maperror.c: Mapping SMB2 status code 0xc0000257 to POSIX err -66
> fs/cifs/connect.c: build_unc_path_to_root: full_path=\\<Regional DFS Server>\Root\Country\<Share>
> fs/cifs/connect.c: build_unc_path_to_root: full_path=\\<Regional DFS Server>\Root\Country\<Share>
> fs/cifs/dfs_cache.c: do_dfs_cache_find: search path: \<Regional DFS Server>\Root\Country\<Share>
> fs/cifs/dfs_cache.c: do_dfs_cache_find: cache miss
> fs/cifs/dfs_cache.c: do_dfs_cache_find: DFS referral request for \<Regional DFS Server>\Root\Country\<Share>
> fs/cifs/smb2ops.c: smb2_get_dfs_refer path <\<Regional DFS Server>\Root\Country\<Share>>
> fs/cifs/smb2pdu.c: SMB2 IOCTL
> Status code returned 0xc0000225 STATUS_NOT_FOUND
> fs/cifs/smb2maperror.c: Mapping SMB2 status code 0xc0000225 to POSIX err -2
> // mounting the share fails shortly after
>
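For anyone comparing the two logs, the difference is in which status code comes back and what it maps to. A minimal sketch of that mapping follows, using only the two codes and errno values printed in the logs above (Linux errno numbering). The table and function names are hypothetical and are not kernel identifiers.

```python
from typing import Dict, Tuple

# The two NT status codes that appear in the logs above, with the POSIX
# errno values printed next to them there (Linux numbering).
# NT_STATUS_TO_ERRNO, map_status and is_dfs_referral_hint are hypothetical names.
NT_STATUS_TO_ERRNO: Dict[int, Tuple[str, int]] = {
    0xC0000257: ("STATUS_PATH_NOT_COVERED", 66),  # EREMOTE: path lives on another server
    0xC0000225: ("STATUS_NOT_FOUND", 2),          # ENOENT: nothing found for the request
}


def map_status(status: int) -> Tuple[str, int]:
    """Return (status name, negative POSIX errno) for a known NT status code."""
    name, err = NT_STATUS_TO_ERRNO[status]
    return name, -err


def is_dfs_referral_hint(status: int) -> bool:
    """True when the status means the client should go and ask for a DFS referral."""
    return map_status(status)[1] == -66
```

In the 4.18 log the last status seen is the referral hint (-66) and the referral request that follows succeeds. In the 5.0 log the referral request itself comes back as -2, which is where the mount gives up.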
> This has a significant impact on customers who need to mount multi-tier DFS
> shares, as they have to remain on the 4.15 bionic kernel and cannot use the HWE
> kernel on their machines.
>
> [Fix]
>
> After some debugging, I narrowed the cause down to a new DFS caching feature
> introduced in 5.0-rc1. I started a discussion with the upstream maintainer of
> cifs, which you can read here:
>
> https://lore.kernel.org/linux-cifs/05aa2995-e85e-0ff4-d003-5bb08bd17a22@canonical.com/T/#u
>
> This discussion resulted in the below upstream commit, which was merged in the
> 5.5 development window:
>
> commit 5bb30a4dd60e2a10a4de9932daff23e503f1dd2b
> Author: Paulo Alcantara (SUSE) <pc at cjr.nz>
> Date: Fri Nov 22 12:30:56 2019 -0300
> Subject: cifs: Fix retrieval of DFS referrals in cifs_mount()
>
> You can read it here:
> https://github.com/torvalds/linux/commit/5bb30a4dd60e2a10a4de9932daff23e503f1dd2b
>
> This commit makes referral requests go to the most recently resolved root
> server, instead of to older ones further up the chain. This ensures that we
> keep descending down the tree instead of backtracking, which is what was
> happening.
>
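The descend-versus-backtrack behaviour described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the referral tables, placeholder paths and the `freeze_root_after` knob are hypothetical, and freezing the root after the first hop is a deliberately simplified stand-in for the pre-fix behaviour, not a reproduction of the kernel code.

```python
from typing import Dict, List, Optional, Set, Tuple

Target = Tuple[str, str]  # (server, path)

# Placeholder paths, loosely modelled on the tiers in the Impact section.
DOMAIN_PATH = r"\folders\share"
REGIONAL_PATH = r"\Folders\Country\Share"
DFS_PATH = r"\Root\Country\Share"
SHARE_PATH = r"\Share"

# Hypothetical referral tables: server -> {path: (next server, next path)}.
REFERRALS: Dict[str, Dict[str, Target]] = {
    "DC": {DOMAIN_PATH: ("RDC", REGIONAL_PATH)},
    "RDC": {REGIONAL_PATH: ("RDS", DFS_PATH)},
    "RDS": {DFS_PATH: ("AFS", SHARE_PATH)},
    "AFS": {},
}
SHARES: Set[Target] = {("AFS", SHARE_PATH)}


def chase_referrals(
    server: str,
    path: str,
    freeze_root_after: Optional[int] = None,
    max_hops: int = 8,
) -> Tuple[List[Target], Optional[Target]]:
    """Follow referrals until a real share is reached or a lookup fails.

    Returns the list of (server asked, path requested) pairs and the final
    target, or None if a referral request found nothing.
    """
    trail: List[Target] = []
    root = server  # the server that referral requests are sent to
    for hop in range(max_hops):
        if (server, path) in SHARES:
            return trail, (server, path)
        trail.append((root, path))
        referral = REFERRALS[root].get(path)
        if referral is None:
            return trail, None  # analogue of the failed referral request
        server, path = referral
        # Descending: always move the root to the newest resolved server.
        # Frozen: stop updating the root after the given number of hops.
        if freeze_root_after is None or hop < freeze_root_after:
            root = server
    return trail, None
```

With the default arguments, `chase_referrals("DC", DOMAIN_PATH)` asks DC, RDC and RDS in turn and ends at the file server. With `freeze_root_after=1`, the request for the Regional DFS Server path is sent to RDC, which has no referral for it, so the lookup fails. That has the same shape as the last two lines of the bad capture.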
> This commit has been submitted for upstream -stable, and is still being
> processed. The commit is needed on kernels 5.0 and up. I will update this
> section if it is accepted for -stable.
>
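For triage, a rough check of whether a machine's kernel base version falls in the range discussed here (5.0 and up, with the fix merged upstream during the 5.5 window) might look like the sketch below. The function names are hypothetical. It only looks at the base version in the release string, so it cannot tell whether a distro kernel already carries the backport.

```python
import re
from typing import Tuple


def base_version(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as '5.0.0-26-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def may_need_fix(release: str) -> bool:
    """True if the base version is in the 5.0 to 5.4 range discussed above.

    This is a heuristic on the version number only; a patched distro kernel
    still reports an affected base version.
    """
    return (5, 0) <= base_version(release) < (5, 5)
```

For example, `may_need_fix("5.0.0-26-generic")` is true and `may_need_fix("4.18.0-21-generic")` is false, matching the bad and good kernels in the packet captures.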
> [Testcase]
>
> To test this commit you need a multi-tier cifs DFS with a structure similar to
> the tree shown in the Impact section. From there, you simply try to mount a
> cifs share.
>
> On patched kernels, the mount will succeed. On broken kernels, the mount will
> fail.
>
> I have prepared a test kernel for Bionic HWE, based on 5.0.0-37.40~18.04 which
> you can find here:
>
> https://launchpad.net/~mruffell/+archive/ubuntu/sf245466-test
>
> This test kernel has been tested by the customer and mounts the cifs DFS
> correctly.
>
> [Regression Potential]
>
> I believe the risk of regression for this commit is low. All changes are limited
> to DFS within cifs, and only change which server is treated as the root server
> that referral requests are sent to.
>
> The commit is a clean cherry pick for disco, eoan and focal. The maintainer has
> submitted the commit for upstream -stable, and we have tested the commit with
> the customer, and things are now working as intended.
Acked-by: Seth Forshee <seth.forshee at canonical.com>
Applied to focal/master-next, thanks!