ACK / APPLIED[F]: [SRU][D][E][F][PATCH 0/1] cifs: DFS Caching feature causing problems traversing multi-tier DFS setups

Seth Forshee seth.forshee at canonical.com
Mon Dec 16 19:58:00 UTC 2019


On Mon, Dec 16, 2019 at 12:20:32PM +1300, Matthew Ruffell wrote:
> BugLink: https://bugs.launchpad.net/bugs/1854887
> 
> [Impact]
> 
> Kernels 5.0-rc1 and later cannot mount a share from a multi-tier cifs DFS
> setup, while kernels 4.20 and earlier can mount the same share fine.
> 
> The DFS tiering structure looks like this:
> 
> Domain virtual DFS (i.e. \\company.com\folders\share)
> |-- Domain controller DFS (i.e. \\regional-dc.company.com\folders\share)
>     |-- Regional DFS Server (i.e. \\regional-dfs.company.com\folders\share)
>         |-- Actual file server (i.e. \\regional-svr.company.com\share)
> 
> On 5.x series kernels, the DFS referrals are followed down to the Regional
> DFS Server, which responds with the correct server/share. Instead of then
> going to the Actual file server, the kernel backtracks from the Regional DFS
> Server to the Domain controller and requests the share there. That share does
> not exist on the Domain controller, only on the Actual file server, so the
> connection dies.
> 
> We have collected a packet capture, and the flow looks like this:
> 
> Legend:
> --------------------------------------------------
> DC = Domain Controller / Domain DFS Root
> RDC = Regional Domain Controller / Domain DFS Root
> RDS = Regional DFS Server
> AFS = Actual File Server
> 
> 4.18.0-21-generic Ubuntu kernel - Good
> 
> Host: request/response
> --------------------------------------------------------------------
> DC: company.com\folders
> DC: Referral List
> RDC: start convo
> RDC: <Regional Domain Controller>\Folders\Country\<Share> referral
> RDC: <Regional Domain Controller>\Folders\Country\<Share> referral
> RDS: start convo
> RDS: <Regional DFS Server>\Root\Country\<Share>
> RDS: STATUS_PATH_NOT_COVERED
> RDS: request referrals
> RDS: Referral List
> AFS: convo started
> AFS: <Actual File Server>\<Share>
> AFS: Good response
> 
> 5.0.0-26-generic Ubuntu kernel - Bad
> 
> Host: request/response
> ------------------------------------------------------------
> DC: company.com\folders
> RDC: start convo
> RDC: <Regional Domain Controller>\Folders\Country\<Share>
> RDC: STATUS_PATH_NOT_COVERED
> RDS: start convo
> RDS: <Regional DFS Server>\Root\Country\<Share>
> RDS: STATUS_PATH_NOT_COVERED
> RDC: <Regional DFS Server>\Root\Country\<Share>
> RDC: STATUS_PATH_NOT_COVERED
> 
> From there the debugging output was more or less the same between the two kernel
> versions, until the problematic area:
> 
> Linux 4.18:
> 
> Full log: https://paste.ubuntu.com/p/D9XwBbvTXc/
> 
> Status code returned 0xc0000257 STATUS_PATH_NOT_COVERED
> fs/cifs/smb2maperror.c: Mapping SMB2 status code 0xc0000257 to POSIX err -66
> fs/cifs/connect.c: build_unc_path_to_root: full_path=\\<Regional DFS Server>\Root\Country\<Share>
> fs/cifs/smb2ops.c: smb2_get_dfs_refer path <\<Regional DFS Server>\Root\Country\<Share>>
> fs/cifs/misc.c: num_referrals: 1 dfs flags: 0x2 ...
> fs/cifs/dns_resolve.c: dns_resolve_server_name_to_ip: resolved: <Actual File Server> to <IPV4 Address>
> fs/cifs/connect.c: Username: XXX
> // mounts the share successfully
> 
> Linux 5.0:
> 
> Full log: https://paste.ubuntu.com/p/9sXPj7WMQv/
> 
> Status code returned 0xc0000257 STATUS_PATH_NOT_COVERED
> fs/cifs/smb2maperror.c: Mapping SMB2 status code 0xc0000257 to POSIX err -66
> fs/cifs/connect.c: build_unc_path_to_root: full_path=\\<Regional DFS Server>\Root\Country\<Share>
> fs/cifs/connect.c: build_unc_path_to_root: full_path=\\<Regional DFS Server>\Root\Country\<Share>
> fs/cifs/dfs_cache.c: do_dfs_cache_find: search path: \<Regional DFS Server>\Root\Country\<Share>
> fs/cifs/dfs_cache.c: do_dfs_cache_find: cache miss
> fs/cifs/dfs_cache.c: do_dfs_cache_find: DFS referral request for \<Regional DFS Server>\Root\Country\<Share>
> fs/cifs/smb2ops.c: smb2_get_dfs_refer path <\<Regional DFS Server>\Root\Country\<Share>>
> fs/cifs/smb2pdu.c: SMB2 IOCTL
> Status code returned 0xc0000225 STATUS_NOT_FOUND
> fs/cifs/smb2maperror.c: Mapping SMB2 status code 0xc0000225 to POSIX err -2
> // mounting the share fails shortly after
> 
> This has a significant impact on customers who need to mount multi-tier DFS
> shares, as they have to remain on the 4.15 Bionic kernel and cannot use the
> HWE kernel on their machines.
> 
> [Fix]
> 
> After some debugging, I narrowed the cause down to a new DFS caching feature 
> introduced in 5.0-rc1. I started a discussion with the upstream maintainer of 
> cifs, which you can read here:
> 
> https://lore.kernel.org/linux-cifs/05aa2995-e85e-0ff4-d003-5bb08bd17a22@canonical.com/T/#u
> 
> This discussion resulted in the below upstream commit, which was merged in the 
> 5.5 development window:
> 
> commit 5bb30a4dd60e2a10a4de9932daff23e503f1dd2b
> Author: Paulo Alcantara (SUSE) <pc at cjr.nz>
> Date: Fri Nov 22 12:30:56 2019 -0300
> Subject: cifs: Fix retrieval of DFS referrals in cifs_mount()
> 
> You can read it here:
> https://github.com/torvalds/linux/commit/5bb30a4dd60e2a10a4de9932daff23e503f1dd2b
> 
> This commit makes sure referral requests are sent to the most recently
> resolved root server, instead of to older ones further up the tree. This
> ensures that we keep descending down the tree instead of backtracking, which
> is what was happening.
> 
> This commit has been submitted for upstream -stable, and is still being 
> processed. The commit is needed on kernels 5.0 and up. I will update this 
> section if it is accepted for -stable.
> 
> [Testcase]
> 
> To test this commit you need a multi-tier cifs DFS with a structure similar to
> the tree described in the Impact section. From there, simply try to mount
> a cifs share.
> 
> On patched kernels, the mount will succeed. On broken kernels, the mount will 
> fail.
> 
> I have prepared a test kernel for Bionic HWE, based on 5.0.0-37.40~18.04,
> which you can find here:
> 
> https://launchpad.net/~mruffell/+archive/ubuntu/sf245466-test
> 
> This test kernel has been tested by the customer and mounts the cifs DFS 
> correctly.
> 
> [Regression Potential]
> 
> I believe the risk of regression for this commit is low. All changes are
> limited to DFS within cifs, and they only change which server is treated as
> the root server that referral requests are sent to.
> 
> The commit is a clean cherry pick for disco, eoan and focal. The maintainer has 
> submitted the commit for upstream -stable, and we have tested the commit with 
> the customer, and things are now working as intended.

Acked-by: Seth Forshee <seth.forshee at canonical.com>

Applied to focal/master-next, thanks!


