mdadm raid soft lock-ups ubuntu kernel 4.13.0-36
Kleber Souza
kleber.souza at canonical.com
Fri Jun 8 21:14:14 UTC 2018
On 06/08/18 06:20, Adam Hamsik wrote:
> Hi,
>
> we're running Ubuntu 16.04.4, mdadm - v3.3 and Kernel 4.13.0-36.
> We have created raid10 using 22 960GB SSDs [1] . The problem we're
> experiencing is that /usr/share/mdadm/checkarray
> (executed by cron, included in a mdadm pkg) results in (soft?)
> deadlock - load on the node spikes up to 500-700 and all I/O operations
> are blocked for a period of time. We can see traces like these [2] in
> our kernel log.
>
> For example, it ends up in a static state like this:
>
> test@os-node1:~$ cat /proc/mdstat
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
> md1 : active raid10 dm-23[9] dm-22[8] dm-21[7] dm-20[6] dm-18[4] dm-19[5] dm-17[3] dm-16[21] dm-15[20] dm-14[2] dm-13[19] dm-12[18] dm-11[17] dm-10[16] dm-9[15] dm-8[14] dm-7[13] dm-6[12] dm-5[11] dm-4[10] dm-3[1] dm-2[0]
>       10313171968 blocks super 1.2 512K chunks 2 near-copies [22/22] [UUUUUUUUUUUUUUUUUUUUUU]
>       [===>.................]  check = 19.0% (1965748032/10313171968) finish=1034728.8min speed=134K/sec
>       bitmap: 0/39 pages [0KB], 131072KB chunk
>
> unused devices: <none>
>
> The only solution is to hard-reboot the node. We found that it does not
> happen on an idle raid; we have to generate significant load
> (10 VMs running fio [3] with 500GB HDDs) to reproduce the issue.
>
> Has anyone experienced similar issues? Do you have any suggestions on how
> to better troubleshoot this issue and maybe identify whether the disks or
> the software layer is responsible for this behaviour?
>
> [1] http://www.samsung.com/us/dell/pdfs/PM1633a_Flyer_2016_v4.pdf
> [2] https://gist.github.com/haad/09213bab1bc30a00c7d255c0bc60897b
> [3] https://github.com/axboe/fio
>
> Regards
> Adam.
>
> Adam Hamsik
> 00421 904 937 495
> adam.hamsik at chillisys.com
> haad at netbsd.org
>
>
Hi Adam,

Thank you for reporting the problem. It does look like something that
should be investigated. However, we generally use this mailing list for
patch submissions and some other communications. Could you please open a
bug report on Launchpad against the linux package [1]? Once that's done,
someone from our team will triage the bug, and the investigation and
discussion can continue from there.

[1] https://bugs.launchpad.net/ubuntu/+source/linux

Thank you,
Kleber
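
For a bug report like the one requested above, it may help to record how
the check speed changes as load is applied. The following is only a minimal
sketch in Python, assuming the progress line has the same format as in the
quoted /proc/mdstat output. The names parse_sync_progress and looks_stalled
are hypothetical helpers, and the 1000 KB/s threshold is an arbitrary
assumption, not a value taken from this thread.

```python
import re

# Matches the progress line /proc/mdstat prints while a check or resync is
# running, e.g.:
#   [===>....]  check = 19.0% (1965748032/10313171968) finish=1034728.8min speed=134K/sec
# \s* also matches newlines, so output re-wrapped by a mail client still parses.
PROGRESS_RE = re.compile(
    r"(?P<action>check|resync|recovery|reshape)\s*=\s*(?P<percent>[\d.]+)%\s*"
    r"\((?P<done>\d+)/(?P<total>\d+)\)\s*"
    r"finish=(?P<finish_min>[\d.]+)min\s*speed=(?P<speed_kb>\d+)K/sec"
)


def parse_sync_progress(mdstat_text):
    """Return a dict for the first sync progress line found, or None."""
    m = PROGRESS_RE.search(mdstat_text)
    if m is None:
        return None
    return {
        "action": m.group("action"),
        "percent": float(m.group("percent")),
        "done_blocks": int(m.group("done")),
        "total_blocks": int(m.group("total")),
        "finish_min": float(m.group("finish_min")),
        "speed_kb_s": int(m.group("speed_kb")),
    }


def looks_stalled(progress, min_speed_kb_s=1000):
    """True if a sync is in progress but slower than the chosen threshold.

    The threshold is an illustrative assumption; pick one that suits the array.
    """
    return progress is not None and progress["speed_kb_s"] < min_speed_kb_s
```

Passing the contents of /proc/mdstat to parse_sync_progress at a fixed
interval and logging the result alongside the kernel traces would give a
timeline of the slowdown to attach to the Launchpad bug.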