please fix FUSION (Was: [v3.13][v3.14][Regression] kthread: make kthread_create() killable)

James Bottomley James.Bottomley at HansenPartnership.com
Fri Mar 21 22:56:31 UTC 2014


On Fri, 2014-03-21 at 12:32 -0700, Linus Torvalds wrote:
> On Fri, Mar 21, 2014 at 11:34 AM, Oleg Nesterov <oleg at redhat.com> wrote:
> >
> > Yes, it seems that it actually needs > 30 secs. It spends most of the time
> > (30.13286 seconds) in [..]
> 
> So how about taking a completely different approach:
> 
>  - just say that waiting for devices in the module init sequence for
> over 30 seconds is really really wrong.
> 
>  - make the damn mptsas driver just register the controller from the
> init sequence, and then do device discovery asynchronously.
> 
> The ATA layer does this correctly: it synchronously finds each host,
> but then it does
> 
>         /* perform each probe asynchronously */
>         for (i = 0; i < host->n_ports; i++) {
>                 struct ata_port *ap = host->ports[i];
>                 async_schedule(async_port_probe, ap);
>         }
> 
> and I really think SCSI drivers should do the same if they have this
> kind of "ports can take forever to probe" behavior.
> 
> What would be the equivalent magic to do this for SCSI? Could we just
> make something like scsi_probe_and_add_lun() just always do this, the
> same way ata_host_register() does it?
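To make the libata pattern quoted above concrete, here is a minimal userspace sketch. It assumes POSIX threads as a stand-in for the kernel's async_schedule() and async_synchronize_full() machinery. `struct port`, `port_probe()`, `probe_ports_async()` and `wait_for_probes()` are hypothetical names, not libata or kernel API. The only point illustrated is that each port's probe is started without waiting for the previous one, so one slow port no longer serialises the whole init sequence.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical port descriptor, standing in for struct ata_port. */
struct port {
    int id;
    bool probed;        /* set by the probe worker */
    pthread_t worker;   /* thread running this port's probe */
};

/* Hypothetical probe. A real one could wait seconds for link-up and
 * device identification; here it only records that it ran. */
static void *port_probe(void *arg)
{
    struct port *p = arg;
    p->probed = true;
    return NULL;
}

/* Analogue of the loop calling async_schedule(): start one probe per
 * port and return without waiting for any of them. Returns how many
 * probes were actually started (stops at the first creation failure). */
static size_t probe_ports_async(struct port *ports, size_t n)
{
    size_t started = 0;

    for (size_t i = 0; i < n; i++) {
        if (pthread_create(&ports[i].worker, NULL, port_probe, &ports[i]) != 0)
            break;
        started++;
    }
    return started;
}

/* Analogue of async_synchronize_full(): block until every started probe
 * has finished. Only after this is it safe to read ports[i].probed. */
static void wait_for_probes(struct port *ports, size_t started)
{
    for (size_t i = 0; i < started; i++)
        pthread_join(ports[i].worker, NULL);
}
```

In the kernel the wait would typically happen much later (or not at all in the init path), which is what lets module init return quickly.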

Well, we do do this asynchronously.  The idea is that adding the host only
initialises the actual hardware.  The port probing is supposed to be
done asynchronously (provided the async probe option is enabled in SCSI,
of course).  The way this is supposed to happen is the driver
initialises the hardware and then calls scsi_scan_host().  If the
platform is set up for async scanning, that kicks off all the async
workqueues and returns (or does it all synchronously if async scanning
isn't enabled).
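The split described above (initialise the hardware when the host is added, then hand discovery to scsi_scan_host(), which either queues it or runs it inline) can be modelled in userspace roughly as follows. This is a sketch under stated assumptions: a POSIX thread plays the role of the async scan work, the `async_scan` flag stands in for SCSI's async-scan option, and `struct host`, `add_host()`, `scan_host()`, `probe_target()` and `wait_for_scan()` are hypothetical names, not the real SCSI midlayer API.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical host, standing in for a Scsi_Host plus driver state. */
struct host {
    bool hw_ready;       /* set by the "add host" step */
    int  max_targets;    /* hypothetical: how many targets to probe */
    int  targets_found;  /* filled in by the scan */
    bool scan_running;   /* true while a background scan thread exists */
    pthread_t scanner;
};

/* Hypothetical per-target probe; always "finds" a device here. */
static bool probe_target(struct host *h, int target)
{
    (void)h;
    (void)target;
    return true;
}

/* The discovery work itself: probe every target in turn. */
static void *scan_targets(void *arg)
{
    struct host *h = arg;

    for (int t = 0; t < h->max_targets; t++)
        if (probe_target(h, t))
            h->targets_found++;
    return NULL;
}

/* Step 1, analogue of adding the host: bring up the hardware only. */
static void add_host(struct host *h, int max_targets)
{
    h->hw_ready = true;
    h->max_targets = max_targets;
    h->targets_found = 0;
    h->scan_running = false;
}

/* Step 2, analogue of scsi_scan_host(): with async_scan set, start the
 * scan in the background and return at once; otherwise (or if the
 * thread cannot be created) run the whole scan before returning. */
static void scan_host(struct host *h, bool async_scan)
{
    if (async_scan &&
        pthread_create(&h->scanner, NULL, scan_targets, h) == 0) {
        h->scan_running = true;
        return;
    }
    scan_targets(h);
}

/* Wait for an outstanding background scan, if one was started. */
static void wait_for_scan(struct host *h)
{
    if (h->scan_running) {
        pthread_join(h->scanner, NULL);
        h->scan_running = false;
    }
}
```

With `async_scan` set, the caller regains control immediately and must call wait_for_scan() before trusting `targets_found`; with it clear, the results are complete as soon as scan_host() returns. A driver that does its own probing inside the add-host step gets neither benefit.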

It is possible fusion gets this wrong because the SAS driver doesn't
really couple to SCSI's libsas, which is where it would pick up most of
the generic infrastructure for this.  Plus it depends on where all the time
is being wasted.  The fusion was the last SAS chipset I got the specs
for (under NDA).  It's actually table driven, so if the problem is the
controller taking ages to fill in the tables it might necessitate a
fusion-specific fix.  I can see from the driver that it seems to do all
the probing itself instead of relying on probe callbacks from
scsi_scan_host(), so I know what needs to be fixed ... it's less clear
how easy this would be given how monolithic the routine looks.

James

More information about the kernel-team mailing list