readahead - from a tar file

Shawn Rutledge shawn.t.rutledge at gmail.com
Tue Oct 30 02:13:44 GMT 2007


On 10/29/07, Rob Ubuntu Linux <rob.ubuntu.linux at googlemail.com> wrote:
> This smells of premature optimisation, and trying to solve an issue in
> the wrong place, by a one-off hack, with great code complexity, so I

Probably.

> don't really like it.
>
> That said, there's something in both lazy evaluation and cache warming
> schemes, although paradoxically they seem totally opposed.  So I've
> ended up thinking far too much and too long on this.
>
> On 10/22/07, Scott James Remnant <scott at netsplit.com> wrote:
>
> > On Mon, 2007-10-22 at 13:40 -0700, Shawn Rutledge wrote:
> >
> > > Another idea I had over the weekend to speed up boot times is also
> > > related to reducing disk seek time.
>
> What is the real problem?
>
> Is it seek times or lots of synchronous reads and serialised
> processes, doing lots of waiting on disk reads, re-parsing files in
> indirect ways and general neglect of "once only" startup code by hard
> pressed application developers struggling to cope with all the
> overwhelming complexity, and mindless incompatible environment
> modifications made by distro and OS developers?

I think there are other things that have room for improvement:
- using a gazillion shell scripts and various other kinds of utilities
for the sake of "ultimate flexibility" (expecting that a sysadmin
should be able to change any aspect of behavior by editing one of
those and rebooting... but only if he's knowledgeable enough to find
the right one to edit, out of an ever-growing number of them).  Each
script has to be read and parsed; and if it uses other utilities, they
and their dependencies have to be loaded and initialized too.
- putting a priority on having every service ready to go before the
user can log in
- doing things serially
- making parts of the init process wait for others (like DHCP as you
pointed out below)
udev sometimes spends a lot of time waiting for device nodes to be
populated, and some distros use other hardware-detection schemes that
are also time-consuming

Of course I know some of these have been partially solved... and since
I'm new to Ubuntu, my criticism is based more on experience with Gentoo
and Debian.  I recently tried Gentoo's parallel init, and it doesn't
seem to make as much difference as I had hoped.  But Gentoo
uses a lot of conf files.

For embedded systems I'd like the top priority to be getting X up and
running, and getting the first app running (whether that is the main
app for which the embedded system was created, or just some kind of
login screen or launcher app).  If the app has service dependencies,
fine... but those can still be parallelized as much as possible.  Such
an approach has its benefits for desktop systems too.  The popular
commercial OSes do not wait for network initialization... DHCP can
happen at any time.  Network services don't have to wait for an IP
address to be assigned, in order to be able to start up.  And things
like databases often (but not always) could start up "gracefully" in
the background.

I'm thinking maybe the best thing for regular users would be yet
another init system that is much more straightforward and simple from
the user's perspective.  To edit the startup script (which would
probably be something more efficient than a shell script), you use a
graphical editor that shows the dependency tree (among other views)
and you can re-order the steps as you please, or turn them on and off,
subject to some constraints.  (Or maybe you don't even need to
re-order them, if there is a list of dependencies for each step, and
maybe a priority.)  Init would go from needing dozens of files to just
needing 2 or 3 to get started (init itself, maybe some libs if it's
not statically linked, and the one machine-generated startup script)
plus files for whatever processes init is actually starting up.
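As a rough sketch of the "list of dependencies for each step, and maybe a priority" idea, here is one way a tool could turn such a list into a start order. Everything in it is hypothetical: the step names, the numeric priorities, and the startup_order function itself. It does a topological sort, and among the steps whose dependencies have already started it picks the highest priority first. That way X and the first app can jump ahead of things like DHCP.

```python
import heapq

def startup_order(steps):
    """Return step names in a valid start order (hypothetical helper).

    `steps` maps each name to (priority, [names it depends on]).
    Among the steps whose dependencies have all started, the one with
    the highest priority goes next (ties broken by name).
    """
    pending = {name: set(deps) for name, (_, deps) in steps.items()}
    dependents = {name: [] for name in steps}
    for name, deps in pending.items():
        for dep in deps:
            if dep not in steps:
                raise ValueError(f"{name} depends on unknown step {dep}")
            dependents[dep].append(name)
    # heapq is a min-heap, so negate the priority to pop the highest first
    ready = [(-steps[n][0], n) for n, deps in pending.items() if not deps]
    heapq.heapify(ready)
    order = []
    while ready:
        _, name = heapq.heappop(ready)
        order.append(name)
        for child in dependents[name]:
            pending[child].discard(name)
            if not pending[child]:
                heapq.heappush(ready, (-steps[child][0], child))
    if len(order) != len(steps):
        raise ValueError("dependency cycle among startup steps")
    return order
```

For example, with udev at priority 10, X and a login screen at 9, syslog at 5 and DHCP at 1 (all depending on udev, and the login screen on X), the order comes out as udev, X, login-screen, syslog, dhcp.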

Of course lots of people will hate this idea because it's not the Unix
Way (supposed to have lots of reusable little utilities working
together, not a monolithic do-everything monstrosity).  And a
centralized init script or centralized settings seems too
registry-like, and that results in a knee-jerk reaction.  But
everything has its advantages and disadvantages.

So far, in my cursory survey of init replacements (upstart, runit,
init-ng, svscan, murdur and a couple of others), none of them seems to
be doing anything that radical...   I haven't really seen any
exhaustive benchmarks comparing all of them fairly, either.

There was that paper by somebody at IBM about using plain old 'make'
to find the dependencies, and -j n to start up n processes in
parallel.  (Since make was designed to find an optimal order for
resolving dependencies to arrive at a goal, why not make the goal
"starting everything", list the dependencies of each step, and let it
figure out how to get there?)  It
seems like a decent experiment.
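In the same spirit as make -j n, here is a minimal sketch that uses Python's graphlib module (3.9+) to group startup steps into batches of at most n jobs. It assumes each batch finishes before the next one starts. That is cruder than make, which launches a new job as soon as any slot frees up. The function name and the step names are made up for illustration.

```python
from graphlib import TopologicalSorter

def parallel_batches(deps, jobs):
    """Split startup steps into batches of at most `jobs` steps.

    `deps` maps each step to the set of steps it depends on. A step is
    only scheduled after all of its dependencies are done. Simplifying
    assumption: a whole batch finishes before the next one starts.
    """
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    sorter = TopologicalSorter(deps)
    sorter.prepare()                  # raises graphlib.CycleError on a cycle
    waiting, batches = [], []
    while sorter.is_active():
        waiting.extend(sorted(sorter.get_ready()))
        batch, waiting = waiting[:jobs], waiting[jobs:]
        batches.append(batch)
        sorter.done(*batch)           # mark the whole batch as finished
    return batches
```

With two job slots, udev runs alone first. X and dhcp share the next batch, and syslog waits for a free slot alongside the login screen.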

So see, compared to those ideas, just getting the same old files into
memory faster might turn out to be the least disruptive thing to try
to do.  ;-)

> How long will it be before genuine Random Access solid state memory
> devices replace noisy unreliable slow, delicately coated iron with
> fragile precision engineered moving parts?

Sure hope so.

With those hybrid hard drives, you don't have any control over what
gets cached in the flash, do you?  Having that control could be useful.

> You could probably even use some kind of 'snapshot' with a specialised
> FS for flash memory devices.  Reversing the usual CacheFS concept, a
> smallish read-mostly disk filesystem containing config files like
> /etc, works opposite to read-through and write-back caching, as
> modifications occurring on disk invalidate the cache entries, meaning
> reads bypass the cache until an update later in background, on a
> quiescent stable(ish) system.

Yeah maybe a sort of overlay FS which has the important boot files on
flash and is mounted "on top" of the disk at first... then after init
is complete, it's remounted "under" so that changes are written first
to the disk, and then at some point the flash is updated too.
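Here is a toy model of the cache-invalidation part of Rob's scheme. Plain dicts stand in for the flash and the disk, which is an assumption for illustration only, since a real version would live in a filesystem. Reads prefer the flash copy until a write to the disk marks that entry stale. A later refresh(), run in the background once the system is quiet, brings the flash back in sync. The class name is hypothetical.

```python
class ReadMostlyCache:
    """Toy flash-over-disk cache (hypothetical; dicts model both devices)."""

    def __init__(self, disk):
        self.disk = disk             # authoritative copy ("the disk")
        self.flash = dict(disk)      # fast copy, primed ahead of time
        self.stale = set()           # entries invalidated by later writes

    def read(self, path):
        # serve from flash unless a write has invalidated this entry
        if path in self.flash and path not in self.stale:
            return self.flash[path]
        return self.disk[path]       # bypass the cache

    def write(self, path, data):
        self.disk[path] = data       # changes go to the disk first...
        self.stale.add(path)         # ...and invalidate the flash copy

    def refresh(self):
        # background resync, e.g. once the system is quiescent
        for path in self.stale:
            self.flash[path] = self.disk[path]
        self.stale.clear()
```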

> But if disk seek time is genuinely the issue, then simply partitioning
> the disk will improve things!   But distros actually all seem to have
> moved towards huge "simple" file system layouts; moving /var, /usr and
> /opt out of the root file system on my recent Gutsy install leaves:
<snip>
> So /etc, /bin, /sbin and /lib are confined to 1/2 GB sandwiched between
> /usr & /var, with /boot on the first 4 cylinders of the disk.

Yeah it's a good point.  So if /var, /usr and /opt were on a disk and
the rest on an SSD (but with some volatile files like /etc/mtab etc.
symlinked to the disk)...and hopefully most of the early stuff in the
boot process would not need much of the stuff on the disk...

> > > I see that ubuntu already starts
> > > a "readahead" process before much else, to preload the necessary files
> > > into RAM.  This is an excellent idea, but those files are still
> > > potentially located all over the disk, right?
>
> Actually, as with dynamically linked shared libraries, once you've
> pre-loaded them they're either in the traditional buffer cache, or
> they've been mapped into RAM by the VM, which can later re-use clean pages.
> The process of preloading could in principle use asynchronous reads,
> so the I/O scheduler can maximise the benefits of its elevator.
> There are kernel hooks now for "Nice I/O" by background batch processes,
> to minimise the impact on "interactive jobs".

Cool.
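For the asynchronous preloading Rob mentions, one existing interface is posix_fadvise() with POSIX_FADV_WILLNEED. It asks the kernel to start reading a file into the page cache and typically returns without waiting for the data. Below is a minimal sketch that assumes the list of paths is already known; the preload name is hypothetical. The "Nice I/O" part would mean running the preloader in the idle I/O class (e.g. ionice -c 3), which is not shown. Where os.posix_fadvise is unavailable, the sketch falls back to a plain blocking read.

```python
import os

def preload(paths):
    """Hint the kernel to pull each file into the page cache.

    Returns how many files were hinted; files that no longer exist are
    skipped, so a stale list only costs time, not correctness.
    """
    hinted = 0
    for path in paths:
        try:
            fd = os.open(path, os.O_RDONLY)
        except OSError:
            continue                 # gone since the list was made
        try:
            if hasattr(os, "posix_fadvise"):
                # offset 0 and length 0 cover the whole file; WILLNEED
                # starts readahead, usually without waiting for it
                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
            else:
                # fallback: a plain read also warms the cache, but blocks
                while os.read(fd, 1 << 20):
                    pass
            hinted += 1
        except OSError:
            pass                     # e.g. not a regular file; skip it
        finally:
            os.close(fd)
    return hinted
```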

> The problem is getting the overall startup throughput up, but without
> causing read starvation or waiting on a "heated cache", which hinders
> the attempt at parallelisation.

Maybe read starvation is inevitable, and you just need to optimize the
reads, to cut down on the seek times, and to read continuously so the
disk is never sitting idle?  It's very easy to see the idle spots now
with bootchart.
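One crude way to cut seeks without kernel help is to issue the reads in roughly on-disk order. The sketch below sorts by (device, inode). It assumes that on ext2/ext3 a file's data tends to be allocated near its inode, which is only a heuristic. A more accurate tool would ask for the real block numbers (e.g. with the FIBMAP ioctl, which needs root). The function name is hypothetical.

```python
import os

def seek_friendly_order(paths):
    """Return the existing files from `paths`, sorted by (device, inode).

    Heuristic (assumption): inode order roughly follows disk order on
    ext2/ext3. Files that no longer exist are dropped.
    """
    keyed = []
    for path in paths:
        try:
            st = os.stat(path)
        except OSError:
            continue
        keyed.append(((st.st_dev, st.st_ino), path))
    keyed.sort()
    return [path for _, path in keyed]
```

The resulting list could then be handed to whatever does the actual preloading.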

> Something like bringing up a DHCP network interface takes an aeon

At least that can really be done in the background now.  I've done it
on Gentoo... I didn't pay attention to whether Ubuntu does that or not.

> compared to interpreting scripts or parsing of files loaded in memory,
> so anticipatory I/O ought to have significant benefits in keeping more
> processes runnable.

Right.  Well, how do you anticipate the I/O any better than
readahead already does?

> The basic unpacking a tgz idea is ugly and horrid in practice for the
> reasons suggested.

Gee thanks...but yeah having to hack the kernel to make it work is a
major bummer.

> However, maybe an "initrd"-style idea, decompressing a known
> environment into a temporary memory file system, could solve some
> bootstrapping problems.

The trouble with initrd is you will have two copies of every
executable and library in memory... one in the virtual "disk" and
another copied somewhere else in memory to actually run.  Now if you
combined initrd with XIP... but I wonder how much slower XIP is.  And
how do you actually do that?  I read somewhere that MontaVista is
doing something like that but never figured out the details.  Or just
temporarily eat the cost of having two copies, and then get rid of the
initrd ASAP, as soon as everything that needs to be read from there
has been read.  But, keeping the initrd image up-to-date is just as
bad as keeping the tarball up-to-date.  So how is it any better?  Just
that the kernel already supports initrd, and does not yet support a
tarball-as-a-cache... right?  (replace tar with cpio if you like,
since that's what an initrd is, right?)
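Without kernel support, the tarball-as-a-cache idea can only be shown in user space. You read a (possibly gzipped) tar archive front to back in one sequential pass and keep the contents in memory. The sketch below does exactly that and nothing more. It does not make later open() calls hit the cache, which is the part that would need the kernel hack. The function name is hypothetical.

```python
import tarfile

def load_tar_cache(fileobj):
    """Read every regular file in a tar stream into a dict, keyed by name.

    User-space illustration only: nothing redirects later open() calls.
    """
    cache = {}
    # "r|*" is tarfile's streaming mode: it reads sequentially, never
    # seeks backwards, and auto-detects gzip/bzip2 compression
    with tarfile.open(fileobj=fileobj, mode="r|*") as tar:
        for member in tar:
            if member.isfile():
                cache[member.name] = tar.extractfile(member).read()
    return cache
```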

> For instance running certain infrastructure programs chroot-ed in a
> dynamically linked mini-environment, before local & network disks are
> mounted.  The state they generate would be saved in memory-mapped
> files or dumped out into a memory file system, e.g. tmpfs, so Upstart
> can have them reload it from the memory file when restarting them in the
> "maxi-environment" in a later phase of the boot process.

I thought pivot_root was the usual way of transitioning from an initrd
to a real disk, but I haven't actually implemented that on any system
so far.
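Rob's idea is to dump state into a tmpfs so that a service restarted later in boot can pick it up again. A minimal sketch of that might look like the code below. The directory /dev/shm/boot-state and both function names are hypothetical, and JSON is just a convenient format for the example. Writing to a temp file and then renaming it keeps a reader from ever seeing a half-written file.

```python
import json
import os

# Hypothetical directory on a tmpfs (e.g. under /dev/shm)
STATE_DIR = "/dev/shm/boot-state"

def save_state(name, state, state_dir=STATE_DIR):
    """Write a service's state where a later restart can find it."""
    os.makedirs(state_dir, exist_ok=True)
    final = os.path.join(state_dir, name + ".json")
    tmp = final + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, final)           # atomic rename within one filesystem

def load_state(name, state_dir=STATE_DIR):
    """Return the saved state, or None on a cold start."""
    try:
        with open(os.path.join(state_dir, name + ".json")) as f:
            return json.load(f)
    except FileNotFoundError:
        return None
```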

> > The most interesting thing is to actually reorder the filesystem so that
> > the blocks you need are always at the front and always sequential.
>
> Theoretically you could actually use a pre-load library stub to
> 'trace' each startup execution, saving the directories accessed and the
> files opened (or executed) for next time.  Then a process could
> initiate on boot, purely to warm up the disk and page cache.  As it
> loads the actual data on disk, not a copy, config changes could only
> slow the boot down, not cause failures: a new file would simply be
> ignored, and changed file contents would be pre-read seamlessly, a far
> more robust approach.

How is that different than the current readahead?
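The replay half of Rob's tracing scheme could be as simple as the sketch below. It assumes a hypothetical trace format of one opened path per line, in access order, as a preload stub might write it. The replay keeps first-access order, drops duplicates, and silently skips files that have since disappeared. That is where the robustness he describes comes from: a stale list can slow the boot down but cannot break it. The function name is hypothetical.

```python
import os

def load_trace(trace_path):
    """Return the distinct, still-existing paths from a boot trace.

    Assumed format: one opened path per line, in access order.
    First-access order is kept; vanished files are skipped.
    """
    seen = set()
    paths = []
    with open(trace_path) as f:
        for line in f:
            path = line.rstrip("\n")
            if path and path not in seen and os.path.isfile(path):
                seen.add(path)
                paths.append(path)
    return paths
```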

>  One non-blocking I/O thread runs per disk, queuing up large batches
> of requests anticipating the future demand of currently blocking (or
> even to be executed) processes, coordinated by a Master supervisory
> thread responsible for scheduling and thrash control.  This pushes the
> responsibility to the block I/O scheduler to optimise for throughput
> while avoiding read starvation of the real processes.   Asynchronous
> I/O is discussed at
> http://www.ibm.com/developerworks/linux/library/l-async/ but AIO
> itself, I think, has suffered from the "separate path through the
> kernel" syndrome.  Jens Axboe has worked on reworking the block I/O
> layer in the kernel (interview at http://kerneltrap.org/node/7637 ) and
> there are some interesting zero-copy ideas in there too (splicing).

OK I'll read that.
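Here is a simplified sketch of the "one I/O thread per disk" layout. Files are grouped by st_dev, and each group is read on its own worker thread, with the calling thread acting as the master that waits for the results. Note that st_dev identifies a filesystem, so it only approximates a physical disk. Blocking reads on separate threads stand in for the non-blocking I/O Rob describes, and there is no thrash control. Both function names are hypothetical.

```python
import os
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def warm_file(path, chunk=1 << 20):
    """Read a file and discard the data, leaving it in the page cache."""
    total = 0
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk)
            if not data:
                return total
            total += len(data)

def warm_per_disk(paths):
    """Warm files on one worker thread per device; return bytes per device."""
    by_dev = defaultdict(list)
    for path in paths:
        try:
            by_dev[os.stat(path).st_dev].append(path)
        except OSError:
            pass                     # vanished: nothing to warm

    def worker(files):
        total = 0
        for p in files:
            try:
                total += warm_file(p)
            except OSError:
                pass                 # removed between stat() and open()
        return total

    with ThreadPoolExecutor(max_workers=max(1, len(by_dev))) as pool:
        futures = {dev: pool.submit(worker, files) for dev, files in by_dev.items()}
        return {dev: fut.result() for dev, fut in futures.items()}
```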



More information about the upstart-devel mailing list