readahead - from a tar file

Jerome Haltom wasabi at larvalstage.net
Tue Oct 30 00:47:21 GMT 2007


I think most of the brights around here have a pretty good understanding
of how this SHOULD work, in the ideal case. All jobs should start async,
all jobs should use async IO, and to the best of their ability submit
async IO requests for the files they need as soon as they need them.

The kernel would be smart enough to do this efficiently. It would not
fulfill one process's async IO request if doing so meant evicting data
that another async IO request had brought in only moments before.
Insert intelligence here.

And add on top of this some sort of file system block-level
optimization.

Voila.

Just a matter of doing the work.
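A readahead hint is the closest simple primitive to "submit IO requests for the files you need as soon as you know you need them." The sketch below is not real async IO. It uses Python's `os.posix_fadvise` with `POSIX_FADV_WILLNEED`, which on Linux starts reading the file into the page cache without waiting for the data, so the caller can continue with other work. The `prefetch` function and its skip-on-error policy are hypothetical, for illustration only.

```python
import os
from typing import Iterable


def prefetch(paths: Iterable[str]) -> int:
    """Hypothetical helper: hint the kernel to start reading each file.

    POSIX_FADV_WILLNEED only schedules the read into the page cache;
    the data is not returned here. Returns how many hints were issued.
    """
    issued = 0
    for path in paths:
        try:
            fd = os.open(path, os.O_RDONLY)
        except OSError:
            continue  # missing or unreadable file: skip it rather than fail the boot
        try:
            # offset 0, length 0 means "the whole file"
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
            issued += 1
        except OSError:
            pass  # it is only a hint, so an unsupported call is ignored
        finally:
            os.close(fd)
    return issued
```

A job would call `prefetch` on its binary and libraries as early as it knows about them, then open and read them normally later, ideally finding the data already cached.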

On Mon, 2007-10-22 at 13:40 -0700, Shawn Rutledge wrote:
> Another idea I had over the weekend to speed up boot times is also
> related to reducing disk seek time.  I see that Ubuntu already starts
> a "readahead" process before much else, to preload the necessary files
> into RAM.  This is an excellent idea, but those files are still
> potentially located all over the disk, right?  So seek time may be the
> dominant factor in that process.  What if it instead read the same
> files from a cache, in the form of an uncompressed tar file?  Then it
> would be a completely sequential, contiguous read.  And each time a
> file is completely loaded, an event could be fired.  Upstart could use
> those events to start some processes (e.g. when some daemon and all
> its library dependencies have been read, it's OK to start that
> daemon).  That way daemons can be started and files can continue being
> pulled into memory, in parallel (especially on multi-core systems).
> 
> The tar could be rewritten after each boot (as a "nice" process of
> course, and perhaps delayed).  readahead would simply have to notice
> somehow that different files are being read directly from disk rather
> than from the tarball (but it already does this during a "profile"
> operation, to re-write the list of files, right?)  Each time the file
> list is different from the order of files in the tarball, it could
> re-write the tar.
> 
> To implement it, some code from "tar" itself needs to be pulled into
> readahead, I think (again, what is the point if tar and some other
> dependencies also need to be read into RAM before readahead can
> proceed?  It's more disk seeks again.)
> 
> Finally readahead could be incorporated into busybox, so that the same
> code is used for regular tar and for readahead, and other startup
> tasks can be accomplished without loading so many files.
> 
> Unless there are any objections or better ideas, I might get around to
> trying the first parts of this.
> 
> Is there another mailing list for readahead?
> 
> Another way to accomplish the seekless loading would be to re-arrange
> the files on disk so that they are contiguous.  Does anybody know if that's
> possible?  In a filesystem-independent way?  Usually not much
> attention is paid to "defragging" - the assumption is that the
> filesystem manages this on its own.  But no filesystems yet are trying
> to arrange files to minimize seeks based on the usual read order, are
> they?  And even if the fs did that, startup read order might be
> different than the "usual" read order that would be seen over months
> of uptime.
> 
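The core of the quoted tar-cache proposal can be sketched in Python: read one uncompressed archive strictly front to back, fire an event as each file finishes, and start a daemon once all of its files are in. This is a minimal, hypothetical sketch. `BootJobs`, `stream_cache`, and the job-to-files map are illustrative names, not part of readahead or Upstart, and the `start` callback stands in for whatever the init system would do. One caveat: reading the archive caches the archive's own blocks, not the original files, so a real implementation would still need a way to hand the loaded data to the daemons.

```python
import tarfile
from typing import BinaryIO, Callable, Dict, Iterable, List


class BootJobs:
    """Start each job once every file it depends on has been loaded.

    The job -> files map is hypothetical: it stands in for whatever
    dependency data a real boot system would provide.
    """

    def __init__(self, deps: Dict[str, Iterable[str]], start: Callable[[str], None]):
        self.missing = {job: set(files) for job, files in deps.items()}
        self.start = start
        # Jobs that need no cached files can start right away.
        for job in [j for j, m in self.missing.items() if not m]:
            del self.missing[job]
            start(job)

    def file_loaded(self, name: str) -> None:
        """Event handler: one more file has been read completely."""
        for job, missing in list(self.missing.items()):
            missing.discard(name)
            if not missing:
                del self.missing[job]
                self.start(job)


def stream_cache(tar_stream: BinaryIO, on_loaded: Callable[[str], None]) -> List[str]:
    """Read an uncompressed tar front to back, firing on_loaded after each file."""
    order: List[str] = []
    # Mode "r|" reads an uncompressed tar as a stream: members arrive in
    # archive order and the file object is only ever read forward.
    with tarfile.open(fileobj=tar_stream, mode="r|") as tar:
        for member in tar:
            if not member.isfile():
                continue
            tar.extractfile(member).read()  # pull the whole member in
            order.append(member.name)
            on_loaded(member.name)  # the "file completely loaded" event
    return order
```

Because `file_loaded` is called between members, jobs can start while later parts of the archive are still being read, which is the parallelism the proposal is after.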
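The quoted rewrite step might look like this: compare the order in which files were read during the last boot with the order in the archive, and rebuild the archive when the two differ. Again this is a hypothetical sketch. `boot_order` is assumed to be a list of paths relative to `root` produced by a profiling pass, and the function names are illustrative. The new archive is written to a temporary file and moved into place with `os.replace`, so an interrupted rewrite leaves the old cache intact.

```python
import os
import tarfile
from typing import List


def cache_order(tar_path: str) -> List[str]:
    """Names of the regular files in the cache tar, in archive order."""
    with tarfile.open(tar_path, mode="r:") as tar:
        return [m.name for m in tar.getmembers() if m.isfile()]


def rewrite_if_reordered(tar_path: str, boot_order: List[str], root: str) -> bool:
    """Rebuild the uncompressed cache tar if the observed boot order changed.

    boot_order holds paths relative to root, in the order they were read
    during the last boot (assumed to come from a profiling pass).
    Returns True if the cache was rewritten.
    """
    if os.path.exists(tar_path) and cache_order(tar_path) == boot_order:
        return False
    tmp_path = tar_path + ".new"
    with tarfile.open(tmp_path, mode="w") as tar:  # "w" writes without compression
        for name in boot_order:
            tar.add(os.path.join(root, name), arcname=name, recursive=False)
    os.replace(tmp_path, tar_path)  # swap in the new cache in one step
    return True
```

To run this as a low-priority job, as the proposal suggests, a script could call `os.nice(10)` before rewriting.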



