RFC on Cloud Images: Make /tmp a tmpfs

Wed Jan 20 07:52:04 UTC 2016

Excerpts from Dustin Kirkland's message of 2016-01-19 20:27:51 -0800:
> On Sat, Jan 16, 2016 at 7:49 PM, Clint Byrum <clint at ubuntu.com> wrote:
> > Excerpts from Dustin Kirkland's message of 2016-01-16 04:25:58 -0800:
> >> On Fri, Jan 15, 2016 at 2:25 AM, Seth Arnold <seth.arnold at canonical.com> wrote:
> >> > On Thu, Jan 14, 2016 at 12:27:58PM +0200, Dustin Kirkland wrote:
> >> >> Moreover, just 'sudo apt-get install swapspace' and watch as swapfiles
> >> >> are created/deleted as needed.  If your root disk is lvm-encrypted,
> >> >> then obviously such swap files are encrypted, too.
> >> >
> >> > I've been severely skeptical of the swapspace package:
> >> >
> >> > - Swap is used when the system is already under pressure; a few hundred
> >> >   megs is great and probably for the best but if the system is actively
> >> >   pushing beyond that then it's being pushed too hard.
> >> >
> >> > - If the swap space is going to be allocated on the fly, that means the
> >> >   disk blocks have to be zeroed on the fly, when the system is under
> >> >   pressure, rather than at some leisurely time beforehand.
> >> >
> >> > - If the swap space is allocated on a filesystem, it's probably being
> >> >   allocated from a fragmented filesystem that's 90% full rather than a
> >> >   nice contiguous block of space as it would with a swap partition.
> >> >
> >> > - Accessing further into a file may involve loading multiple indirect
> >> >   blocks from disk into unswappable kernel memory. A swap partition does
> >> >   not require indirection blocks.
> >> >
> >> > - If the swap space allocated from a filesystem pushes the filesystem to
> >> >   95% full (or whatever is left after accounting for reserved blocks),
> >> >   programs will error and almost nothing handles "disk full" errors
> >> >   gracefully. Swap partitions do not cause surprise gigabyte losses in
> >> >   free space.
> >> >
> >> > - Swap files can't be allocated from btrfs filesystems and probably
> >> >   shouldn't be allocated from zfs filesystems either. (Swap on zvols,
> >> >   maybe.)
> >> >
> >> > Perhaps the swapspace package uses some tasteful tunables to mitigate
> >> > against my concerns but the end result is that it contributes extra load,
> >> > extra IO pressure, and extra uncertainty at a time when the system is
> >> > already experiencing too much load, too much IO pressure, and too much
> >> > uncertainty.
> >> >
> >> > The risks and downsides of swapspace feel like a lot compared to the
> >> > slight hassle of having the installer make a swap partition.
> >>
> >> I count 4 "if's", 3 "probably's", 2 "should/would's", and 1 "maybe" in
> >> that reply :-)
> >>
> >> Perhaps try it out?
> >>
> >> I've been running it and /tmp on tmpfs for several years (since before
> >> ~precise) on my desktop on an encrypted LVM partition.  My machine has
> >> a lot of memory (16GB, though I do push it hard), and I have never
> >> noticed a swapspace-related problem.  I've also used this combination
> >> on hundreds of servers, and several production systems.
> >>
> >
> > The 'if' and 'probably' are missing in your anecdotal evidence though.
> > If you use the servers the way you have, it will probably work fine.
> > Also we're talking about cloud instances, not "servers", which have
> > quite different use and performance profiles.
> >
> > I'd like to see even some rudimentary experiments done with realistic
> > workloads before saying this is a better idea than leaving things as
> > they are. We've all speculated and provided anecdotal evidence enough to
> > warrant such an investigation for those who speculate it will be a
> > worthwhile change.
> 
> Sure, done!  You can find a detailed statistical analysis, as well as
> the raw data for your download and treatment at:
> 
> http://blog.dustinkirkland.com/2016/01/data-driven-analysis-tmp-on-tmpfs.html
> 
> Based on a statistical analysis of 502 physical and virtual servers
> running production and test services at Canonical (including
> databases, websites, OpenStack, ubuntu.com, launchpad.net, et al.),
> 96.6% of them could fit all of the data they currently have in /tmp,
> entirely in half of the free memory available in the system.  That
> ratio goes up to 99.2% of the systems surveyed (i.e., all but 4) when
> one takes into account both free available memory and available swap.
> The remaining 4 systems are currently using [101 GB, 42 GB, 13 GB,
> and 10 GB] of swap, respectively, and are themselves somewhat special
> cases.
> 

This is a very cool window into actual running servers, and I appreciate
the effort that has gone into this. It shows that most of these server
workloads would work fine with /tmp on tmpfs. I would actually love to
have a program which ran through a set of disk and memory metrics from
something like influxdb and basically told me "these should all be using
tmpfs you dummy!".
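A minimal sketch of such a checker, for illustration only. It applies the
"half of free memory" rule from the quoted analysis, uses MemAvailable
from /proc/meminfo as a stand-in for "free memory", and assumes
(hypothetically) that the same half factor applies to memory plus swap.
The function names are made up here, not taken from the blog post or any
existing tool.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of byte counts."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if not fields:
            continue
        amount = int(fields[0])
        # Most /proc/meminfo fields are reported in kB (KiB); some are plain counts.
        if len(fields) > 1 and fields[1] == "kB":
            amount *= 1024
        values[key.strip()] = amount
    return values


def dir_usage_bytes(path):
    """Sum the apparent sizes of all non-directory entries under path."""
    total = 0
    for root, _dirs, files in os.walk(path, onerror=lambda err: None):
        for name in files:
            try:
                st = os.lstat(os.path.join(root, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            total += st.st_size
    return total


def classify(tmp_bytes, mem_available, swap_free):
    """Hypothetical verdict based on the 'half of free memory' rule."""
    if tmp_bytes <= mem_available / 2:
        return "fits in RAM"
    # Assumption: the same half factor is applied to memory plus swap.
    if tmp_bytes <= (mem_available + swap_free) / 2:
        return "fits in RAM + swap"
    return "keep /tmp on disk"
```

On a Linux box one might call it as
`classify(dir_usage_bytes("/tmp"), info["MemAvailable"], info["SwapFree"])`,
where `info` is `parse_meminfo(open("/proc/meminfo").read())`. Note that
`dir_usage_bytes` sums apparent sizes, so hard links are counted more
than once.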

However, I do want to point out that I find the sample very small. A
single snapshot from each server tells us nothing about when it was
taken or what the box was doing at the time. While standing usage on
healthy systems at a given moment is a good initial indicator that this
experiment should continue, I'm not sure I would conclude on that basis
alone that the change would bring users more benefits than challenges.
What it means to me is "it's worth digging just a little deeper".

So basically, if I thought the benefits were worth the work, I'd want
a similar collection to be done hourly over a prolonged period that
includes peak usage times for whatever these servers do. I'd probably
also try to simulate common workloads and operations, maybe run all of
the autopkgtests on typical cloud VMs with and without the change to
see what effects could be observed.
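To illustrate why a time series says more than a snapshot, here is a
hypothetical summary over hourly samples from one server. The `Sample`
fields and the half-of-available-memory threshold are assumptions carried
over from the analysis above, not part of any existing collector, and
gathering the samples themselves is left out.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Sample:
    """One hypothetical hourly observation from a single server."""
    hour: int           # hours since collection started
    tmp_bytes: int      # bytes used under /tmp at that moment
    mem_available: int  # bytes of available memory at that moment


def summarize(samples: Iterable[Sample]) -> dict:
    """Report how often, and how badly, /tmp would exceed half of available RAM."""
    samples = list(samples)
    if not samples:
        raise ValueError("need at least one sample")
    overflows = [s for s in samples if s.tmp_bytes > s.mem_available / 2]
    # The worst hour is the one where /tmp is largest relative to available memory.
    worst = max(samples, key=lambda s: s.tmp_bytes / max(s.mem_available, 1))
    return {
        "samples": len(samples),
        "overflow_fraction": len(overflows) / len(samples),
        "worst_hour": worst.hour,
        "worst_ratio": worst.tmp_bytes / max(worst.mem_available, 1),
    }
```

A single snapshot taken at hour 0, 2 or 3 of a series like the one in
the test would report no problem, while the summary flags the peak hour.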

But that's all nice-to-haves. The real reason I'm still skeptical
is simply that I don't see /tmp usage being a major factor on cloud
instances.

But don't take my skepticism as a rejection. I just don't see it as a
priority. I've got zero data that would refute any of this.

> Moreover, Ubuntu is hardly the first Linux/UNIX distribution that has
> considered putting /tmp on tmpfs by default.  Solaris has used a tmpfs
> since 1994.  Fedora moved to /tmp on tmpfs in 2012, as did ArchLinux.
> Things seem to be working okay there...
> 

As for Solaris, its developers have a lot of influence over the programs
they ship, and I'm certain they'd have made sure anything using large
temp files would use /var/tmp. I think Debian and Ubuntu should do that
too, and it shouldn't even be that hard.
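A sketch of what "use /var/tmp for large temp files" could look like
inside an application, assuming the program can estimate its temp file
size up front. The 64 MiB cut-off and the `scratch_file` helper are
hypothetical; only `tempfile.TemporaryFile` and its `dir` argument are
standard Python.

```python
import os
import tempfile

# Hypothetical cut-off: temp files expected to exceed this go to large_dir.
LARGE_TEMP_BYTES = 64 * 1024 * 1024


def scratch_file(expected_bytes, large_dir="/var/tmp"):
    """Open an anonymous temp file, steering big ones away from a RAM-backed /tmp.

    Small files use tempfile's default directory (normally $TMPDIR or /tmp).
    """
    if expected_bytes > LARGE_TEMP_BYTES and os.path.isdir(large_dir):
        return tempfile.TemporaryFile(dir=large_dir)
    return tempfile.TemporaryFile()
```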

For the other Linux distros, this is a great point. Has anybody asked
the leaders of the Fedora and Arch communities whether their users are
experiencing any undue burden? If they aren't, that's certainly a good
reason to throw on the pile.

> As a recap, the benefits of /tmp on tmpfs are:
>  - Performance: reads, writes, and seeks are insanely fast in a tmpfs;
> as fast as accessing RAM (I tested 1.4GB/s writes and 1.1GB/s reads
> to/from tmpfs)

This sounds great on the surface; however, what real workload does it
benefit? Giant git merges? Sort's temporary files? If somebody is
developing an app that uses /tmp, they're using /tmp for one of these
reasons:

a) They are processing data in an unpredictable way where using RAM
directly may be too costly.
or
b) They're using a tool that only operates on files on disk.
or
(Let's not go to the place where they're just using /tmp because they
don't know better).

Having tmpfs as /tmp for (a) could provide a performance benefit in
many cases, but opens the system up to swap space exhaustion in extreme
cases. (b) is safer than (a), and on a very busy system it probably
benefits mostly from avoiding inode writes.
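One way to probe case (b), rather than raw bandwidth, would be a
small-file churn benchmark run against a tmpfs directory and a
disk-backed one. This is a hypothetical sketch: the file count, payload
size and `small_file_churn` name are made up, and it assumes /dev/shm is
a tmpfs and /var/tmp sits on the root disk, as is typical on Ubuntu.

```python
import os
import shutil
import tempfile
import time


def small_file_churn(directory, files=2000, size=4096):
    """Time creating, reading back, and deleting many small files under directory.

    Hypothetical stand-in for a tool that only works on files (case (b)):
    most of the cost is metadata (create/unlink) rather than bandwidth.
    """
    payload = b"x" * size
    workdir = tempfile.mkdtemp(dir=directory)  # private scratch dir, removed below
    start = time.perf_counter()
    try:
        paths = []
        for i in range(files):
            p = os.path.join(workdir, f"part-{i}")
            with open(p, "wb") as f:
                f.write(payload)
            paths.append(p)
        for p in paths:
            with open(p, "rb") as f:
                if f.read() != payload:
                    raise RuntimeError(f"short read from {p}")
            os.unlink(p)
    finally:
        shutil.rmtree(workdir, ignore_errors=True)
    return time.perf_counter() - start
```

For example, compare `small_file_churn("/dev/shm")` with
`small_file_churn("/var/tmp")` on the same VM. Since nothing here calls
fsync, the disk-backed run may also be served largely from the page
cache, so the gap could be smaller than raw throughput figures suggest.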

None of the data presented actually helps us with the performance
speculation, unfortunately. It doesn't detract from the allure of a
magic pill to make most things faster, of course. :)

Count me as "+/- 0" at this point.