Consider switching to systemd-coredump for default kernel core dumps

Tue Feb 5 23:14:58 UTC 2019

>
> Except for the part where a coredump for an unknown binary is useless.

If I have a service that crashes in a container, I'd like to get back into
the container and inspect why. Simply throwing out crashes from unknown
binaries doesn't exactly seem like a stellar decision to me. And unknown to
whom? A binary may not be known in the Ubuntu/Debian repos, but that doesn't
mean the devs lack debug symbols and reproducible binary builds that they
can correlate the dump with post-mortem, with or without the actual
environment. In fact, I'd presume that's true in 100% of container usage.
So I'm not entirely sure what you mean by a small user base.

> That's why our current setup is to forward the crash to the containers
> when the containers are detected to know how to handle coredumps and
> otherwise throw them out as they're not going to be of any use.

This raises all the alarms for me, and is the biggest problem I have with
apport as the default. Debian simply dumps to a 'core' file.
RHEL/Fedora/CentOS either does that or hands the crash to systemd-coredump,
which leaves a dump in "/var/lib/systemd/coredump". Fine, apport cannot
auto-report such a crash, but throwing out the whole dump is just plain
wrong to me. With what I'm proposing, everyone's happy. Each of the options
I listed leaves the dump intact; apport can sit on top of it, analyse the
files and discard the dumps it has handled, or even become a pure reporter
instead of handling the dumps on its own.
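
For anyone comparing hosts, here is a minimal sketch of checking which
handler kernel.core_pattern currently points at. The helper name
classify_core_pattern is hypothetical (it is not part of apport or systemd),
and it only inspects the pattern string; it does not change any setting.

```shell
#!/bin/sh
# Classify a kernel.core_pattern value by the handler it pipes to.
# classify_core_pattern is a hypothetical helper, used for illustration only.
classify_core_pattern() {
    case "$1" in
        "|"*apport*)           echo "apport" ;;
        "|"*systemd-coredump*) echo "systemd-coredump" ;;
        "|"*)                  echo "other pipe handler" ;;
        *)                     echo "plain file" ;;
    esac
}

# Report the handler on the current machine, if the kernel exposes it.
if [ -r /proc/sys/kernel/core_pattern ]; then
    classify_core_pattern "$(cat /proc/sys/kernel/core_pattern)"
fi
```

With the Ubuntu default quoted later in this thread
(`|/usr/share/apport/apport %p %s %c %d %P`) the helper prints "apport"; a
bare pattern such as `core` is reported as "plain file".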

On Wed, Feb 6, 2019 at 4:02 AM Stéphane Graber <stgraber at ubuntu.com> wrote:

> On Wed, Feb 06, 2019 at 03:35:23AM +0530, Prasanna V. Loganathar wrote:
> > >
> > > You need to have apport itself installed in the container, I suspect
> > > that the Docker containers do not have it.
> >
> >
> > This will make no difference. Doing an `apt update && apt install -y
> > apport` will not do any good, as apport is set to disable itself on
> > containers.
>
> And indeed Docker doesn't even have an init system running so that'd be
> pretty useless as the apport forwarding service wouldn't work.
>
> > > Having the dump handled by apport on the host would be useless as the
> > > host doesn't know what binary was used to produce the dump (and so can't
> > > be used with retracers) nor can it capture any of the crash environment
> > > information as the relevant data is in the container, not in the host.
> > >
> >
> > Precisely my point. But this is not a problem with systemd-coredump, as it
> > simply manages the dumps and doesn't need retracing. All
> > RHEL/Fedora/CentOS based distros work without any modifications, as
> > expected. With containers everywhere, I think it's worth reconsidering the
> > default; unifying things around systemd-coredump simplifies things
> > for everyone. coredumpctl is also much nicer to work with. apport should
> > easily be able to just watch over the dumps and do its reporting on
> > desktop systems from there.
>
> Except for the part where a coredump for an unknown binary is useless.
>
> So having all the dumps centralized on the host when you don't know what
> container it came from and may no longer have any of the environment
> information is completely useless.
>
> That's why our current setup is to forward the crash to the containers
> when the containers are detected to know how to handle coredumps and
> otherwise throw them out as they're not going to be of any use.
>
>
> For people who are actively debugging a container that's getting crashes
> and know what binary to use with the dumped core file, it's not a huge
> deal to change the core dump handler (stopping the apport service will
> even do that for you).
>
> Those people make for a fraction of a percent of Ubuntu's userbase,
> everyone else strongly benefits from crash reports being handed over to
> apport where extra data is captured and the resulting crashes can be
> sent to errors.ubuntu.com for retracing and aggregation.
>
>
> > On Wed, Feb 6, 2019 at 3:22 AM Stéphane Graber <stgraber at ubuntu.com> wrote:
> >
> > > On Wed, Feb 06, 2019 at 03:05:20AM +0530, Prasanna V. Loganathar wrote:
> > > > Hi Stephane,
> > > >
> > > > Ah. I had overlooked this one. It does work well in lxc. Thanks for
> > > > pointing that out.
> > > > However, apport's default is to do nothing in containers.
> > > >
> > > > docker run --name testbox --rm -it ubuntu bash
> > > > > sleep 10 & kill -ABRT $(pgrep sleep)
> > > >
> > > > This has no /var/crash directory. There are no dumps. Passing the
> > > > "--ulimit core=x:x" option to docker doesn't help either.
> > >
> > > You need to have apport itself installed in the container, I suspect
> > > that the Docker containers do not have it.
> > >
> > > Having the dump handled by apport on the host would be useless as the
> > > host doesn't know what binary was used to produce the dump (and so can't
> > > be used with retracers) nor can it capture any of the crash environment
> > > information as the relevant data is in the container, not in the host.
> > >
> > > >
> > > > On Tue, Feb 5, 2019 at 10:32 PM Stéphane Graber <stgraber at ubuntu.com> wrote:
> > > > >
> > > > > On Tue, Feb 05, 2019 at 06:36:48AM +0530, Prasanna V. Loganathar wrote:
> > > > > > Hi Stephane,
> > > > > >
> > > > > > Thanks for the reply. Please correct me if I'm wrong.
> > > > > >
> > > > > > lxc launch ubuntu:18.04 testbox
> > > > > > lxc exec testbox bash
> > > > > > sleep 100 & kill -ABRT $(pgrep sleep)
> > > > >
> > > > > stgraber at castiana:~$ lxc launch ubuntu:18.04 testbox
> > > > > Creating testbox
> > > > > Starting testbox
> > > > > stgraber at castiana:~$ lxc exec testbox bash
> > > > > root at testbox:~# sleep 10 &
> > > > > [1] 303
> > > > > root at testbox:~# kill -ABRT 303
> > > > > root at testbox:~#
> > > > > [1]+  Aborted                 (core dumped) sleep 10
> > > > > root at testbox:~# ls -lh /var/crash
> > > > > total 512
> > > > > -rw-r----- 1 root root 34K Feb  5 17:02 _bin_sleep.0.crash
> > > > > root at testbox:~#
> > > > >
> > > > > > There's no crash dump anywhere.
> > > > > >
> > > > > > /var/crash is empty. No `core` file, etc. By default, crashes
> > > > > > just vaporise - that's my biggest concern. On Debian, for
> > > > > > instance, you can rely on a 'core' file, and on
> > > > > > CentOS/RHEL you can always rely on the systemd-coredump
> > > > > > infrastructure, which the coredumpctl command exposes so nicely.
> > > > > >
> > > > > > With systemd being the init, and coredumpctl being very well
> > > > > > architected to manage this, I just don't see why Ubuntu shouldn't
> > > > > > make use of that and have apport only do what it does best, which
> > > > > > is primarily reporting.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Tue, Feb 5, 2019 at 1:57 AM Stéphane Graber <stgraber at ubuntu.com> wrote:
> > > > > > >
> > > > > > > On Sat, Feb 02, 2019 at 03:34:22AM +0530, Prasanna V. Loganathar wrote:
> > > > > > > > Hi folks,
> > > > > > > >
> > > > > > > > Currently, `$ sysctl kernel.core_pattern` gives
> > > > > > > > `kernel.core_pattern = |/usr/share/apport/apport %p %s %c %d %P`
> > > > > > > >
> > > > > > > > This is usually fine; however, when run from containers or lxc this
> > > > > > > > will just error out, with no core dump being produced, due to apport
> > > > > > > > being disabled. Adding to the problem: with containers or lxc, you
> > > > > > > > can't just change it per container; it has to be changed on the host.
> > > > > > > > And it's not reasonable to expect apport in all containers either.
> > > > > > > > Since this is a common problem, I think it's important to consider a
> > > > > > > > change in the default handling.
> > > > > > > >
> > > > > > > > There are multiple options to deal with this:
> > > > > > > >
> > > > > > > > 1. Drop apport as default core_pattern handler and switch to a simple
> > > > > > > > file in either /var/crash (this requires creating /var/crash as a part
> > > > > > > > of the installation as it's currently created by apport), or /tmp and
> > > > > > > > let apport handle the core dump after the fact, and report it.
> > > > > > > > 2. Switch to systemd-coredump, and default to it, since it already
> > > > > > > > does this very well and provides "coredumpctl" which is much nicer to
> > > > > > > > work with. systemd-coredump is also part of the systemd suite of
> > > > > > > > utils and doesn't pull in as large a dependency as apport, which to
> > > > > > > > date isn't as robust (I still have "core" files being left all over
> > > > > > > > the place because apport resets itself and crashes during startup
> > > > > > > > aren't handled properly). This also has the nice advantage of
> > > > > > > > unifying the OSS community around one coredump handler. On desktops,
> > > > > > > > apport can further process the core dumps that systemd generates.
> > > > > > > >
> > > > > > > > 3. Employ a tiny helper script as the default core dump handler,
> > > > > > > > which looks for specified programs such as "apport", "abrt",
> > > > > > > > "systemd-coredump" and pipes to them, or writes to /var/crash or
> > > > > > > > /tmp in their absence. This does have the disadvantage of growing
> > > > > > > > its own config rather quickly.
> > > > > > > >
> > > > > > > > That being said, I highly suggest option 2 be used in the upcoming
> > > > > > > > versions, and apport be a layer sitting on top of the coredumps
> > > > > > > > generated by systemd-coredump by either hooking into it, or by
> > > > > > > > watching its folders.
> > > > > > > >
> > > > > > > > I've also filed this as a bug here:
> > > > > > > > https://bugs.launchpad.net/ubuntu/+bug/1813403
> > > > > > >
> > > > > > > Are you aware that apport is aware of containers (LXC & LXD) and will
> > > > > > > just forward your crash to apport inside the container?
> > > > > > >
> > > > > > > That actually tends to be a better way to handle things than
> > > > > > > centralizing all crashes on the host, as monitoring tools running
> > > > > > > inside the containers will operate as normal, reporting on the
> > > > > > > presence of a crash file. As well, sending the bug report to Launchpad
> > > > > > > will then work properly, capturing the details from the container
> > > > > > > rather than confusingly mixing package details of the host with a
> > > > > > > crash that may have come from a completely different release.
> > > > > > >
> > > > > > > There's obviously nothing wrong with someone opting out of apport and
> > > > > > > switching to whatever core dump handler they want, but for Ubuntu,
> > > > > > > apport has much better integration with the distribution, has hooks in
> > > > > > > many packages and was designed with proper container support.
> > > > > > >
> > > > > > > --
> > > > > > > Stéphane Graber
> > > > > > > Ubuntu developer
> > > > > > > http://www.ubuntu.com