Consider switching to systemd-coredump for default kernel core dumps

Prasanna V. Loganathar pvl at prasannavl.com
Tue Feb 5 23:27:32 UTC 2019


>
> So having all the dumps centralized on the host when you don't know what
> container it came from and may no longer have any of the environment
> information is completely useless.
>

Adding on to my previous reply, perhaps there's a misunderstanding on this
one? Who's forwarding dumps to the host to centralize them? I don't
understand what you mean here. If the default is either a plain file or
systemd-coredump (installed inside the container), the container can
handle the dump on its own, leaving the dump file inside the crashed
container. Today, with Ubuntu's default configuration and apport, it's
just gone. Post-mortem? Well, too bad, apport inside the container just
burned the corpse. Good luck! Non-Ubuntu distros, by contrast, mostly
leave it intact.
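
To make the distinction concrete, here is a minimal sketch of how one might
check which of these cases a machine is in. It is not taken from any distro's
tooling: the function name and the labels it prints are made up for
illustration. The only real inputs are /proc/sys/kernel/core_pattern and the
apport pattern from the original post quoted further down. As that post notes,
the value is set on the host, so a container sees the host's setting.

```shell
#!/bin/sh
# Classify a kernel.core_pattern value by the kind of handler it names.
# The labels printed here are hypothetical, chosen for this sketch only.
classify_core_pattern() {
    case "$1" in
        '|'*systemd-coredump*) echo "systemd-coredump" ;;
        '|'*apport*)           echo "apport" ;;
        '|'*)                  echo "other-pipe-handler" ;;
        *)                     echo "plain-file" ;;
    esac
}

# Report on the running kernel when the proc file is readable.
if [ -r /proc/sys/kernel/core_pattern ]; then
    pattern=$(cat /proc/sys/kernel/core_pattern)
    printf '%s -> %s\n' "$pattern" "$(classify_core_pattern "$pattern")"
fi
```

A leading `|` means the kernel pipes the dump to a program; anything else is
treated as a file name template, which is the "plain file" case above.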

Where dumps are forwarded to a host, I'd assume the infrastructure is
sophisticated enough that the core dump handler is overridden in the
first place.
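
For reference, the fallback idea from option 3 of the original proposal
(quoted further down) could be sketched along these lines. This shows only the
selection step; a real core_pattern pipe handler would also have to read the
dump from standard input. The function name, the "pipe:"/"file:" labels and
the systemd-coredump path are assumptions for illustration; only the apport
path comes from this thread.

```shell
#!/bin/sh
# Print the first candidate that is an executable file, prefixed "pipe:".
# If none is found, print the fallback directory, prefixed "file:".
# The first argument is the fallback directory; the rest are candidates.
pick_core_handler() {
    fallback_dir=$1
    shift
    for candidate in "$@"; do
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            printf 'pipe:%s\n' "$candidate"
            return 0
        fi
    done
    printf 'file:%s\n' "$fallback_dir"
}

# Example call. The systemd-coredump path is an assumption and may differ
# between distributions.
pick_core_handler /var/crash \
    /usr/share/apport/apport \
    /usr/lib/systemd/systemd-coredump
```

In a container without any of the listed handlers installed, the sketch falls
back to the directory, so the dump would be kept rather than discarded.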

On Wed, Feb 6, 2019 at 4:44 AM Prasanna V. Loganathar <pvl at prasannavl.com>
wrote:

>> Except for the part where a coredump for an unknown binary is useless.
>
>
> If I have a service that crashes in a container, I'd like to get back into
> the container and inspect why. Simply throwing out unknown binary crashes
> doesn't exactly seem like a stellar decision to me. And unknown binary to
> whom? It may not be known in the Ubuntu/Debian repos, but that doesn't mean
> the devs don't have debug symbols and reproducible binary builds which
> they can correlate with in post-mortem circumstances, with or without the
> actual environment. In fact, I'd presume that's true in 100% of container
> usages. So I'm not entirely sure what you're talking about in terms of a
> small user base.
>
>> That's why our current setup is to forward the crash to the containers
>> when the containers are detected to know how to handle coredumps and
>> otherwise throw them out as they're not going to be of any use.
>
>
> This raises all the alarms for me, and is the biggest problem I have with
> apport as the default. Debian simply dumps to a 'core' file.
> RHEL/Fedora/CentOS do either that or handle it with systemd-coredump,
> which ends up with a dump in "/var/lib/systemd/coredump". Apport cannot
> auto report. But throwing out the whole dump is just plain wrong to me.
> With what I'm proposing, everyone's happy. Any of the potential solutions
> I stated leaves the dump intact, and apport can just sit on top of it,
> analyse the files and throw away the dumps that it handles, or even just
> become a pure reporter instead of handling the dumps on its own.
>
>
> On Wed, Feb 6, 2019 at 4:02 AM Stéphane Graber <stgraber at ubuntu.com>
> wrote:
>
>> On Wed, Feb 06, 2019 at 03:35:23AM +0530, Prasanna V. Loganathar wrote:
>> > >
>> > > You need to have apport itself installed in the container, I suspect
>> > > that the Docker containers do not have it.
>> >
>> >
>> > This will make no difference. Doing an `apt update && apt install -y
>> > apport` will not do any good, as apport is set to disable itself on
>> > containers.
>>
>> And indeed Docker doesn't even have an init system running so that'd be
>> pretty useless as the apport forwarding service wouldn't work.
>>
>> > > Having the dump handled by apport on the host would be useless as the
>> > > host doesn't know what binary was used to produce the dump (and so
>> > > can't be used with retracers) nor can it capture any of the crash
>> > > environment information as the relevant data is in the container, not
>> > > in the host.
>> > >
>> >
>> > Precisely my point. But this is not a problem with systemd-coredump, as
>> > it simply manages the dumps. It doesn't need retracing. All
>> > RHEL/Fedora/CentOS based distros work without any modifications, as
>> > expected. With containers everywhere, I think it's worth reconsidering
>> > the default, and unifying things around the systemd-coredump effort
>> > simplifies things for everyone. coredumpctl is also much nicer to work
>> > with. apport should easily be able to just watch over the dumps and do
>> > its reporting on desktop systems from there.
>>
>> Except for the part where a coredump for an unknown binary is useless.
>>
>> So having all the dumps centralized on the host when you don't know what
>> container it came from and may no longer have any of the environment
>> information is completely useless.
>>
>> That's why our current setup is to forward the crash to the containers
>> when the containers are detected to know how to handle coredumps and
>> otherwise throw them out as they're not going to be of any use.
>>
>>
>> For people who are actively debugging a container that's getting crashes
>> and know what binary to use with the dumped core file, it's not a huge
>> deal to change the core dump handler (stopping the apport service will
>> even do that for you).
>>
>> Those people make for a fraction of a percent of Ubuntu's userbase,
>> everyone else strongly benefits from crash reports being handed over to
>> apport where extra data is captured and the resulting crashes can be
>> sent to errors.ubuntu.com for retracing and aggregation.
>>
>>
>> > On Wed, Feb 6, 2019 at 3:22 AM Stéphane Graber <stgraber at ubuntu.com>
>> > wrote:
>> >
>> > > On Wed, Feb 06, 2019 at 03:05:20AM +0530, Prasanna V. Loganathar
>> > > wrote:
>> > > > Hi Stephane,
>> > > >
>> > > > Ah. I had overlooked this one. It does work well in lxc. Thanks for
>> > > > pointing that out.
>> > > > However, apport's default is to do nothing in containers.
>> > > >
>> > > > docker run --name testbox --rm -it ubuntu bash
>> > > > > sleep 10 & kill -ABRT $(pgrep sleep)
>> > > >
>> > > > This has no /var/crash directory. There are no dumps. Passing the
>> > > > "--ulimit core=x:x" option to docker doesn't help either.
>> > >
>> > > You need to have apport itself installed in the container, I suspect
>> > > that the Docker containers do not have it.
>> > >
>> > > Having the dump handled by apport on the host would be useless as the
>> > > host doesn't know what binary was used to produce the dump (and so
>> > > can't be used with retracers) nor can it capture any of the crash
>> > > environment information as the relevant data is in the container, not
>> > > in the host.
>> > >
>> > > >
>> > > > On Tue, Feb 5, 2019 at 10:32 PM Stéphane Graber <stgraber at ubuntu.com>
>> > > > wrote:
>> > > > >
>> > > > > On Tue, Feb 05, 2019 at 06:36:48AM +0530, Prasanna V. Loganathar
>> > > > > wrote:
>> > > > > > Hi Stephane,
>> > > > > >
>> > > > > > Thanks for the reply. Please correct me if I'm wrong.
>> > > > > >
>> > > > > > lxc launch ubuntu:18.04 testbox
>> > > > > > lxc exec testbox bash
>> > > > > > sleep 100 & kill -ABRT $(pgrep sleep)
>> > > > >
>> > > > > stgraber at castiana:~$ lxc launch ubuntu:18.04 testbox
>> > > > > Creating testbox
>> > > > > Starting testbox
>> > > > > stgraber at castiana:~$ lxc exec testbox bash
>> > > > > root at testbox:~# sleep 10 &
>> > > > > [1] 303
>> > > > > root at testbox:~# kill -ABRT 303
>> > > > > root at testbox:~#
>> > > > > [1]+  Aborted                 (core dumped) sleep 10
>> > > > > root at testbox:~# ls -lh /var/crash
>> > > > > total 512
>> > > > > -rw-r----- 1 root root 34K Feb  5 17:02 _bin_sleep.0.crash
>> > > > > root at testbox:~#
>> > > > >
>> > > > > > There's no crash dump anywhere.
>> > > > > >
>> > > > > > /var/crash is empty. No `core` file, etc. By default, the crash
>> > > > > > just vaporises, and that's my biggest concern. On Debian, for
>> > > > > > instance, you can rely on a 'core' file, and on CentOS/RHEL you
>> > > > > > can always rely on the systemd-coredump infrastructure, which
>> > > > > > the coredumpctl command handles so nicely.
>> > > > > >
>> > > > > > With systemd being the init, and coredumpctl being very well
>> > > > > > architected to manage this, I just don't see why Ubuntu
>> > > > > > shouldn't make use of that and have apport only do what it does
>> > > > > > best, which is primarily reporting.
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > On Tue, Feb 5, 2019 at 1:57 AM Stéphane Graber <stgraber at ubuntu.com>
>> > > > > > wrote:
>> > > > > > >
>> > > > > > > On Sat, Feb 02, 2019 at 03:34:22AM +0530, Prasanna V. Loganathar
>> > > > > > > wrote:
>> > > > > > > > Hi folks,
>> > > > > > > >
>> > > > > > > > Currently, `$ sysctl kernel.core_pattern` gives
>> > > > > > > > `kernel.core_pattern = |/usr/share/apport/apport %p %s %c %d %P`
>> > > > > > > >
>> > > > > > > > This is usually fine. However, when run from containers or lxc
>> > > > > > > > this will just error out, with no core dump being produced,
>> > > > > > > > because apport is disabled there. Adding to the problem: with
>> > > > > > > > containers or lxc you can't just change it per container; it
>> > > > > > > > has to be changed on the host. And it's not reasonable to
>> > > > > > > > expect apport in all containers either. Since this is a common
>> > > > > > > > problem, I think it's important to consider a change in the
>> > > > > > > > default handling.
>> > > > > > > >
>> > > > > > > > There are multiple options to deal with this:
>> > > > > > > >
>> > > > > > > > 1. Drop apport as the default core_pattern handler and switch to
>> > > > > > > > a simple file in either /var/crash (this requires creating
>> > > > > > > > /var/crash as part of the installation, as it's currently
>> > > > > > > > created by apport) or /tmp, and let apport handle the core
>> > > > > > > > dump after the fact and report it.
>> > > > > > > >
>> > > > > > > > 2. Switch to systemd-coredump and default to it, since it
>> > > > > > > > already does this very well and provides "coredumpctl", which
>> > > > > > > > is much nicer to work with. systemd-coredump is also part of
>> > > > > > > > the systemd suite of utils and doesn't pull in as large a
>> > > > > > > > dependency as apport, which to date isn't as robust (I still
>> > > > > > > > have "core" files being left all over the place because apport
>> > > > > > > > resets itself and crashes during startup aren't handled
>> > > > > > > > properly). This also has the nice advantage of unifying the
>> > > > > > > > OSS community around one coredump handler. On desktops, apport
>> > > > > > > > can still process the core dumps that systemd-coredump
>> > > > > > > > generates.
>> > > > > > > >
>> > > > > > > > 3. Employ a tiny helper script as the default core dump
>> > > > > > > > handler, which looks for specified programs such as "apport",
>> > > > > > > > "abrt" or "systemd-coredump" and pipes to them, or writes to
>> > > > > > > > /var/crash or /tmp in their absence. This does have the
>> > > > > > > > disadvantage of growing its own config rather quickly.
>> > > > > > > >
>> > > > > > > > That being said, I highly suggest option 2 be used in the
>> > > > > > > > upcoming versions, with apport as a layer sitting on top of
>> > > > > > > > the coredumps generated by systemd-coredump, either by hooking
>> > > > > > > > into it or by watching its folders.
>> > > > > > > >
>> > > > > > > > I've also filed this as a bug here:
>> > > > > > > > https://bugs.launchpad.net/ubuntu/+bug/1813403
>> > > > > > >
>> > > > > > > Are you aware that apport is aware of containers (LXC & LXD) and
>> > > > > > > will just forward your crash to apport inside the container?
>> > > > > > >
>> > > > > > > That actually tends to be a better way to handle things than
>> > > > > > > centralizing all crashes on the host, as monitoring tools running
>> > > > > > > inside the containers will operate as normal, reporting on the
>> > > > > > > presence of a crash file. As well, sending the bug report to
>> > > > > > > Launchpad will then work properly, capturing the details from
>> > > > > > > the container rather than confusingly mixing package details of
>> > > > > > > the host with a crash that may have come from a completely
>> > > > > > > different release.
>> > > > > > >
>> > > > > > > There's obviously nothing wrong with someone opting out of apport
>> > > > > > > and switching to whatever core dump handler they want, but for
>> > > > > > > Ubuntu, apport has much better integration with the distribution,
>> > > > > > > has hooks in many packages and was designed with proper container
>> > > > > > > support.
>> > > > > > >
>> > > > > > > --
>> > > > > > > Stéphane Graber
>> > > > > > > Ubuntu developer
>> > > > > > > http://www.ubuntu.com
>> > > > >
>> > > > > --
>> > > > > Stéphane Graber
>> > > > > Ubuntu developer
>> > > > > http://www.ubuntu.com
>> > >
>> > > --
>> > > Stéphane Graber
>> > > Ubuntu developer
>> > > http://www.ubuntu.com
>> > >
>>
>> --
>> Stéphane Graber
>> Ubuntu developer
>> http://www.ubuntu.com
>>
>


More information about the Ubuntu-devel-discuss mailing list