Colin Walters wrote:
2008/6/23 Will Woods <firstname.lastname@example.org>:
If I remember right, the reason for this part of the discussion was:
1) Linking everything on the system to breakpad is a bit nasty.
2) Apport doesn't need to be linked in, but it runs *after* the process
gets dumped by the kernel. At which point it's slightly different from
when it actually crashed.
Yeah, sounds right.
pjones' idea was to have a system service that would receive
notification of segfaults and use utrace to stop the process and
generate a (breakpad-style) report.
He was thinking of hooking it into kerneloops, right?
This was really just my "easiest first-pass way to implement it"; I
expect we can replace this part with something better if we need to, and
it may or may not be necessary.
Though isn't there a race between when we get the kernel notification and
when the service stops it and inspects? Not my area of expertise really,
just thinking out loud.
If we're /not/ changing any kernel APIs, we'd want to do several things,
conditional on the feature being enabled. A mostly inclusive list follows:
1) make /var/cache/cores/ a tmpfs mount
2) set kernel.core_pattern to something like "/var/cache/cores/core.%p"
3) do something along the lines of setfacl to limit access
4) "ulimit -c $SOMETHING_NONZERO" for everything.
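
A minimal sketch of the four steps above, in Python. Only the directory
and the core_pattern value come from the list; the tmpfs size and the
"crashd" group name are assumptions for illustration. Steps 1-3 need
root, so they are only built as command lists here, not executed. Step 4
is shown for the current process only, via setrlimit().

```python
import resource

CORE_DIR = "/var/cache/cores"          # directory named in the list above
CORE_PATTERN = CORE_DIR + "/core.%p"   # the kernel replaces %p with the PID


def setup_commands(core_dir=CORE_DIR, pattern=CORE_PATTERN,
                   size="256m", group="crashd"):
    """Build the root-only commands for steps 1-3 without running them.

    `size` and `group` are hypothetical values, not from the thread.
    """
    return [
        # 1) tmpfs mount, so cores never hit the disk
        ["mount", "-t", "tmpfs", "-o", f"size={size},mode=0770",
         "tmpfs", core_dir],
        # 2) point the kernel at the tmpfs directory
        ["sysctl", "-w", f"kernel.core_pattern={pattern}"],
        # 3) limit access to a single (assumed) group
        ["setfacl", "-m", f"g:{group}:rwx", core_dir],
    ]


def enable_core_dumps():
    """Step 4 for this process: raise the soft core limit to the hard limit."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def core_path_for(pid, pattern=CORE_PATTERN):
    """Where the kernel would write the core for `pid` under `pattern`."""
    return pattern.replace("%p", str(pid))
```

Note that if the hard limit is already 0, enable_core_dumps() cannot
make it non-zero; the limit would have to be raised earlier (e.g. by
whatever starts the session).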
If we were to change kernel APIs, my initial thought is a utrace plugin
that suspends the task instead of delivering the segfault, and gives us
a notification on a file descriptor we're ppoll()ing on. Then we'd go
examine the process's memory and collect a trace. This also has the
advantage that it means no shared writable space and no spinning up the
disk to write the core out. Also, on the whole it requires fewer
different parts of the system to be set up right.
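
To make the userspace half of that idea concrete, here is a rough sketch
of the wait loop. Everything about the notification is an assumption: no
such utrace interface exists in this form, and the "ASCII PID plus
newline on a file descriptor" format is invented for the example. Python
exposes poll() rather than ppoll(), so that is used instead. A pipe
stands in for the hypothetical notification fd.

```python
import os
import select


def wait_for_crash(notify_fd, timeout_ms=1000):
    """Wait for one notification on notify_fd.

    Returns the PID of the suspended task, or None on timeout or if the
    writer closed the descriptor. The wire format (ASCII PID followed by
    a newline) is hypothetical.
    """
    poller = select.poll()
    poller.register(notify_fd, select.POLLIN)
    if not poller.poll(timeout_ms):
        return None                      # timed out, nothing crashed
    data = os.read(notify_fd, 64)
    fields = data.split()
    if not fields:
        return None                      # EOF or empty message
    return int(fields[0].decode("ascii"))


if __name__ == "__main__":
    # Stand-in for the kernel side: a pipe we write a fake PID into.
    read_fd, write_fd = os.pipe()
    os.write(write_fd, b"4242\n")
    pid = wait_for_crash(read_fd)
    # A real service would now inspect the stopped task's memory
    # and collect a trace before letting it die.
    print("would collect a trace for pid", pid)
```

Because the task stays suspended until the service has looked at it,
nothing needs to be written to a shared directory in this model.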
It would make the 'debuginfo-install' message go away, because (if DAV +
FUSE does the right thing) you'll have all the debuginfo you need, in
the right place - mounted as a FUSE filesystem.
FWIW, the debuginfo server I'm working on is at
http://git.fedorahosted.org/git/?p=littlebottom.git;a=summary . It's
still very much in its infancy, and I can use all the help I can get.
I'll gladly add you to the group if you want to help out.
My 2¢ - Link in breakpad, create http://crash.fedoraproject.org
Link it into what? Everything, via LD_PRELOAD? Or just GNOME stuff? I
thought bug-buddy already used breakpad?
IMNSHO, LD_PRELOAD is just a plain bad idea here (and nearly everywhere
else). There are also plenty of places where we want tracebacks, but
the upstream maintainers won't like the patches, and we don't want to be
carrying patches. Not to mention patching everything is a herculean task.
I really think if we're going to succeed, we've got to plan on /not/ changing
I'm personally most interested in the desktop apps because, well, we desktop
developers are masochists and code complex user-facing code in C/C++, and
not surprisingly they crash =)
The same is true of the rest of the system; I think our solution needs
to work for everything (well, everything compiled, though the
reporting/statistics infrastructure need not be even that specific.)
So right now...hm, actually this is weird, I can't get any Fedora-compiled
program to spawn bug-buddy at all right now. I get it for some local custom
code, but not for anything in /usr/bin. I see libgnomebreakpad is linked
into the process.
Another point against the "link in a magic library" approach. If the
crashing executable has to do the work to spawn the reporting tool,
it'll *never* be reliable.
Longer term, investigate a utrace system service instead of having apps
link to breakpad (this gets us non-desktop system crashes without
having to universally LD_PRELOAD or whatever).
Yeah, I don't think we need to solve this until we've got the
proof-of-concept stack: a couple of choice apps sending Breakpad reports
(with debuginfo fetched from littlebottom) to our own Socorro instance.
I think we're all in agreement here.
fedora-devel-list mailing list