06-23-2008, 05:22 PM
Colin Walters

apport/breakpad and fedora

Hey,

There was some good discussion at FUDCon about apport and friends. I'd really like to see us have a sane story on this by F10 - currently Bugzilla has a lot of unannotated backtraces and it's a big waste of developer/triager time.

So the goal is that when a program crashes, we get a full annotated backtrace to the eyes of the developer. There's some discussion on the current Fedora wiki page:
https://fedoraproject.org/wiki/Releases/FeatureApport


Here's my executive summary:

Apport: Based on kernel core dump piping, can handle any program crash. Submits data to Launchpad, which has a tracing server that uses debuginfo to create a full stack trace. Fedora would need to make a custom server on top of the low-level tools.


Breakpad: Used by Firefox and various Google applications. Based on a shared library linked into processes which catches SIGSEGV and submits a report via HTTP.
Format: http://code.google.com/p/google-breakpad/wiki/ProcessorDesign
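
To make the in-process approach concrete, here is a minimal Python sketch of the general idea: a handler registered inside the process captures a stack trace on SIGSEGV and packages it as an HTTP POST. This is not Breakpad's actual code or wire format (Breakpad is a native library with its own dump format); the report fields, the function names `build_report`, `make_request`, `on_crash` and `install_handler`, and the `crash.example.org` URL are all assumptions invented for illustration.

```python
import json
import os
import signal
import sys
import traceback
import urllib.request

# Hypothetical collector URL, for illustration only.
CRASH_URL = "http://crash.example.org/submit"


def build_report(signum, frame):
    """Collect a minimal crash report for the given signal and stack frame."""
    return {
        "signal": signal.Signals(signum).name,
        "pid": os.getpid(),
        "program": sys.argv[0],
        # format_stack returns the frames leading up to `frame` as text lines.
        "stack": traceback.format_stack(frame),
    }


def make_request(report, url=CRASH_URL):
    """Wrap the report in an HTTP POST request without sending it."""
    body = json.dumps(report).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def on_crash(signum, frame):
    """Signal handler: build the report, submit it (best effort), then exit."""
    request = make_request(build_report(signum, frame))
    try:
        urllib.request.urlopen(request, timeout=5)
    except OSError:
        pass  # the collector may be unreachable; never block the exit
    os._exit(128 + signum)


def install_handler():
    """Register on_crash for SIGSEGV in the current process."""
    signal.signal(signal.SIGSEGV, on_crash)
```

Note that a Python-level handler only runs once the interpreter regains control, so this models the flow rather than reliably catching a real memory fault. It also shows the basic trade-off of the approach: all of the reporting work happens inside the process that just crashed.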

One nice thing is that Mozilla has a lot of investment in a free software scalable processing server called "Socorro": http://code.google.com/p/socorro

GNOME is using breakpad/socorro and has a server up:
http://blogs.gnome.org/ovitters/2007/10/06/crash-gnome-org/

They were prototyping it using Fedora, and I believe bug-buddy which is
linked into all GTK+ programs uses breakpad and sends to
crash.gnome.org by default.

Some data points from the discussion:

* Apparently Apport submits an entire core dump which is (IMO) a non-starter for Fedora; there are things like passwords in memory that we just can't send by default.


* Mozilla is unhappy that we're turning off their breakpad integration on Fedora, which means they don't have visibility into crashes on freedesktop.org/Linux systems.


* Possibly replace breakpad library linking by using utrace+system service (right?)

* Create debuginfo DAV server and pull debuginfo data dynamically (what are the download requirements? does this replace gdb's "debuginfo-install" display?)


My 2 cents - Link in breakpad, create http://crash.fedoraproject.org running Socorro. Investigate either submitting reports to Mozilla as well for Firefox or create a system where our Socorro pushes reports to theirs. Longer term investigate utrace system service instead of having apps link to breakpad (this gets us non-desktop system crashes without having to universally LD_PRELOAD or whatever).




--
fedora-devel-list mailing list
fedora-devel-list@redhat.com
https://www.redhat.com/mailman/listinfo/fedora-devel-list
 
06-23-2008, 08:12 PM
Will Woods

apport/breakpad and fedora

On Mon, 2008-06-23 at 13:22 -0400, Colin Walters wrote:

> Breakpad: Used by Firefox and various Google applications. Based on a
> shared library linked into processes which catches SIGSEGV and submits
> a report via HTTP.
> Format: http://code.google.com/p/google-breakpad/wiki/ProcessorDesign
> One nice thing is that Mozilla has a lot of investment in a free
> software scalable processing server called "Socorro":
> http://code.google.com/p/socorro
>
> GNOME is using breakpad/socorro and has a server up:
> http://blogs.gnome.org/ovitters/2007/10/06/crash-gnome-org/
> They were prototyping it using Fedora, and I believe bug-buddy which
> is linked into all GTK+ programs uses breakpad and sends to
> crash.gnome.org by default.

>
> * Possibly replace breakpad library linking by using utrace+system
> service (right?)

If I remember right, the reason for this part of the discussion was:

1) Linking everything on the system to breakpad is a bit nasty.
2) Apport doesn't need to be linked in, but it runs *after* the process
gets dumped by the kernel. At which point it's slightly different from
when it actually crashed.

pjones' idea was to have a system service that would receive
notification of segfaults and use utrace to stop the process and
generate a (breakpad-style report).

> * Create debuginfo DAV server and pull debuginfo data dynamically
> (what are the download requirements? does this replace gdb's
> "debuginfo-install" display?)

It would make the 'debuginfo-install' message go away, because (if DAV +
FUSE does the right thing) you'll have all the debuginfo you need, in
the right place - mounted as a FUSE filesystem.

It requires some trickery with the web server to keep from *actually*
unpacking all the debuginfo packages in existence (something like ~5T of
data?) but Peter was working on a proof-of-concept at FUDCon.

> My 2 cents - Link in breakpad, create http://crash.fedoraproject.org
> running Socorro.

Link it into what? Everything, via LD_PRELOAD? Or just GNOME stuff? I
thought bug-buddy already used breakpad?

Oh, and don't forget that we need the debuginfo server (which I think
Peter has been calling "littlebottom") to make these reports useful.

> Investigate either submitting reports to Mozilla as well for Firefox
> or create a system where our Socorro pushes reports to theirs.

The latter seems like the Right Thing, but it depends on the previous
actions. I wonder how caillon would feel about getting Firefox doing
reports to Mozilla in the meantime.

> Longer term investigate utrace system service instead of having apps
> link to breakpad (this gets us non-desktop system crashes without
> having to universally LD_PRELOAD or whatever).

Yeah, I don't think we need to solve this until we've got the
proof-of-concept stack: a couple of choice apps sending Breakpad reports
(with debuginfo fetched from littlebottom) to our own Socorro instance.

-w
 
06-23-2008, 11:08 PM
Colin Walters

apport/breakpad and fedora

2008/6/23 Will Woods <wwoods@redhat.com>:

> If I remember right, the reason for this part of the discussion was:
>
> 1) Linking everything on the system to breakpad is a bit nasty.
> 2) Apport doesn't need to be linked in, but it runs *after* the process
> gets dumped by the kernel. At which point it's slightly different from
> when it actually crashed.

Yeah, sounds right.


> pjones' idea was to have a system service that would receive
> notification of segfaults and use utrace to stop the process and
> generate a (breakpad-style report).

He was thinking of hooking it into kerneloops, right?

(Hm, I didn't seem to have that installed on my F9 system which was upgraded from F8...I guess this is a generic problem with new comps packages and upgrades; need to solve that)

Though isn't there a race between when we get the kernel notification and when the service stops it and inspects? Not my area of expertise really, just thinking out loud.

> It would make the 'debuginfo-install' message go away, because (if DAV +
> FUSE does the right thing) you'll have all the debuginfo you need, in
> the right place - mounted as a FUSE filesystem.

Ah, ok.

> > My 2 cents - Link in breakpad, create http://crash.fedoraproject.org
> > running Socorro.
>
> Link it into what? Everything, via LD_PRELOAD? Or just GNOME stuff? I
> thought bug-buddy already used breakpad?

I'm personally most interested in the desktop apps because, well we desktop developers are masochists and code complex user-facing code in C/C++, and not surprisingly they crash =)

So right now...hm, actually this is weird, I can't get any Fedora-compiled program to spawn bug-buddy at all right now. I get it for some local custom code, but not for anything in /usr/bin. I see libgnomebreakpad is linked into the process.

I'm out of time for this issue for today, I'll investigate a bit more later.


> The latter seems like the Right Thing, but it depends on the previous
> actions. I wonder how caillon would feel about getting Firefox doing
> reports to Mozilla in the meantime.

Yeah, we should probably disable GTK+'s bug-buddy breakpad module for Firefox for now.

> > Longer term investigate utrace system service instead of having apps
> > link to breakpad (this gets us non-desktop system crashes without
> > having to universally LD_PRELOAD or whatever).
>
> Yeah, I don't think we need to solve this until we've got the
> proof-of-concept stack: a couple of choice apps sending Breakpad reports
> (with debuginfo fetched from littlebottom) to our own Socorro instance.

Ok cool. Should we create a feature proposal wiki page for this, or repurpose the old Apport one?



 
06-24-2008, 03:20 PM
Peter Jones

apport/breakpad and fedora

Colin Walters wrote:

> 2008/6/23 Will Woods <wwoods@redhat.com>:
>
> > If I remember right, the reason for this part of the discussion was:
> >
> > 1) Linking everything on the system to breakpad is a bit nasty.
> > 2) Apport doesn't need to be linked in, but it runs *after* the process
> > gets dumped by the kernel. At which point it's slightly different from
> > when it actually crashed.
>
> Yeah, sounds right.
>
> > pjones' idea was to have a system service that would receive
> > notification of segfaults and use utrace to stop the process and
> > generate a (breakpad-style report).
>
> He was thinking of hooking it into kerneloops, right?


This was really just my "easiest first-pass way to implement it"; I
expect we can replace this part with something better if we need to, and
it may or may not be necessary.



> Though isn't there a race between when we get the kernel notification and
> when the service stops it and inspects? Not my area of expertise really,
> just thinking out loud.


If we're /not/ changing any kernel APIs, we'd want to do several things,
conditional on the feature being enabled. A mostly inclusive list follows:


1) make /var/cache/cores/ a tmpfs mount
2) set kernel.core_pattern to something like "/var/cache/cores/core.%p"
3) do something along the lines of setfacl to limit access
4) "ulimit -c $SOMETHING_NONZERO" for everything.
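
Two of the steps in the list above can be prototyped from a small helper. The following Python sketch, offered only as an illustration, expands the `%p` specifier the way the kernel does for `kernel.core_pattern` and raises the core size limit for the current process, which is the per-process equivalent of `ulimit -c`. The function names are hypothetical; steps 1 and 3 (the tmpfs mount and the access limits) are system configuration and are not shown.

```python
import resource

CORE_PATTERN = "/var/cache/cores/core.%p"  # the pattern suggested in step 2


def expand_core_pattern(pattern, pid):
    """Expand %p (PID) and %% (literal percent); leave other specifiers as-is."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "%" and i + 1 < len(pattern):
            spec = pattern[i + 1]
            if spec == "p":
                out.append(str(pid))
            elif spec == "%":
                out.append("%")
            else:
                out.append(ch + spec)  # not handled in this sketch
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def current_core_pattern():
    """Read the pattern the running kernel is configured with."""
    with open("/proc/sys/kernel/core_pattern") as f:
        return f.read().strip()


def enable_core_dumps():
    """Raise the soft core-size limit to the hard limit for this process.

    If the hard limit is 0, core dumps stay disabled; raising the hard
    limit needs privileges and is outside this sketch.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)
```

For example, `expand_core_pattern(CORE_PATTERN, 1234)` gives the path a core file for PID 1234 would be written to under that setting.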

If we were to change kernel APIs, my initial thought is a utrace plugin
that suspends the task instead of delivering the segfault, and gives us
a notification on a file descriptor we're ppoll()ing on. Then we'd go
examine the process's memory and collect a trace. This also has the
advantage that it means no shared writable space and no spinning up the
disk to write the core out. Also, on the whole it requires fewer
different parts of the system to be set up right.
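
The userspace half of that idea could look roughly like the Python sketch below. It is purely hypothetical: no such utrace notification interface is described in this thread, so an ordinary file descriptor (a pipe, in testing) stands in for it, and the message format of one native 32-bit PID is an assumption. Python exposes `poll()` rather than `ppoll()`, so `select.poll` is used instead.

```python
import os
import select
import struct

PID_FORMAT = "=i"  # assumed wire format: one native-endian 32-bit PID
PID_SIZE = struct.calcsize(PID_FORMAT)


def wait_for_crash(notify_fd, timeout_ms=1000):
    """Wait for a crash notification; return the PID, or None on timeout/EOF."""
    poller = select.poll()
    poller.register(notify_fd, select.POLLIN)
    if not poller.poll(timeout_ms):
        return None  # timed out, nothing crashed
    data = os.read(notify_fd, PID_SIZE)
    if len(data) != PID_SIZE:
        return None  # notifier went away or sent a short message
    return struct.unpack(PID_FORMAT, data)[0]


def read_maps(pid):
    """Return the memory map lines of a (suspended) process from /proc."""
    with open(f"/proc/{pid}/maps") as maps:
        return maps.read().splitlines()
```

A service built this way would loop on `wait_for_crash`, inspect the suspended task (starting with something like `read_maps`), and only then let the task die, so nothing has to be written to disk first.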



> > It would make the 'debuginfo-install' message go away, because (if DAV +
> > FUSE does the right thing) you'll have all the debuginfo you need, in
> > the right place - mounted as a FUSE filesystem.
>
> Ah, ok.


FWIW, the debuginfo server I'm working on is at
http://git.fedorahosted.org/git/?p=littlebottom.git;a=summary . It's
still very much in its infancy, and I can use all the help I can get.
I'll gladly add you to the group if you want to help out.



> > > My 2 cents - Link in breakpad, create http://crash.fedoraproject.org
> > > running Socorro.
> >
> > Link it into what? Everything, via LD_PRELOAD? Or just GNOME stuff? I
> > thought bug-buddy already used breakpad?


IMNSHO, LD_PRELOAD is just a plain bad idea here (and nearly everywhere
else). There are also plenty of places where we want tracebacks, but
the upstream maintainers won't like the patches, and we don't want to be
carrying patches. Not to mention patching everything is a herculean task.


I really think if we're going to succeed, we've got to plan on /not/ changing
most executables.



> I'm personally most interested in the desktop apps because, well we desktop
> developers are masochists and code complex user-facing code in C/C++, and
> not surprisingly they crash =)


The same is true of the rest of the system; I think our solution needs
to work for everything (well, everything compiled, though the
reporting/statistics infrastructure need not be even that specific.)



> So right now...hm, actually this is weird, I can't get any Fedora-compiled
> program to spawn bug-buddy at all right now. I get it for some local custom
> code, but not for anything in /usr/bin. I see libgnomebreakpad is linked
> into the process.


Another point against the "link in a magic library" approach. If the
crashing executable has to do the work to spawn the reporting tool,
it'll *never* be reliable.



> > > Longer term investigate utrace system service instead of having apps
> > > link to breakpad (this gets us non-desktop system crashes without
> > > having to universally LD_PRELOAD or whatever).
> >
> > Yeah, I don't think we need to solve this until we've got the
> > proof-of-concept stack: a couple of choice apps sending Breakpad reports
> > (with debuginfo fetched from littlebottom) to our own Socorro instance.


I think we're all in agreement here.

--
Peter

 
06-24-2008, 05:08 PM
Christopher Aillon

apport/breakpad and fedora

Will Woods wrote:

> The latter seems like the Right Thing, but it depends on the previous
> actions. I wonder how caillon would feel about getting Firefox doing
> reports to Mozilla in the meantime.


Our binaries aren't the exact same as Mozilla's for various reasons:
compiler flags, using system libs instead of in-tree, local patches we
may have, etc. Thus our symbols don't match up 1 to 1 with what their
symbol server expects. Additionally, we have symbols for architectures
they don't. We can likely get them our symbol data, but it needs to be
done for every build we push through the build system. If someone makes
this automatic, yeah sure I have no problem with it at that point.


 
07-09-2008, 01:03 PM
Colin Walters

apport/breakpad and fedora

I got a few moments to create a new Feature page for the breakpad/socorro plan:

https://fedoraproject.org/wiki/Features/CrashHandling

I didn't add any details about the littlebottom/kernel crash handling design since I don't know enough about it. But if we go for applications running in the desktop as an initial target it would still be very useful.
