Linux Archive > Redhat > Fedora Development

11-03-2010, 05:48 PM
Owen Taylor

Compile with -fno-omit-frame-pointer on x86_64?

Lack of decent profiling is a major problem for making our operating
system fast. By far the most effective form of profiling is sampling profiling
with callgraph information.

Soeren's comment from March:

http://lwn.net/Articles/380582/

Basically summarizes the situation, and as far as I know nothing has
changed ... with default compilation options, getting callgraph
profiling on x86_64 really requires a DWARF unwinder in the kernel.
Which seems unlikely to happen.
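With -fno-omit-frame-pointer, each x86_64 function prologue pushes the caller's %rbp and then points %rbp at that saved slot, so the saved frame pointers form a linked list that ends in 0 by convention. A sampling profiler can follow it with two loads per frame and no unwind tables, which is why frame pointers make callgraph sampling cheap. The C sketch below walks such a chain over a simulated stack; using slot indices instead of real addresses, and the name walk_frame_pointers, are assumptions for illustration, not code from any real profiler or kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: walk a frame-pointer chain on a simulated stack.
 * stack[i] is one 8-byte slot; "frame pointers" are slot indices, not real
 * addresses. Each frame built by a -fno-omit-frame-pointer prologue holds:
 *   stack[fp]     = caller's saved frame pointer (0 ends the chain)
 *   stack[fp + 1] = return address into the caller
 * Writes at most max_out PCs to out (innermost first), returns the count.
 */
size_t walk_frame_pointers(const uint64_t *stack, size_t stack_len,
                           uint64_t pc, uint64_t fp,
                           uint64_t *out, size_t max_out)
{
    assert(stack != NULL && out != NULL);
    size_t n = 0;
    if (max_out == 0)
        return 0;
    out[n++] = pc;                       /* the interrupted instruction */
    while (fp != 0 && fp + 1 < stack_len && n < max_out) {
        uint64_t ret = stack[fp + 1];    /* where this frame returns to */
        uint64_t next = stack[fp];       /* the caller's frame pointer */
        out[n++] = ret;
        /* The stack grows down, so caller frames sit at higher slots;
           stop on a link that does not move upward (corrupt chain). */
        if (next != 0 && next <= fp)
            break;
        fp = next;
    }
    return n;
}
```

In a real profiler the walk also goes wrong at the first frame compiled without frame pointers, where %rbp is just another general-purpose register holding arbitrary data; the upward-link check is only a cheap guard against that.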

As a developer, your options for profiling are:

- Recompile everything you care about profiling
with -fno-omit-frame-pointer instead of using system packages.

- Switch to i386

Even if the second was reasonable to ask of developers, it also makes it
really hard to help users with performance problems if they have to
reinstall their system to give you a profile.

So, I'd like to bring up the possibility of switching to compiling our
packages with -fno-omit-frame-pointer for x86_64. As Soeren says, x86_64
isn't register starved, so the performance penalty shouldn't be huge.
But I have no idea if it's 0.5% or 5%.

What aspects of performance do we care about?
How would we measure the performance impact of changing
compilation flags?
What is the acceptable slowdown?
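One minimal way to approach the measurement question, assuming a single CPU-bound workload built twice (once with the default flags, once with -fno-omit-frame-pointer): time repeated runs of each build on a monotonic clock, take the median, and report the relative difference. The C helpers below are a sketch of that comparison; the function names and the choice of the median are assumptions, not an established Fedora benchmark.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime / CLOCK_MONOTONIC */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of n timings (sorts v in place); less sensitive than the mean
   to one-off outliers such as a cold cache on the first run. */
double median_of(double *v, size_t n)
{
    assert(v != NULL && n > 0);
    qsort(v, n, sizeof v[0], cmp_double);
    return (n % 2) ? v[n / 2] : (v[n / 2 - 1] + v[n / 2]) / 2.0;
}

/* Wall-clock seconds for one call of fn, measured on a monotonic clock. */
double time_once(void (*fn)(void))
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    fn();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

/* Relative slowdown of the candidate build in percent: 4.0 means 4% slower. */
double percent_slowdown(double baseline_s, double candidate_s)
{
    return 100.0 * (candidate_s - baseline_s) / baseline_s;
}
```

For example, with eleven timings per build stored in base[] and cand[], percent_slowdown(median_of(base, 11), median_of(cand, 11)) gives the slowdown in percent; a result of 4.0 would match the 4% figure quoted later in the thread. One workload only covers one aspect of performance; startup time, interactive latency, and throughput would each need their own.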

- Owen

(One downside of any slowdown is that if we take a 1% hit and do
performance work that makes the system 10% faster, then we look bad by
comparison with other Linux distributions who get the advantages of the
performance work but don't take the 1% hit. Still, we should do what has
the biggest net gain for our users, right?)
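Reading "10% faster" as a 10% reduction in run time (an assumption; the message does not say which measure), the comparison in the parenthetical works out as:

$$
T_{\text{Fedora}} = 0.90 \times 1.01\,T_0 = 0.909\,T_0,
\qquad
T_{\text{other}} = 0.90\,T_0,
\qquad
\frac{T_{\text{Fedora}}}{T_{\text{other}}} = 1.01
$$

Fedora's run time drops by 9.1% rather than 10%, leaving it 1% slower than a distribution that ships the same fixes without the flag.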


--
devel mailing list
devel@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/devel

11-03-2010, 05:58 PM
Jakub Jelinek

On Wed, Nov 03, 2010 at 02:48:12PM -0400, Owen Taylor wrote:
> Lack of decent profiling is a major problem for making our operating
> system fast. By far the most effective form of profiling is sampling profiling
> with callgraph information.
>
> Soeren's comment from March:
>
> http://lwn.net/Articles/380582/
>
> Basically summarizes the situation, and as far as I know nothing has
> changed ... with default compilation options, getting callgraph
> profiling on x86_64 really requires a DWARF unwinder in the kernel.
> Which seems unlikely to happen.

But that's the right thing to do.

> As a developer, your options for profiling are:
>
> - Recompile everything you care about profiling
> with -fno-omit-frame-pointer instead of using system packages.

Instead of this, which really is a big performance penalty. Even i?86 is
changing in GCC 4.6 to not do -fno-omit-frame-pointer by default.
The unwind info recent GCCs provide is correct even in epilogues and can be
relied upon. There are several lightweight unwinders that can be easily
adapted for kernel purposes. Just talk to the systemtap folks.
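Without frame pointers, the unwinder follows the DWARF call-frame information (CFI) instead. For each range of instruction addresses, CFI gives a rule for the canonical frame address (CFA), such as %rsp + 32 or %rbp + 16, and on x86_64 the return address is stored just below it, at CFA - 8. Because the rule changes inside prologues and epilogues, the unwinder depends on that info being correct at every instruction, which is the property described above. The C sketch below performs one unwind step from an already-decoded rule table over simulated memory; decoding real .eh_frame data is omitted, and the table layout and names (cfi_row, unwind_step) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Which register a CFA rule is based on (x86_64 %rsp or %rbp). */
typedef enum { CFA_FROM_SP, CFA_FROM_FP } cfa_base;

/* One already-decoded CFI row; this layout is hypothetical. */
typedef struct {
    uint64_t pc_lo, pc_hi;  /* rule covers pc_lo <= pc < pc_hi */
    cfa_base base;          /* CFA = base register + cfa_off */
    int64_t cfa_off;
    int64_t fp_off;         /* caller's %rbp saved at CFA + fp_off; 0 = not saved */
} cfi_row;

typedef struct { uint64_t pc, sp, fp; } regs;

static const cfi_row *find_row(const cfi_row *t, size_t n, uint64_t pc)
{
    for (size_t i = 0; i < n; i++)      /* linear scan, for clarity only */
        if (pc >= t[i].pc_lo && pc < t[i].pc_hi)
            return &t[i];
    return NULL;
}

/* One unwind step over simulated memory (mem[addr / 8] holds the 8-byte
   word at byte address addr). Rewrites *r to the caller's registers and
   returns 1, or returns 0 if no rule covers r->pc or a load is out of range. */
int unwind_step(const cfi_row *t, size_t n,
                const uint64_t *mem, size_t mem_words, regs *r)
{
    assert(mem != NULL && r != NULL);
    const cfi_row *row = find_row(t, n, r->pc);
    if (row == NULL)
        return 0;
    uint64_t cfa = (row->base == CFA_FROM_SP ? r->sp : r->fp) + (uint64_t)row->cfa_off;
    uint64_t ra_addr = cfa - 8;         /* x86_64: return address at CFA - 8 */
    if (ra_addr / 8 >= mem_words)
        return 0;
    uint64_t caller_fp = r->fp;         /* unchanged unless this frame saved it */
    if (row->fp_off != 0) {
        uint64_t fp_addr = cfa + (uint64_t)row->fp_off;
        if (fp_addr / 8 >= mem_words)
            return 0;
        caller_fp = mem[fp_addr / 8];
    }
    r->pc = mem[ra_addr / 8];
    r->sp = cfa;                        /* caller's %rsp once the call returns */
    r->fp = caller_fp;
    return 1;
}
```

A real unwinder would binary-search the rules (libgcc's unwinder uses the .eh_frame_hdr search table for this) and, for caller frames, look up pc - 1 so that a call that is the last instruction of a function still matches that function's rule.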

There is always callgrind if you don't want to recompile anything and
need to profile something even when kernel doesn't support it.

Jakub

11-03-2010, 06:20 PM
Owen Taylor

On Wed, 2010-11-03 at 19:58 +0100, Jakub Jelinek wrote:
> On Wed, Nov 03, 2010 at 02:48:12PM -0400, Owen Taylor wrote:
> > Lack of decent profiling is a major problem for making our operating
> > system fast. By far the most effective form of profiling is sampling profiling
> > with callgraph information.
> >
> > Soeren's comment from March:
> >
> > http://lwn.net/Articles/380582/
> >
> > Basically summarizes the situation, and as far as I know nothing has
> > changed ... with default compilation options, getting callgraph
> > profiling on x86_64 really requires a DWARF unwinder in the kernel.
> > Which seems unlikely to happen.
>
> But that's the right thing to do.
>
> > As a developer, your options for profiling are:
> >
> > - Recompile everything you care about profiling
> > with -fno-omit-frame-pointer instead of using system packages.
>
> Instead of this, which really is a big performance penalty.

Do you have a sense of the quantification of "big" here? I know in
compiler terms, 1% is big, but we're nowhere close to wringing the last
1% out of overall Fedora performance. If you create a sufficiently
complex system, there's lots of "stupid" stuff going on. And you can't
find the stupid stuff without appropriate tools.

> Even i?86 is
> changing in GCC 4.6 to not do -fno-omit-frame-pointer by default.
> The unwind info recent GCCs provide is correct even in epilogues and can be
> relied upon. There are several lightweight unwinders that can be easily
> adapted for kernel purposes. Just talk to the systemtap folks.

It seems like if it was that easy, it would have happened and we'd have
a solution in the upstream kernel...

(One thing that definitely makes things tricky is paging in debuginfo. I
think I saw a discussion somewhere that systemtap preemptively was
paging in all debuginfo for traced modules. That's tricky in systemwide
profiling situations, but maybe you could have something where you do
one run, load the debuginfo for everything that was hit in the first
run, then do a second run.)
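The two-pass idea in the parenthetical only needs the first run to report which binaries received samples. A minimal C sketch, assuming the profiler records raw sampled PCs plus a table of mapped address ranges (both hypothetical here, as is the name modules_hit), could mark the hit modules like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;        /* e.g. a shared library path (hypothetical) */
    uint64_t start, end;     /* mapped text range: start <= addr < end */
} module_map;

/* First pass: set hit[m] = 1 for every module that received at least one
   sample and return how many distinct modules were hit. Only those need
   their debuginfo paged in before the second run. */
size_t modules_hit(const module_map *mods, size_t nmods,
                   const uint64_t *samples, size_t nsamples,
                   unsigned char *hit)
{
    assert(nmods == 0 || (mods != NULL && hit != NULL));
    size_t count = 0;
    for (size_t m = 0; m < nmods; m++)
        hit[m] = 0;
    for (size_t s = 0; s < nsamples; s++) {
        for (size_t m = 0; m < nmods; m++) {
            if (samples[s] >= mods[m].start && samples[s] < mods[m].end) {
                if (!hit[m]) {
                    hit[m] = 1;
                    count++;
                }
                break;               /* ranges do not overlap */
            }
        }
    }
    return count;
}
```

Samples that fall outside every known mapping are simply ignored in this sketch.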

> There is always callgrind if you don't want to recompile anything and
> need to profile something even when kernel doesn't support it.

callgrind is reasonable if you have a single program that is slow and where
the slowness is pretty much straight-up CPU.

But we're seldom trying to profile "a program" - we are trying to
profile system situations that involve several programs and the kernel.

And programs are frequently not straight-up bound on things that
valgrind can easily model. For example, if our program is reading from
uncached graphics memory somewhere, that won't show up at all in
callgrind - to callgrind, it's just memory reads. But it may dominate a
more accurate sampled profile.
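A sampled callgraph profile is built by counting stacks: each sample charges a function "self" time if it was the innermost frame, and "inclusive" time if it appears anywhere on the stack. The C sketch below shows that aggregation, assuming each sample has already been symbolized into small integer function ids, innermost frame first; the representation and the name aggregate_samples are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* stacks[s] has depths[s] function ids, innermost first.
   self[f]      counts samples where f was the innermost frame.
   inclusive[f] counts samples where f appears anywhere on the stack,
                once per sample even if f recurses. */
void aggregate_samples(const int *const stacks[], const size_t depths[],
                       size_t nsamples, size_t nfuncs,
                       unsigned self[], unsigned inclusive[])
{
    for (size_t f = 0; f < nfuncs; f++)
        self[f] = inclusive[f] = 0;
    for (size_t s = 0; s < nsamples; s++) {
        const int *st = stacks[s];
        for (size_t d = 0; d < depths[s]; d++) {
            int f = st[d];
            assert(f >= 0 && (size_t)f < nfuncs);
            if (d == 0)
                self[f]++;               /* where the CPU actually was */
            size_t k = 0;
            while (k < d && st[k] != f)  /* already counted in this sample? */
                k++;
            if (k == d)
                inclusive[f]++;
        }
    }
}
```

Counting a function once per sample keeps recursive calls from inflating its inclusive share beyond the total number of samples.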

Plus the performance hit of callgrind makes it not very useful for
real-time interactive user interfaces.

- Owen


11-03-2010, 06:29 PM
Jakub Jelinek

On Wed, Nov 03, 2010 at 03:20:59PM -0400, Owen Taylor wrote:
> On Wed, 2010-11-03 at 19:58 +0100, Jakub Jelinek wrote:
> > On Wed, Nov 03, 2010 at 02:48:12PM -0400, Owen Taylor wrote:

> > Instead of this, which really is a big performance penalty.
>
> Do you have a sense of the quantification of "big" here? I know in
> compiler terms, 1% is big, but we're nowhere close to wringing the last
> 1% out of overall Fedora performance. If you create a sufficiently
> complex system, there's lots of "stupid" stuff going on. And you can't
> find the stupid stuff without appropriate tools.

The last numbers I was pointed at for x86_64 showed a 4% slowdown, which
really is a lot; it takes several years to achieve that much improvement on
the compiler side.

> It seems like if it was that easy, it would have happened and we'd have
> a solution in the upstream kernel...

I think we had one in the upstream kernel for some time, then Linus just
didn't like that it needed so many bugfixes and nuked it.

> (One thing that definitely makes things tricky is paging in debuginfo. I
> think I saw a discussion somewhere that systemtap preemptively was
> paging in all debuginfo for traced modules. That's tricky in systemwide

Yeah, systemtap does that (and has that in-kernel unwinder for userspace).

Jakub

11-03-2010, 06:56 PM
John Reiser

On 11/03/2010 11:48 AM, Owen Taylor wrote:
> Lack of decent profiling is a major problem for making our operating
> system fast. By far the most effective form of profiling is sampling profiling
> with callgraph information.

I am the author of tsprof, http://bitwagon.com/tsprof/tsprof.html .
Eight years ago that app provided everything you desire, and with
no compilation flags necessary: not -pg, not -p. [The implementation
is equivalent to "infecting the memory image of the application with
a profiling virus" and it was done at process entry in just a couple of
seconds.] But nobody would pay for it on i686, so the product
was abandoned despite a working prototype for x86_64.

A few years before that, there was TracePoint Technology, a startup
funded by venture capital that offered nifty profiling tools:
http://venturebeatprofiles.com/company/profile/tracepoint-technology
Soon they were acquired by Digital Equipment Corp and died with DEC.

Over several years, dueling proposals (perfctr, perfmon, perfmon2)
failed to get into the Linux kernel. Then the CPU and motherboard
designers made the underlying hardware counter (RDTSC) unreliable
in too many cases (non-constant frequency, not synchronized for SMP,
arbitrarily scribbled by SystemManagementMode, ...).

Today the infrastructure work for kernel ftrace comes close to what
is required for use by apps, but gcc still won't do exactly the
right thing.

In short, those who want profiling have failed repeatedly to present
an _effective_ case.

What are you going to do differently this time?
[The workaround is to spend a week learning how to run oprofile
and interpret its output.]

11-03-2010, 07:10 PM
Adam Jackson

On Wed, 2010-11-03 at 19:58 +0100, Jakub Jelinek wrote:
> On Wed, Nov 03, 2010 at 02:48:12PM -0400, Owen Taylor wrote:
> > Basically summarizes the situation, and as far as I know nothing has
> > changed ... with default compilation options, getting callgraph
> > profiling on x86_64 really requires a DWARF unwinder in the kernel.
> > Which seems unlikely to happen.
>
> But that's the right thing to do.

Sure, but so is a kernel debugger, and it's taken us over ten years to
get one. I'm pretty okay with doing something wrong now if it gets me
something usable for long enough to get something right later. I'll
take 4% across the board if it helps me find the 20% that matters.
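Assuming both numbers are changes in run time and that the 4% penalty also applies to the code being fixed, the trade works out as:

$$
T' = 1.04 \times 0.80\,T = 0.832\,T
$$

so the net result would still be roughly 17% faster, but only in the case where a profile actually leads to the 20% win.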

> There is always callgrind if you don't want to recompile anything and
> need to profile something even when kernel doesn't support it.

I don't want to know how callgrinded X performs, I want to know how X
performs. callgrind means operations that would be one millisecond
become half a second, and that's thirty frames instead of a sixteenth of
a frame. That means I end up optimizing for function call cycle counts
instead of fixing my algorithms to not starve the hardware.
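The frame arithmetic assumes a 60 Hz display, where one frame lasts about 16.7 ms:

$$
\frac{1\ \text{ms}}{16.7\ \text{ms}} \approx 0.06 \approx \frac{1}{16}\ \text{frame},
\qquad
\frac{500\ \text{ms}}{16.7\ \text{ms}} \approx 30\ \text{frames}
$$

that is, a 500× slowdown turns a cost well inside one frame into a visible half-second stall.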

If wall time matters, callgrind is the wrong tool, and you need a live
profiler.

- ajax

11-03-2010, 07:11 PM
Jakub Jelinek

On Wed, Nov 03, 2010 at 04:10:30PM -0400, Adam Jackson wrote:
> On Wed, 2010-11-03 at 19:58 +0100, Jakub Jelinek wrote:
> > On Wed, Nov 03, 2010 at 02:48:12PM -0400, Owen Taylor wrote:
> > > Basically summarizes the situation, and as far as I know nothing has
> > > changed ... with default compilation options, getting callgraph
> > > profiling on x86_64 really requires a DWARF unwinder in the kernel.
> > > Which seems unlikely to happen.
> >
> > But that's the right thing to do.
>
> Sure, but so is a kernel debugger, and it's taken us over ten years to
> get one. I'm pretty okay with doing something wrong now if it gets me
> something usable for long enough to get something right later. I'll
> take 4% across the board if it helps me find the 20% that matters.

Most of the time you don't find the 20% improvements with profilers though,
so all we end up with is just slowing everything by 4%. Definitely a bad
idea, now that per-core performance doesn't increase very much and most
programs aren't parallelized at all or just very badly.

Jakub