--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility
08-22-2012, 05:56 PM
Dave Anderson
Fix bugs in runq
----- Original Message -----
> Hello Dave,
>
> In runq command, when dumping cfs and rt runqueues,
> it seems that we get the wrong nr_running values of rq
> and cfs_rq.
>
> Please refer to the attached patch.
>
> Thanks
> Zhang Yanfei
Hello Zhang,
I understand what you are trying to accomplish with this patch, but
none of my test dumpfiles can actually verify it because there is no
difference with or without your patch. What failure mode did you see
in your testing? I presume that it just showed "[no tasks queued]"
for the RT runqueue when there were actually tasks queued there?
The reason I ask is that I'm thinking that a better solution would
be to simplify dump_CFS_runqueues() by *not* accessing and using
rq_nr_running, cfs_rq_nr_running or cfs_rq_h_nr_running.
Those counters are only read to determine the "active" argument to
pass to dump_RT_prio_array(), which returns immediately if it is
FALSE. However, if we get rid of the "active" argument and simply
allow dump_RT_prio_array() to always check its queues every time,
it still works just fine.
For example, I tested my set of sample dumpfiles with this patch:
and the output is identical to testing with, and without, your patch.
So the question is whether dump_CFS_runqueues() should be needlessly
complicated with all of the "nr_running" references?
In fact, it also seems possible that a crash could happen at a point in
the scheduler code where those counters are not valid/current/trustworthy.
So unless you can convince me otherwise, I'd prefer to just remove
the "nr_running" business completely.
That being said -- and for your future reference -- when creating patches
such as yours, please consider the following:
When adding entries to the offset_table, always put them at the end
of the structure so that the offsets to the currently-existing members
do not change. This allows older extension modules to still have
valid OFFSET() values without having to be recompiled:
--- a/defs.h
+++ b/defs.h
@@ -1576,6 +1576,7 @@ struct offset_table { /* stash of commonly-used offsets */
long rq_nr_running;
long cfs_rq_rb_leftmost;
long cfs_rq_nr_running;
+ long cfs_rq_h_nr_running;
long cfs_rq_tasks_timeline;
long task_struct_se;
long sched_entity_run_node;
But as you have done, you can still display the new entry from
"help -o" nearby its related cfs_rq_xxx offsets.
Then, during initialization, there's no need to do the preliminary
MEMBER_EXISTS() call in this case -- just call MEMBER_OFFSET_INIT()
regardless. If it fails, the offset will remain -1 (INVALID):
And then after initialization, instead of using MEMBER_EXISTS(), you
can use "if (VALID_MEMBER(cfs_rq_h_nr_running))", which simply
accesses the offset_table -- instead of involving a gdb call every
time:
--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility
08-24-2012, 06:17 PM
Dave Anderson
Fix bugs in runq
----- Original Message -----
>
>
> ----- Original Message -----
> > Hello Dave,
> >
> > In runq command, when dumping cfs and rt runqueues,
> > it seems that we get the wrong nr_running values of rq
> > and cfs_rq.
> >
> > Please refer to the attached patch.
> >
> > Thanks
> > Zhang Yanfei
>
> Hello Zhang,
>
> I understand what you are trying to accomplish with this patch, but
> none of my test dumpfiles can actually verify it because there is no
> difference with or without your patch. What failure mode did you see
> in your testing? I presume that it just showed "[no tasks queued]"
> for the RT runqueue when there were actually tasks queued there?
>
> The reason I ask is that I'm thinking that a better solution would
> be to simplify dump_CFS_runqueues() by *not* accessing and using
> rq_nr_running, cfs_rq_nr_running or cfs_rq_h_nr_running.
>
> Those counters are only read to determine the "active" argument to
> pass to dump_RT_prio_array(), which returns immediately if it is
> FALSE. However, if we get rid of the "active" argument and simply
> allow dump_RT_prio_array() to always check its queues every time,
> it still works just fine.
>
> For example, I tested my set of sample dumpfiles with this patch:
>
> diff -u -r1.205 task.c
> --- task.c 12 Jul 2012 20:04:00 -0000 1.205
> +++ task.c 22 Aug 2012 15:33:32 -0000
> @@ -7636,7 +7636,7 @@
> OFFSET(cfs_rq_tasks_timeline));
> }
>
> - dump_RT_prio_array(nr_running != cfs_rq_nr_running,
> + dump_RT_prio_array(TRUE,
> runq + OFFSET(rq_rt) + OFFSET(rt_rq_active),
> &runqbuf[OFFSET(rq_rt) +
> OFFSET(rt_rq_active)]);
>
> and the output is identical to testing with, and without, your patch.
>
> So the question is whether dump_CFS_runqueues() should be needlessly
> complicated with all of the "nr_running" references?
>
> In fact, it also seems possible that a crash could happen at a point in
> the scheduler code where those counters are not
> valid/current/trustworthy.
>
> So unless you can convince me otherwise, I'd prefer to just remove
> the "nr_running" business completely.
Hello Zhang,
Here's the patch I've got queued, which resolves the bug you encountered
by simplifying things:
- if (!active) {
- INDENT(5);
- fprintf(fp, "[no tasks queued]
");
- return;
- }
-
qheads = (i = ARRAY_LENGTH(rt_prio_array_queue)) ?
i : get_array_length("rt_prio_array.queue", NULL, SIZE(list_head));
--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility
08-25-2012, 03:23 AM
Zhang Yanfei
Fix bugs in runq
于 2012年08月25日 02:17, Dave Anderson 写道:
>
>
> ----- Original Message -----
>>
>>
>> ----- Original Message -----
>>> Hello Dave,
>>>
>>> In runq command, when dumping cfs and rt runqueues,
>>> it seems that we get the wrong nr_running values of rq
>>> and cfs_rq.
>>>
>>> Please refer to the attached patch.
>>>
>>> Thanks
>>> Zhang Yanfei
>>
>> Hello Zhang,
>>
>> I understand what you are trying to accomplish with this patch, but
>> none of my test dumpfiles can actually verify it because there is no
>> difference with or without your patch. What failure mode did you see
>> in your testing? I presume that it just showed "[no tasks queued]"
>> for the RT runqueue when there were actually tasks queued there?
>>
>> The reason I ask is that I'm thinking that a better solution would
>> be to simplify dump_CFS_runqueues() by *not* accessing and using
>> rq_nr_running, cfs_rq_nr_running or cfs_rq_h_nr_running.
>>
>> Those counters are only read to determine the "active" argument to
>> pass to dump_RT_prio_array(), which returns immediately if it is
>> FALSE. However, if we get rid of the "active" argument and simply
>> allow dump_RT_prio_array() to always check its queues every time,
>> it still works just fine.
>>
>> For example, I tested my set of sample dumpfiles with this patch:
>>
>> diff -u -r1.205 task.c
>> --- task.c 12 Jul 2012 20:04:00 -0000 1.205
>> +++ task.c 22 Aug 2012 15:33:32 -0000
>> @@ -7636,7 +7636,7 @@
>> OFFSET(cfs_rq_tasks_timeline));
>> }
>>
>> - dump_RT_prio_array(nr_running != cfs_rq_nr_running,
>> + dump_RT_prio_array(TRUE,
>> runq + OFFSET(rq_rt) + OFFSET(rt_rq_active),
>> &runqbuf[OFFSET(rq_rt) +
>> OFFSET(rt_rq_active)]);
>>
>> and the output is identical to testing with, and without, your patch.
>>
>> So the question is whether dump_CFS_runqueues() should be needlessly
>> complicated with all of the "nr_running" references?
>>
>> In fact, it also seems possible that a crash could happen at a point in
>> the scheduler code where those counters are not
>> valid/current/trustworthy.
>>
>> So unless you can convince me otherwise, I'd prefer to just remove
>> the "nr_running" business completely.
>
> Hello Zhang,
>
> Here's the patch I've got queued, which resolves the bug you encountered
> by simplifying things:
>
OK. I see.
And based on this patch, I made a new patch to solve the problem when
dumping rt runqueues. Currently dump_RT_prio_array() doesn't support
rt group scheduler.
In my test, I put some rt tasks into one group, just like below:
--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility
08-27-2012, 03:53 PM
Dave Anderson
Fix bugs in runq
----- Original Message -----
> And based on this patch, I made a new patch to solve the problem when
> dumping rt runqueues. Currently dump_RT_prio_array() doesn't support
> rt group scheduler.
>
> In my test, I put some rt tasks into one group, just like below:
>
> mkdir /cgroup/cpu/test1
> echo 850000 > /cgroup/cpu/test1/cpu.rt_runtime_us
>
> ./rtloop1 &
> echo $! > /cgroup/cpu/test1/tasks
> ./rtloop1 &
> echo $! > /cgroup/cpu/test1/tasks
> ./rtloop1 &
> echo $! > /cgroup/cpu/test1/tasks
> ./rtloop98 &
> echo $! > /cgroup/cpu/test1/tasks
> ./rtloop45 &
> echo $! > /cgroup/cpu/test1/tasks
> ./rtloop99 &
> echo $! > /cgroup/cpu/test1/tasks
>
... [ cut ] ...
>
> After applying the attached patch, crash seems to work well:
>
> crash> runq
> CPU 0 RUNQUEUE: ffff880028216680
> CURRENT: PID: 5125 TASK: ffff88010799d540 COMMAND: "sh"
> RT PRIO_ARRAY: ffff880028216808
> [ 0] PID: 5136 TASK: ffff8801153cc040 COMMAND: "rtloop99"
> CHILD RT PRIO_ARRAY: ffff88013b050000
> [ 0] PID: 5133 TASK: ffff88010799c080 COMMAND: "rtloop99"
> [ 1] PID: 5131 TASK: ffff880037922aa0 COMMAND: "rtloop98"
> [ 98] PID: 5128 TASK: ffff88011bd87540 COMMAND: "rtloop1"
> PID: 5130 TASK: ffff8801396e7500 COMMAND: "rtloop1"
> PID: 5129 TASK: ffff88011bf5a080 COMMAND: "rtloop1"
> PID: 6 TASK: ffff88013d7c6080 COMMAND: "watchdog/0"
> PID: 3 TASK: ffff88013d7ba040 COMMAND: "migration/0"
> [ 1] PID: 5134 TASK: ffff8801153cd500 COMMAND: "rtloop98"
> PID: 5135 TASK: ffff8801153ccaa0 COMMAND: "rtloop98"
> CFS RB_ROOT: ffff880028216718
> [120] PID: 5109 TASK: ffff880037923500 COMMAND: "sh"
> [120] PID: 5107 TASK: ffff88006eeccaa0 COMMAND: "sh"
> [120] PID: 5123 TASK: ffff880107a4caa0 COMMAND: "sh"
>
> CPU 1 RUNQUEUE: ffff880028296680
> CURRENT: PID: 5086 TASK: ffff88006eecc040 COMMAND: "bash"
> RT PRIO_ARRAY: ffff880028296808
> [ 0] PID: 5137 TASK: ffff880107b35540 COMMAND: "rtloop99"
> PID: 10 TASK: ffff88013cc2cae0 COMMAND: "watchdog/1"
> PID: 2852 TASK: ffff88013bd5aae0 COMMAND: "rtkit-daemon"
> [ 54] CHILD RT PRIO_ARRAY: ffff880138978000
> [ 54] PID: 5132 TASK: ffff88006eecd500 COMMAND: "rtloop45"
> CFS RB_ROOT: ffff880028296718
> [120] PID: 5115 TASK: ffff8801152b1500 COMMAND: "sh"
> [120] PID: 5113 TASK: ffff880139530080 COMMAND: "sh"
> [120] PID: 5111 TASK: ffff88011bd86080 COMMAND: "sh"
> [120] PID: 5121 TASK: ffff880115a9e080 COMMAND: "sh"
> [120] PID: 5117 TASK: ffff8801152b0040 COMMAND: "sh"
> [120] PID: 5119 TASK: ffff880115a9eae0 COMMAND: "sh"
>
> Is this kind of output for rt runqueues ok? Or do you have any
> suggestion?
Hello Zhang,
I find the output a bit confusing. When Daisuke added support for
displaying the runnable tasks that are contained within a cgroup's
task-group scheduling entity, they are simply added to the list
of runnable tasks for a particular priority value. There is no
special "CHILD"/indent to differentiate them. So I'm not sure why
it would be necessary to to do the same thing for RT tasks?
Also, I'm not clear on what is going on here -- on cpu 0, the
"CHILD RT PRIO DISPLAY" looks like it's a "child" of task 5136:
Now, the scheduler is not limited to schedule processes, but can also
work with larger entities. This allows for implementing group scheduling.
So for every RT PRIO_ARRAY, each linked list for each priority has
its element embedded in a structure "sched_entity". This "sched_entity"
could represent two things: a process or a child rt runqueue.
for example, in cpu0, array[0] has four linked elements:
1. task 5136
2. a child rt rq
3. task 6
4. task 3
and the child rt rq has its own runqueue array with 5 tasks in it:
task 5133 with a priority of 0, task 5131 with a priority of 1, and the
last three tasks -- 5128, 5230, 5129 with a priority of 98.
Thanks
Zhang Yanfei
--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility
08-28-2012, 03:48 PM
Dave Anderson
Fix bugs in runq
----- Original Message -----
> Hmm, may be confusing here...
>
> >> crash> runq
> >> CPU 0 RUNQUEUE: ffff880028216680
> >> CURRENT: PID: 5125 TASK: ffff88010799d540 COMMAND: "sh"
> >> RT PRIO_ARRAY: ffff880028216808
> >> [ 0] PID: 5136 TASK: ffff8801153cc040 COMMAND: "rtloop99"
> >> CHILD RT PRIO_ARRAY: ffff88013b050000
> >> [ 0] PID: 5133 TASK: ffff88010799c080 COMMAND: "rtloop99"
> >> [ 1] PID: 5131 TASK: ffff880037922aa0 COMMAND: "rtloop98"
> >> [ 98] PID: 5128 TASK: ffff88011bd87540 COMMAND: "rtloop1"
> >> PID: 5130 TASK: ffff8801396e7500 COMMAND: "rtloop1"
> >> PID: 5129 TASK: ffff88011bf5a080 COMMAND: "rtloop1"
> >> PID: 6 TASK: ffff88013d7c6080 COMMAND: "watchdog/0"
> >> PID: 3 TASK: ffff88013d7ba040 COMMAND: "migration/0"
> >> [ 1] PID: 5134 TASK: ffff8801153cd500 COMMAND: "rtloop98"
> >> PID: 5135 TASK: ffff8801153ccaa0 COMMAND: "rtloop98"
> >> ...
> >
> > whereas on cpu 1, the "CHILD RT PRIO ARRAY" line is on the same
> > line as
> > priority 54:
> >
> >> CPU 1 RUNQUEUE: ffff880028296680
> >> CURRENT: PID: 5086 TASK: ffff88006eecc040 COMMAND: "bash"
> >> RT PRIO_ARRAY: ffff880028296808
> >> [ 0] PID: 5137 TASK: ffff880107b35540 COMMAND: "rtloop99"
> >> PID: 10 TASK: ffff88013cc2cae0 COMMAND: "watchdog/1"
> >> PID: 2852 TASK: ffff88013bd5aae0 COMMAND: "rtkit-daemon"
> >> [ 54] CHILD RT PRIO_ARRAY: ffff880138978000
> >> [ 54] PID: 5132 TASK: ffff88006eecd500 COMMAND: "rtloop45"
> >> CFS RB_ROOT: ffff880028296718
> >
> > What is it a "child" of? Or maybe "CHILD" the wrong terminology
> > here?
>
>
> Now, the scheduler is not limited to schedule processes, but can also
> work with larger entities. This allows for implementing group
> scheduling.
>
> So for every RT PRIO_ARRAY, each linked list for each priority has
> its element embedded in a structure "sched_entity". This "sched_entity"
> could represent two things: a process or a child rt runqueue.
>
> for example, in cpu0, array[0] has four linked elements:
> 1. task 5136
> 2. a child rt rq
> 3. task 6
> 4. task 3
> and the child rt rq has its own runqueue array with 5 tasks in it:
> task 5133 with a priority of 0, task 5131 with a priority of 1, and the
> last three tasks -- 5128, 5230, 5129 with a priority of 98.
Right, I understand. What I don't understand is the use of the "child"
terminology. If CONFIG_RT_GROUP_SCHED is configured, then the sched_rt_entity
may reference a "group" run queue. To me, it doesn't make sense to use
the term "CHILD RT PRIO_ARRAY" in the header. Wouldn't it make more sense to
call it a "GROUP RT PRIO_ARRAY"? Like this:
--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility
08-29-2012, 06:30 PM
Dave Anderson
Fix bugs in runq
----- Original Message -----
> > Another question re: your patch -- is it possible to have a "depth" greater
> > than 1?
>
> Yes, "depth" could be greater than 1, see the example below:
>
> CPU 0 RUNQUEUE: ffff880028216680
> CURRENT: PID: 17085 TASK: ffff880137c63540 COMMAND: "bash"
> RT PRIO_ARRAY: ffff880028216808 <-- depth = 0
> [ 0] PID: 17129 TASK: ffff880037aeaaa0 COMMAND: "rtloop99"
> PID: 2832 TASK: ffff88013b09cae0 COMMAND: "rtkit-daemon"
> PID: 6 TASK: ffff88013d7c6080 COMMAND: "watchdog/0"
> [ 1] GROUP RT PRIO_ARRAY: ffff88002ca65000 <-- depth = 1
> [ 1] GROUP RT PRIO_ARRAY: ffff880015821000 <-- depth = 2
> [ 1] PID: 17126 TASK: ffff880135d2a040 COMMAND: "rtloop98"
> [ 98] PID: 17119 TASK: ffff88010190d500 COMMAND: "rtloop1"
> PID: 17121 TASK: ffff88013bd27500 COMMAND: "rtloop1"
> PID: 17120 TASK: ffff88010190caa0 COMMAND: "rtloop1"
> CFS RB_ROOT: ffff880028216718
...
> Hmm, I think the depth could not be that big. So how do you think this
> kind of output.
>
> The attached patch just changed "CHILD" to "GROUP".
Interesting -- how did you set up the depth-greater-than-one scenario?
Anyway, given that it is possible, let's at least tighten up the output display
by changing each "9 * depth" usage to be "6 * depth". That should alter
your example output to look like this:
And also, I'd prefer to not create the dangling "static int depth",
but rather to add a depth argument to dump_RT_prio_array(), where
dump_CFS_runqueues() passes a 0, and dump_RT_prio_array() passes
"depth+1" to itself: