We extensively researched the problem.
*
The TLB flush softlockup is only a CONSEQUENCE of a deadlock.
*
Background: a CPU issues a TLB flush to a number of other CPUs by sending them inter-processor interrupts (IPIs) to propagate paging changes, and then loops until every one of those processors acknowledges the change. If one of them is deadlocked on a spinlock (typically with interrupts disabled, so it cannot service the IPI), the acknowledgement never arrives and the soft lockup triggers. The deadlock arises on a spinlock that may sometimes be taken on behalf of user code (through the /proc or /sys interfaces of modules).
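To make the mechanism concrete, here is a minimal userspace model in C. It assumes POSIX threads stand in for CPUs, a per-thread atomic flag stands in for the IPI, and an atomic_flag stands in for the driver's spinlock; all names (simulate_flush, cpu_thread, driver_lock) are hypothetical and this is not kernel code. One target "CPU" spins on a lock that is never released and so never acknowledges; the issuer gives up after a timeout, which is roughly the point where the kernel's soft-lockup detector would complain instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

#define MAX_CPUS 8 /* ncpus passed to simulate_flush must not exceed this */

/* Hypothetical model state, not kernel data structures. */
static atomic_flag driver_lock = ATOMIC_FLAG_INIT; /* a driver spinlock       */
static atomic_int flush_request[MAX_CPUS];         /* "IPI pending" per CPU   */
static atomic_int acks;                            /* acknowledgements so far */
static atomic_bool stop;                           /* demo teardown only      */

struct cpu_arg { int id; bool deadlocked; };

static void *cpu_thread(void *p) {
    const struct cpu_arg *a = p;
    if (a->deadlocked) {
        /* Spin on a lock nobody releases. A real CPU doing this with
           interrupts disabled never services the flush IPI. */
        while (atomic_flag_test_and_set(&driver_lock) && !atomic_load(&stop))
            ;
        return NULL;
    }
    while (!atomic_load(&stop)) {
        if (atomic_exchange(&flush_request[a->id], 0)) {
            /* Here a real CPU would flush its local TLB, then acknowledge. */
            atomic_fetch_add(&acks, 1);
        }
    }
    return NULL;
}

static long elapsed_ms(const struct timespec *t0) {
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (now.tv_sec - t0->tv_sec) * 1000L
         + (now.tv_nsec - t0->tv_nsec) / 1000000L;
}

/* CPU 0 "sends" a flush to CPUs 1..ncpus-1 and busy-waits for every
   acknowledgement, giving up after timeout_ms. Returns the number of
   acknowledgements received. */
int simulate_flush(int ncpus, int deadlocked_cpu, long timeout_ms) {
    pthread_t th[MAX_CPUS];
    struct cpu_arg args[MAX_CPUS];
    struct timespec t0;

    atomic_store(&acks, 0);
    atomic_store(&stop, false);
    atomic_flag_test_and_set(&driver_lock); /* held by someone who never unlocks */

    for (int i = 1; i < ncpus; i++) {
        atomic_store(&flush_request[i], 1); /* "send the IPI" */
        args[i] = (struct cpu_arg){ i, i == deadlocked_cpu };
        pthread_create(&th[i], NULL, cpu_thread, &args[i]);
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    while (atomic_load(&acks) < ncpus - 1 && elapsed_ms(&t0) < timeout_ms)
        ; /* the issuer loops until all targets acknowledge */

    int seen = atomic_load(&acks);
    atomic_store(&stop, true); /* demo only: the kernel has no such escape */
    for (int i = 1; i < ncpus; i++)
        pthread_join(th[i], NULL);
    atomic_flag_clear(&driver_lock);
    return seen;
}
```

Build with `cc -pthread`. With no deadlocked CPU (deadlocked_cpu = -1) every target acknowledges and the issuer returns early; with one target stuck on the lock, the issuer always ends up one acknowledgement short and only leaves the loop because of the timeout.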
*
The only way to identify the root cause (i.e. which driver is causing problems) is to dump ALL CPU stacks in the soft lockup code.
*
One way to do that is to modify the kernel and add a call to
    arch_trigger_all_cpu_backtrace()
in the
    kernel/softlockup.c:softlockup_tick()
function.
*
This relies on an NMI IPI, which ensures that all stacks are dumped even in the case of a deadlock: a non-maskable interrupt reaches a CPU even while it spins with interrupts disabled (well, don't expect the impossible to happen either).
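Why dumping ALL stacks matters can be illustrated with another small C model. The struct and function names here (cpu_snapshot, report_all_cpus) are hypothetical and are not kernel APIs; each snapshot merely stands in for one entry of an all-CPU backtrace. The CPU that trips the detector is only the victim waiting for acknowledgements; the culprit appears only when every CPU's state is printed.

```c
#include <stdio.h>

/* Hypothetical per-CPU snapshot standing in for one entry of an
   all-CPU backtrace; this is not a kernel structure. */
struct cpu_snapshot {
    int cpu;
    const char *top_frame; /* innermost function on that CPU's stack */
    int spinning_on_lock;  /* nonzero if busy-waiting for a spinlock */
};

/* Model of a soft-lockup report that dumps EVERY CPU, not just the one
   that tripped the detector. Returns the first CPU seen spinning on a
   lock (the likely culprit), or -1 if there is none. */
int report_all_cpus(const struct cpu_snapshot *cpus, int n, int stuck_cpu) {
    int suspect = -1;
    printf("soft lockup detected on CPU#%d, dumping all CPUs:\n", stuck_cpu);
    for (int i = 0; i < n; i++) {
        printf("  CPU#%d: %s%s\n", cpus[i].cpu, cpus[i].top_frame,
               cpus[i].spinning_on_lock ? " (spinning on a lock)" : "");
        if (cpus[i].spinning_on_lock && suspect < 0)
            suspect = cpus[i].cpu;
    }
    return suspect;
}
```

For example, if CPU#0 is looping while waiting for TLB flush acknowledgements and CPU#2 sits in a driver function spinning on a lock, printing only CPU#0's stack tells you nothing about the driver, whereas the full dump points straight at CPU#2.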
*
You should then easily find the faulty driver and report the relevant bug.
*
Hope this helps
*
François-Frédéric