About 1 in 4 times, Fedora 14 hangs during shutdown on at least 4 of my systems.
Looking at the shutdown messages (ESC in the splash screen) and adding some
debug statements to /etc/rc.d/rc0.d/S01halt, it hangs after the messages:

"Unmounting file systems"
"init: Re-executing /sbin/init"

with the message:

"mount: you must specify the file system type."

Adding some debug, this appears after the following command is executed:
"fstab-decode mount -n -o ro,remount /dev/sda1 /"
The file system is ext4 on all of the systems, and that command looks OK.
Any ideas?
Terry
--
users mailing list
users@lists.fedoraproject.org
To unsubscribe or change subscription options:
https://admin.fedoraproject.org/mailman/listinfo/users
Guidelines: http://fedoraproject.org/wiki/Mailing_list_guidelines
01-29-2011, 10:42 PM
JB
Fedora 14: Shutdown problem
Terry Barnaby <terry1 <at> beam.ltd.uk> writes:
> ...
Give us unedited outputs:
$ cat /etc/fstab
$ cat /etc/mtab
$ cat /proc/mounts
JB
01-30-2011, 12:13 PM
Terry Barnaby
Fedora 14: Shutdown problem
On 01/29/2011 11:42 PM, JB wrote:
> Terry Barnaby<terry1<at> beam.ltd.uk> writes:
>
>> ...
>
> Give us unedited outputs:
> $ cat /etc/fstab
> $ cat /etc/mtab
> $ cat /proc/mounts
>
> JB
The above files:
Terry
01-30-2011, 12:58 PM
JB
Fedora 14: Shutdown problem
Terry Barnaby <terry1 <at> beam.ltd.uk> writes:
>
> Hi,
>
> About 1 in 4 times Fedora 14 hangs during shutdown on at least 4 of my systems.
> Looking at the shutdown messages (ESC in the splash screen) and adding some
> debug statements to /etc/rc.d/rc0.d/S01halt, it hangs after the messages:
>
> "Unmounting file systems"
> "init: Re-executing /sbin/init"
>
> with the message:
>
> "mount: you must specify the file system type."
>
> Adding some debug, this appears after the following command is executed:
> "fstab-decode mount -n -o ro,remount /dev/sda1 /"
>
> The file system is ext4 on all of the systems and that command looks ok.
>
> Any ideas ?
>
> Terry
This is our offending script, with the section of interest to us.
/etc/rc.d/rc0.d/S01halt
# Remount read only anything that's left mounted.
# echo $"Remounting remaining filesystems readonly"
mount | awk '{ print $1,$3 }' | while read dev dir; do
fstab-decode mount -n -o ro,remount $dev $dir
done
# If we left mdmon's running wait for the raidsets to become clean
...
Place these debugging snapshot statements there to see what is in
/etc/mtab just before the 'mount' loop runs ('mount' output and /etc/mtab are equivalent).
The other two are helpers: 'date' documents the time, and "/proc/mounts" is even
more useful, as it is generally more accurate than /etc/mtab.
Leave them in place as long as needed to catch the difference between good and
bad shutdowns (as you said, the bad one happens sporadically, every 4th
shutdown or so).
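For reference, the snapshot statements can be wrapped in a small function. This is a sketch only: the name halt_snapshot is hypothetical, the log path defaults to /halt.debug as used in this thread, and each file is read only if it is present, since it is not certain that /proc is still mounted at this stage of the shutdown.

```shell
#!/bin/bash
# Hypothetical helper: append one labelled snapshot of the mount state to a log.
# Usage: halt_snapshot [LOGFILE]   (defaults to /halt.debug)
halt_snapshot() {
    local log=${1:-/halt.debug}
    {
        echo "date"
        date
        echo "cat /etc/mtab"
        [ -r /etc/mtab ] && cat /etc/mtab
        echo "cat /proc/mounts"
        # /proc may already be gone this late in the shutdown; only read it if present.
        if [ -r /proc/mounts ]; then
            cat /proc/mounts
        else
            echo "(/proc/mounts not available)"
        fi
    } >> "$log"
}
```

Called just before the remount loop, it appends one snapshot per shutdown, so good and bad runs can be compared afterwards.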
Keep in mind that a system update may replace this script.
Just in case, this display would be of interest:
$ cat /proc/filesystems
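The remount loop quoted above can also be traced without touching any mounts. A minimal sketch, not part of S01halt: the function name remount_ro_dryrun is hypothetical, and it echoes each command instead of running it. It assumes the plain 'mount' output format "DEV on DIR type TYPE (OPTS)", which is why awk takes fields 1 and 3.

```shell
#!/bin/bash
# Hypothetical dry-run version of the S01halt remount loop: it prints the
# commands the loop would run instead of executing them.
# Input (stdin): lines in 'mount' format, e.g. "/dev/sda1 on / type ext4 (rw)".
remount_ro_dryrun() {
    # Field 1 is the device, field 3 the mount point (field 2 is the word "on").
    awk '{ print $1, $3 }' | while read -r dev dir; do
        echo "fstab-decode mount -n -o ro,remount $dev $dir"
    done
}

# Example, using a listing of the kind captured in /halt.debug:
remount_ro_dryrun <<'EOF'
/dev/sda1 on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
EOF
```

With that input it prints one fstab-decode line per mounted file system, starting with the remount of /dev/sda1 on /, which is the command the hang follows.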
JB
I think a correction is needed, as /proc is no longer available: it was
unmounted immediately before our debugging statements run.
So, remove this line:
cat /proc/mounts >> /halt.debug
JB
01-30-2011, 04:55 PM
Terry Barnaby
Fedora 14: Shutdown problem
On 01/30/2011 02:11 PM, JB wrote:
> JB<jb.1234abcd<at> gmail.com> writes:
>
>>
>> # ################################################################
>> # debugging snapshot statements
>> # ----------------------------------------------------------------
>> date>> /halt.debug
>> cat /etc/mtab>> /halt.debug
>> cat /proc/mounts>> /halt.debug
>> # ################################################################
>>
>
> I think correction is needed as /proc is not available any more because it
> was unmounted immediatelly prior to our debugging statements.
> So, remove that:
> cat /proc/mounts>> /halt.debug
>
> JB
I added the debug statements, and the output was basically the same whether it
shut down cleanly or failed.
# A bad one
Sun Jan 30 17:12:08 GMT 2011
/dev/sda1 / ext4 rw 0 0
proc /proc proc rw 0 0
sysfs /sys sysfs rw 0 0
Mount:
/dev/sda1 on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
fstab-decode mount -n -o ro,remount /dev/sda1 /
fstab-decode mount -n -o ro,remount proc /proc
fstab-decode mount -n -o ro,remount sysfs /sys
# A good one: / has been remounted ro, so the last two remount messages are not present
Sun Jan 30 17:18:16 GMT 2011
/dev/sda1 / ext4 rw 0 0
proc /proc proc rw 0 0
sysfs /sys sysfs rw 0 0
Mount:
/dev/sda1 on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
fstab-decode mount -n -o ro,remount /dev/sda1 /
I put a /bin/sh after this so I could look at the system's state at the
point where the remount failed. The last few entries of the "ps ax" output are shown:
1282 ? S 0:00 [rpciod/1]
1378 ? S 0:00 [nfsiod]
1381 ? S 0:00 [lockd]
1960 ? D 0:00 [flush-0:19]
2006 ? Zl 0:00 [akonadi_control] <defunct>
2008 ? Z 0:00 [akonadiserver] <defunct>
2010 ? Zl 0:00 [mysqld] <defunct>
2125 ? Ds 0:00 [pulseaudio]
2332 ? Z 0:00 [gconf-helper] <defunct>
2365 ? D 0:00 [dcopserver]
2448 ? Ss 0:00 /bin/bash /etc/rc0.d/S01halt start
3001 ? S 0:00 /bin/sh
3019 ? R 0:00 ps ax
It looks like some processes are left over from the GUI (KDE).
I suspect they have log files or something else open on /
in write mode, and this is stopping the remount to ro from working.
Running "mount -o remount,ro /" at this point fails with "/ is busy".
They are probably waiting for /home, which is an NFS file system that
was unmounted earlier in the shutdown process.
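The D (uninterruptible sleep) and Z (zombie) entries in the ps listing above are the ones to look for. If ps is unavailable at that point, a similar list can be read straight from /proc/PID/stat. This is a sketch only: the function name stuck_procs is hypothetical, it takes an optional proc root so it can be tried on a fake directory, and it assumes the usual stat layout "PID (COMM) STATE ...". Because COMM may itself contain spaces or parentheses, the state is taken from the text after the last ") ".

```shell
#!/bin/bash
# Hypothetical helper: list processes in state D (uninterruptible sleep,
# e.g. blocked on an NFS file system that has gone away) or Z (zombie).
# Output: "PID STATE COMM", one line per process.
# Usage: stuck_procs [PROC_ROOT]   (defaults to /proc)
stuck_procs() {
    local root=${1:-/proc} stat line pid comm rest state
    for stat in "$root"/[0-9]*/stat; do
        # A process can exit between the glob and the read; skip it if so.
        line=$(cat "$stat" 2>/dev/null) || continue
        pid=${line%% *}
        comm=${line#*(}          # drop "PID ("
        comm=${comm%)*}          # drop everything from the last ")"
        rest=${line##*) }        # text after the last ") ": "STATE PPID ..."
        state=${rest%% *}
        case $state in
            D|Z) echo "$pid $state $comm" ;;
        esac
    done
}
```

Run with no arguments from the /bin/sh prompt, it would be expected to list the same kind of akonadi, pulseaudio and dcopserver entries that ps showed.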
I restarted network and netfs, and these processes disappeared. After
shutting down netfs and network, as well as some other leftover processes,
the remount command worked fine and the system shut down.
Note I am using the "network" service, not "NetworkManager". NetworkManager
does not work well for me on systems using a networked /home and
other network file systems.
I suspect an issue further up the shutdown chain: the system should
wait for all of the processes to exit "before" unmounting the NFS file
systems. I will have a look there; any ideas?
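One way to act on this in a shutdown script is to kill whatever is still using each NFS mount before it is unmounted. The sketch below is hypothetical and is not the actual Fedora netfs code: nfs_mountpoints reads field 3 (the file system type) of /proc/mounts, and kill_nfs_users calls fuser from psmisc with -m (processes using the mounted file system) and -k (kill them, SIGKILL by default). Mount points containing spaces appear octal-escaped (\040) in /proc/mounts, so the sketch passes them through fstab-decode, as the rc scripts do.

```shell
#!/bin/bash
# Hypothetical helpers, not the stock Fedora scripts.
# Print the mount point (field 2) of every NFS entry in a mounts file.
# Usage: nfs_mountpoints [MOUNTS_FILE]   (defaults to /proc/mounts)
nfs_mountpoints() {
    awk '$3 == "nfs" || $3 == "nfs4" { print $2 }' "${1:-/proc/mounts}"
}

# Kill every process still using an NFS file system, before it is unmounted.
kill_nfs_users() {
    local dir
    nfs_mountpoints "$@" | while read -r dir; do
        # fstab-decode undoes the \040-style escapes from /proc/mounts;
        # fuser -m selects users of that file system, -k kills them.
        fstab-decode fuser -k -m "$dir"
    done
}
```

Whether a process already blocked on a vanished NFS server actually dies from SIGKILL depends on the kernel and mount options, so this only helps if it runs while the mounts are still alive.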
Terry
01-30-2011, 05:01 PM
Terry Barnaby
Fedora 14: Shutdown problem
On 01/30/2011 05:55 PM, Terry Barnaby wrote:
> ...
I am guessing this is primarily a KDE problem (although the system should still
shut down cleanly even if processes are still there waiting on NFS). I presume
the KDE shutdown should wait for all of its processes to exit completely before
it asks init to shut down the system ...
Terry
01-30-2011, 05:51 PM
JB
Fedora 14: Shutdown problem
Terry Barnaby <terry1 <at> beam.ltd.uk> writes:
> ...
Firstly, I have to re-correct myself: my original debugging statements were
correct.
I checked on my machine and /proc/mounts is still available, so we should
include it, as it has more info than /etc/mtab. It could give us a clue about
any other mount-related issues.
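One way to see what /proc/mounts adds over /etc/mtab is to print the entries whose mount point appears only in /proc/mounts. A minimal sketch, assuming both files use the same whitespace-separated layout with the mount point in field 2 (as in the /etc/mtab lines captured in this thread); the function name only_in_proc is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print /proc/mounts entries whose mount point (field 2)
# does not appear in /etc/mtab. Paths can be overridden for testing.
# Usage: only_in_proc [MTAB] [MOUNTS]
only_in_proc() {
    local mtab=${1:-/etc/mtab} mounts=${2:-/proc/mounts}
    # While reading the first file, record its mount points; for the second
    # file, print only lines whose mount point was never recorded.
    awk 'FILENAME == ARGV[1] { seen[$2] = 1; next } !($2 in seen)' "$mtab" "$mounts"
}
```

Comparing mount points rather than whole lines avoids false hits from the option strings, which can differ between the two files.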
Secondly, I have a few things to check with regard to all of this. Perhaps
something will turn up.
JB
01-30-2011, 06:54 PM
Terry Barnaby
Fedora 14: Shutdown problem
On 01/30/2011 06:51 PM, JB wrote:
> Terry Barnaby<terry1<at> beam.ltd.uk> writes:
>
>> ...
>
> Firstly, I have to re-correct myself - my original debugging statemets were
> correct.
> I checked it on my machine and /proc/mounts is still available, so we should
> include it as it has more info than /etc/mtab. It could give us a clue about
> any other mount-related things.
>
> ...
> # ################################################################
> # debugging snapshot statements
> # ----------------------------------------------------------------
> echo "date">> /halt.debug
> date>> /halt.debug
> echo "cat /etc/mtab">> /halt.debug
> cat /etc/mtab>> /halt.debug
> echo "cat /proc/mounts">> /halt.debug
> cat /proc/mounts>> /halt.debug
> # ################################################################
> ...
>
> Secondly, I have few things to check with regard to all of this. Perhaps
> something pops up.
>
> JB
I am fairly sure the problem is the akonadi/pulseaudio/gconf-helper/dcopserver
processes that are still hanging around due to the fact that the NFS mounts they
are using have gone away.
As I said, remounting the NFS /home allows them to exit, which then allows / to be
remounted ro and the system to shut down.
I think there are three bugs here:
1. KDE is not waiting for all of its session processes to exit before
telling init to halt the system.
2. The rc0 scripts are not making sure all processes using the NFS file
systems have exited before unmounting them.
3. The final / remount ro command in rc0 should make sure all processes have
been killed before issuing the remount command. (Will the kernel allow
them to be killed while they are waiting on an unmounted NFS file system? Kernel bug?)
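For points 1 and 3, the missing step is a bounded wait: after asking processes to exit, check that they are really gone before continuing, and report any that are not. A minimal sketch, assuming the PIDs to wait for are already known and that it runs as root (as the rc scripts do), since kill -0 also fails for processes the caller may not signal; wait_gone is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: wait up to TIMEOUT seconds for the given PIDs to exit.
# Returns 0 once all are gone; otherwise prints the survivors to stderr and
# returns 1, so the caller can react instead of remounting blindly.
# Usage: wait_gone TIMEOUT PID...
wait_gone() {
    local timeout=$1 waited=0 pid
    local -a alive
    shift
    while :; do
        alive=()
        for pid in "$@"; do
            # kill -0 sends no signal; it only tests that the PID still exists
            # (a zombie still counts until its parent reaps it).
            if kill -0 "$pid" 2>/dev/null; then
                alive+=("$pid")
            fi
        done
        if [ "${#alive[@]}" -eq 0 ]; then
            return 0
        fi
        if [ "$waited" -ge "$timeout" ]; then
            echo "still running: ${alive[*]}" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}
```

Reporting the survivors rather than killing them keeps the decision with the caller, for example dropping to a shell as was done in this thread.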
Cheers
Terry
01-30-2011, 07:14 PM
Joe Zeff
Fedora 14: Shutdown problem
On 01/30/2011 11:54 AM, Terry Barnaby wrote:
> I am fairly sure the problem is the akonadi/pulseaudio/gconf-helper/dcopserver
I don't want to hijack the main thread, so I've changed the subject
slightly. I've been wondering something and it finally got to the point
that I had to ask: is it just me, or does anybody else look at the word
"akonadi" and think it's Hebrew?