01-31-2008, 03:27 PM
Matt Bernstein

forced fsck (again?)

On Jan 22 Theodore Tso wrote:

> #!/bin/sh
> #
> # e2croncheck
>
> VG=closure
> VOLUME=root
> SNAPSIZE=100m
> EMAIL=tytso@mit.edu

[snip]

> Well, this isn't a complete solution, because a lot of people don't
> use LVM

Please forgive my late noticing of this. The idea is good, and will work
fine in 99% of cases.


I'd love to snapshot (for rsync as well as fsck) my large filesystems,
which have external journals which in turn are in a different VG.


I suspect that if I were to naïvely run your script, really interesting
things would be likely to happen.


So.. I'd love to atomically make two snapshots, but I guess that is Hard
(or would at least require a very coarse lock). I suppose in the meantime
I could "tune2fs -O ^has_journal" the snapshot volume, but I'm too scared
even to do that.


So.. maybe I could request that you either include a Big Fat Disclaimer,
or code based on the following (untested, you can probably do better)?


if (tune2fs -l /dev/${VG}/${VOLUME}|egrep -q "Journal device")
then
    echo "Cowardly refusing to play with external journals."
    echo "There be dragons!"
    exit 1
fi

_______________________________________________
Ext3-users mailing list
Ext3-users@redhat.com
https://www.redhat.com/mailman/listinfo/ext3-users
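
A guard like the one proposed above could also be factored into a small helper, so that the string match can be exercised without touching a real device. This is only an untested sketch: the function names has_external_journal and refuse_external_journal are made up for illustration, and the only thing taken from the thread is the "Journal device" string that the guard greps for in "tune2fs -l" output.

```shell
#!/bin/sh
# Sketch only; both function names are hypothetical.

# Read "tune2fs -l" (or "dumpe2fs -h") output on stdin and succeed if it
# contains a "Journal device:" line, the same string the guard above uses.
has_external_journal() {
    grep -q '^Journal device:'
}

# Refuse to go any further with a volume that has an external journal.
# $1 is the block device to inspect.
refuse_external_journal() {
    dev="$1"
    if tune2fs -l "$dev" 2>/dev/null | has_external_journal; then
        echo "Cowardly refusing to play with external journals on $dev." >&2
        return 1
    fi
    return 0
}
```

A caller would run something like: refuse_external_journal "/dev/${VG}/${VOLUME}" || exit 1 before creating the snapshot.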
 
02-05-2008, 12:51 AM
Bryan Kadzban

forced fsck (again?)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160

Andreas Dilger wrote:
> You can add a Signed-Off-By: Andreas Dilger <adilger@sun.com> here,
> as it does everything I think is needed at this point...

Just want to double check: I've made a few more changes here, based on
some of the other messages between you and Ted, and the one message from
Eric; I don't necessarily want to put your sign-off on it if I've
reverted any of your changes. :-)

> Probably good to put a version number in the script, along with your
> name/email so it is clear what version a user is running.

Done, along with a "contributions" section.

>> # e2check configuration file
>
> Minor note - "lvscan configuration file".

Uh, yep. I've also grepped both files for other occurrences of "e2" and
removed anything that doesn't belong. (There was only one occurrence,
also in the config file, under AC_UNKNOWN. It runs more than just
e2fsck now.)

One thing I did realize, though. The script still uses a lot of tools
from e2fsprogs -- logsave and blkid at least; possibly more. Does it
make sense to require e2fsprogs on a system whose only filesystems are
XFS or reiser? (Does it make sense to provide this script as a separate
package -- that would therefore depend on e2fsprogs -- on these systems,
either?) Not entirely sure what I can do about that, though; I can use
"tee -a" instead of logsave, but I'm not sure about blkid. Maybe
/proc/mounts might be helpful?
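
To make the /proc/mounts idea concrete, here is a rough sketch of looking up a filesystem type there instead of asking blkid. The function name fstype_from_mounts is hypothetical, and the approach has obvious limits: it only sees filesystems that are currently mounted, and LVM volumes usually appear there under their /dev/mapper/VG-LV name rather than /dev/VG/LV, so the caller would still have to translate names.

```shell
#!/bin/sh
# Sketch only; fstype_from_mounts is a hypothetical helper.
# $1 = device path exactly as it appears in the mounts table
# $2 = mounts table to read (optional, defaults to /proc/mounts)
# Prints the filesystem type of the first matching entry, or nothing.
fstype_from_mounts() {
    awk -v dev="$1" '$1 == dev { print $3; exit }' "${2:-/proc/mounts}"
}
```

For example, fstype_from_mounts /dev/mapper/vg0-home would print "xfs" if that device is mounted as XFS, and print nothing at all for an unmounted volume.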

Summary of other changes:

Added XFS cases to some functions, to document that nothing needs to be
done (and get rid of warnings), and changed xfs_check to xfs_repair.
(Per Eric Sandeen's message.)

Changed back to two e2fsck calls, per Ted's message about the orphaned
inode list getting cleared with e2fsck -p. Also switched back to -fy
instead of -fn, since more useful output is given, and the snapshots are
already read-write. (I can easily revert these if needed, back to a
single "e2fsck -fn" call, but it sounded like the changes were probably
required?) *Also* re-added "-C 0" so running the script interactively
is slightly prettier.

Append output to a single log file (per logical volume) in /var/log,
instead of a separate file per LV per day. Added a date header to the
file as well, before the output of each fsck.

Other comments?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHp8ExS5vET1Wea5wRA9NDAJ4o+3465aI4klH2gIbRPt njzCED+ACfZ5wR
Ynsy2rkpguap4id5xN1MAmU=
=j4w3
-----END PGP SIGNATURE-----
#!/bin/bash
#
# lvcheck, version 1.0
# Maintainer: Bryan Kadzban <bryan@kadzban.is-a-geek.net>

# Other credits:
# Concept and original script by Theodore Tso <tytso@mit.edu>
# on_ac_power is mostly from Debian's powermgmt-base package
# Lots of help (ideas, initial XFS/JFS support, etc.) from
# Andreas Dilger <adilger@sun.com>
# Better XFS support from Eric Sandeen <sandeen@redhat.com>

# Released under the GNU General Public License, either version 2 or
# (at your option) any later version.

# Overview:
#
# Run this from cron periodically (e.g. once per week). If the
# machine is on AC power, it will run the checks; otherwise they will
# all be skipped. (If the script can't tell whether the machine is
# on AC power, it will use a setting in the configuration file
# (/etc/lvcheck.conf) to decide whether to continue with the checks,
# or abort.)
#
# The script will then decide which logical volumes are active, and
# can therefore be checked via an LVM snapshot. Each of these LVs
# will be queried to find its last-check day, and if that was more
# than $INTERVAL days ago (where INTERVAL is set in the configuration
# file as well), or if the last-check day can't be determined, then
# the script will take an LVM snapshot of that LV and run fsck on the
# snapshot. The snapshot will be set to use 1/500 the space of the
# source LV. After fsck finishes, the snapshot is destroyed.
# (Snapshots are checked serially.)
#
# Any LV that passes fsck should have its last-check time updated (in
# the real superblock, not the snapshot's superblock); any LV whose
# fsck fails will send an email notification to a configurable user
# ($EMAIL). This $EMAIL setting is optional, but its use is highly
# recommended, since if any LV fails, it will need to be checked
# manually, offline. Relevant messages are also sent to syslog.

# Set default values for configuration params. Changes to these values
# will be overwritten on an upgrade! To change these values, use
# /etc/lvcheck.conf.
EMAIL='root'
INTERVAL=30
AC_UNKNOWN="CONTINUE"
MINSNAP=256
MINFREE=0

# send $2 to syslog, with severity $1
# severities are emerg/alert/crit/err/warning/notice/info/debug
function log() {
    local sev="$1"
    local msg="$2"
    local arg=

    # log warning-or-higher messages to stderr as well
    case "$sev" in
        emerg|alert|crit|err|warning)
            arg=-s
            ;;
    esac

    logger -t lvcheck $arg -p user."$sev" -- "$msg"
}

# determine whether the machine is on AC power
function on_ac_power() {
    local any_known=no

    # try sysfs power class first
    if [ -d /sys/class/power_supply ] ; then
        for psu in /sys/class/power_supply/* ; do
            if [ -r "${psu}/type" ] ; then
                type="`cat "${psu}/type"`"

                # ignore batteries
                [ "${type}" = "Battery" ] && continue

                online="`cat "${psu}/online"`"

                [ "${online}" = 1 ] && return 0
                [ "${online}" = 0 ] && any_known=yes
            fi
        done

        [ "${any_known}" = "yes" ] && return 1
    fi

    # else fall back to AC adapters in /proc
    if [ -d /proc/acpi/ac_adapter ] ; then
        for ac in /proc/acpi/ac_adapter/* ; do
            if [ -r "${ac}/state" ] ; then
                grep -q on-line "${ac}/state" && return 0
                grep -q off-line "${ac}/state" && any_known=yes
            elif [ -r "${ac}/status" ] ; then
                grep -q on-line "${ac}/status" && return 0
                grep -q off-line "${ac}/status" && any_known=yes
            fi
        done

        [ "${any_known}" = "yes" ] && return 1
    fi

    if [ "$AC_UNKNOWN" = "CONTINUE" ] ; then
        return 0 # assume on AC power
    elif [ "$AC_UNKNOWN" = "ABORT" ] ; then
        return 1 # assume on battery
    else
        log "err" "Invalid value for AC_UNKNOWN in the config file"
        exit 1
    fi
}

# attempt to force a check of $1 on the next reboot
function try_force_check() {
    local dev="$1"
    local fstype="$2"

    case "$fstype" in
        ext2|ext3)
            tune2fs -C 16000 "$dev"
            ;;
        xfs)
            # XFS does not enforce check intervals; let email suffice.
            ;;
        *)
            log "warning" "Don't know how to force a check on $fstype..."
            ;;
    esac
}

# attempt to set the last-check time on $1 to now, and the mount count to 0.
function try_delay_checks() {
    local dev="$1"
    local fstype="$2"

    case "$fstype" in
        ext2|ext3)
            tune2fs -C 0 -T now "$dev"
            ;;
        xfs)
            # XFS does not enforce check intervals; nothing to delay
            ;;
        *)
            log "warning" "Don't know how to delay checks on $fstype..."
            ;;
    esac
}

# print the date that $1 was last checked, in a format that date(1) will
# accept, or "Unknown" if we don't know how to find that date.
function try_get_check_date() {
    local dev="$1"
    local fstype="$2"

    case "$fstype" in
        ext2|ext3)
            dumpe2fs -h "$dev" 2>/dev/null | grep 'Last checked:' |
                sed -e 's/Last checked:[[:space:]]*//'
            ;;
        *)
            # XFS does not save the last-checked date

            # TODO: add support for various other FSes
            echo "Unknown"
            ;;
    esac
}

# check the FS on $1 passively, saving output to $3.
function perform_check() {
    local dev="$1"
    local fstype="$2"
    local tmpfile="$3"

    case "$fstype" in
        ext2|ext3)
            # first clear the orphaned-inode list, to avoid unnecessary FS changes
            # in the next step (which would cause an "error" exit from e2fsck).
            # -C 0 is present for cases where the script is run interactively
            # (logsave -s strips out the progress bar). ignore the return status
            # of this e2fsck, as it doesn't matter.
            nice logsave -as "${tmpfile}" e2fsck -p -C 0 "$dev"

            # then do the real check; -y is here to give more info on any errors
            # that may be present on the FS, in the log file. the snapshot is
            # writable, so it shouldn't break anything if e2fsck changes it.
            nice logsave -as "${tmpfile}" e2fsck -fy -C 0 "$dev"
            return $?
            ;;
        reiserfs)
            echo Yes | nice logsave -as "${tmpfile}" fsck.reiserfs --check "$dev"
            # apparently can't fail? let's hope not...
            return 0
            ;;
        xfs)
            nice logsave -as "${tmpfile}" xfs_repair -n "$dev"
            return $?
            ;;
        jfs)
            nice logsave -as "${tmpfile}" fsck.jfs -fn "$dev"
            return $?
            ;;
        *)
            log "warning" "Don't know how to check $fstype filesystems passively: assuming OK."
            # return success explicitly rather than whatever logger returned
            return 0
            ;;
    esac
}

# do everything needed to check and reset dates and counters on /dev/$1/$2.
function check_fs() {
    local vg="$1"
    local lv="$2"
    local fstype="$3"
    local snapsize="$4"

    local tmpfile=`mktemp -t lvcheck.log.XXXXXXXXXX`
    local errlog="/var/log/lvcheck-${vg}@${lv}"
    local snaplvbase="${lv}-lvcheck-temp"
    local snaplv="${snaplvbase}-`date +'%Y%m%d'`"

    # clean up any left-over snapshot LVs
    for lvtemp in /dev/${vg}/${snaplvbase}* ; do
        if [ -e "$lvtemp" ] ; then
            # Assume the script won't run more than one instance at a time?

            log "warning" "Found stale snapshot $lvtemp: attempting to remove."

            # strip the leading "/dev/" so lvremove gets a plain "vg/lv" name
            if ! lvremove -f "${lvtemp#/dev/}" ; then
                log "err" "Could not delete stale snapshot $lvtemp"
                rm -f "$tmpfile"
                return 1
            fi
        fi
    done

    # and create this one; $snapsize is in megabytes, so use -L with an
    # explicit unit (-l would interpret the number as a count of extents)
    if ! lvcreate -s -L "${snapsize}M" -n "${snaplv}" "${vg}/${lv}" ; then
        log "err" "Could not create snapshot of /dev/${vg}/${lv}; skipping"
        rm -f "$tmpfile"
        return 1
    fi

    if perform_check "/dev/${vg}/${snaplv}" "${fstype}" "${tmpfile}" ; then
        log "info" "Background scrubbing of /dev/${vg}/${lv} succeeded."
        try_delay_checks "/dev/${vg}/${lv}" "$fstype"
    else
        log "err" "Background scrubbing of /dev/${vg}/${lv} failed: run fsck offline soon!"
        try_force_check "/dev/${vg}/${lv}" "$fstype"

        if test -n "$EMAIL"; then
            mail -s "Fsck of /dev/${vg}/${lv} failed!" $EMAIL < "$tmpfile"
        fi

        # save the log file in /var/log in case mail is disabled
        (
            echo ""
            echo -n " Check on " ; date +'%Y-%m-%d'
            echo "======================="
            cat "$tmpfile"
        ) >>"$errlog"
    fi

    rm -f "$tmpfile"
    lvremove -f "${vg}/${snaplv}"
}

# pull in configuration -- overwrite the defaults above if the file exists
[ -r /etc/lvcheck.conf ] && . /etc/lvcheck.conf

# check whether the machine is on AC power: if not, skip fsck
on_ac_power || exit 0

# parse up lvscan output
lvscan 2>&1 | grep ACTIVE | awk '{print $2;}' |
while read DEV ; do
    # remove the single quotes around the device name
    DEV="$(echo "$DEV" | tr -d "'")"

    # get the FS type: blkid prints TYPE="blah". reset TYPE first, so that
    # a failed lookup cannot leave the previous LV's type in place.
    TYPE=
    eval `blkid -s TYPE "$DEV" | cut -d' ' -f2`

    # get the last-check time
    check_date=`try_get_check_date "$DEV" "$TYPE"`

    # if the date is unknown, run fsck every time the script runs. sigh.
    if [ "$check_date" != "Unknown" ] ; then
        # add $INTERVAL days, and throw away the time portion
        check_day=`date --date="$check_date $INTERVAL days" +'%Y%m%d'`

        # get today's date, and skip the check if the interval has not
        # yet elapsed
        today=`date +'%Y%m%d'`
        [ "$check_day" -gt "$today" ] && continue
    fi

    # get the volume group and logical volume names (lvs pads its output
    # with whitespace, which must not end up in the device path)
    VG="$(lvs --noheadings -o vg_name "$DEV" | tr -d ' ')"
    LV="$(lvs --noheadings -o lv_name "$DEV" | tr -d ' ')"

    # get the free space and LV size (in megs), guess at the snapshot
    # size, and see how much the admin will let us use (keeping MINFREE
    # available). lvs prints padded decimal values, so keep only the
    # integer part for expr. lowercase "m" asks for 1024-based units,
    # matching what lvcreate -L uses.
    SPACE="$(lvs --noheadings --units m --nosuffix -o vg_free "$DEV" | sed -e 's/^ *//' -e 's/[.,].*//')"
    SIZE="$(lvs --noheadings --units m --nosuffix -o lv_size "$DEV" | sed -e 's/^ *//' -e 's/[.,].*//')"
    SNAPSIZE="`expr "$SIZE" / 500`"
    AVAIL="`expr "$SPACE" - "$MINFREE"`"

    # if we don't even have MINSNAP space available, skip the LV
    if [ "$MINSNAP" -gt "$AVAIL" -o "$AVAIL" -le 0 ] ; then
        log "warning" "Not enough free space on volume group for ${DEV}; skipping"
        continue
    fi

    # make snapshot large enough to handle e.g. journal and other updates
    [ "$SNAPSIZE" -lt "$MINSNAP" ] && SNAPSIZE="$MINSNAP"

    # limit snapshot to available space (VG space minus min-free)
    [ "$SNAPSIZE" -gt "$AVAIL" ] && SNAPSIZE="$AVAIL"

    # don't need to check SNAPSIZE again: MINSNAP <= AVAIL, MINSNAP <= SNAPSIZE,
    # and SNAPSIZE <= AVAIL, combined, means SNAPSIZE must be between MINSNAP
    # and AVAIL, which is what we need -- assuming AVAIL > 0

    # check it
    check_fs "$VG" "$LV" "$TYPE" "$SNAPSIZE"
done
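
The interval test in the loop above can be pulled out into a function that is easy to try by hand. What follows is a sketch, not part of lvcheck: check_is_due is a hypothetical name, the optional third argument exists only so the function can be exercised with a fixed "today", and it assumes GNU date (for --date), as the script itself does. An unparseable date is treated as "due", in line with the overview's rule that an LV is checked when its last-check day can't be determined.

```shell
#!/bin/sh
# Sketch only; check_is_due is a hypothetical helper.
# $1 = last-check date in any format GNU date accepts
# $2 = interval in days
# $3 = today's date as YYYYMMDD (optional, defaults to the current date)
# Succeeds if a check is due, i.e. last check + interval is not in the future.
check_is_due() {
    last="$1"
    interval="$2"
    today="${3:-$(date +%Y%m%d)}"

    # if the date cannot be parsed, err on the side of checking
    due=$(date --date="$last $interval days" +%Y%m%d) || return 0

    [ "$due" -le "$today" ]
}
```

In the loop this would read: check_is_due "$check_date" "$INTERVAL" || continue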

#!/bin/sh

# lvcheck configuration file

# This file follows the pattern of sshd_config: default
# values are shown here, commented-out.

# EMAIL
# Address to send failure notifications to. If empty,
# failure notifications will not be sent.

#EMAIL='root'

# INTERVAL
# Days to wait between checks. All LVs use the same
# INTERVAL, but the "days since last check" value can
# be different per LV, since that value is stored in
# the filesystem superblock.

#INTERVAL=30

# AC_UNKNOWN
# Whether to run the *fsck checks if the script can't
# determine whether the machine is on AC power. Laptop
# users will want to set this to ABORT, while server and
# desktop users will probably want to set this to
# CONTINUE. Those are the only two valid values.

#AC_UNKNOWN="CONTINUE"

# MINSNAP
# Minimum snapshot size to take, in megabytes. The
# default snapshot size is 1/500 the size of the logical
# volume, but if that size is less than MINSNAP, the
# script will use MINSNAP instead. This should be large
# enough to handle e.g. journal updates, and other disk
# changes that require (semi-)constant space.

#MINSNAP=256

# MINFREE
# Minimum amount of space (in megabytes) to keep free in
# each volume group when creating snapshots.

#MINFREE=0
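
To show how MINSNAP and MINFREE interact with the 1/500 rule, the sizing arithmetic can be written as a standalone function. This is only an illustration: snap_size_mb is a hypothetical name, and all four arguments are assumed to be whole numbers of megabytes.

```shell
#!/bin/sh
# Sketch only; snap_size_mb is a hypothetical helper.
# $1 = LV size, $2 = free space in the VG, $3 = MINSNAP, $4 = MINFREE
# Prints the snapshot size to use, or fails if even MINSNAP does not fit.
snap_size_mb() {
    size="$1"
    space="$2"
    minsnap="$3"
    minfree="$4"

    snap=$((size / 500))        # default: 1/500 of the LV
    avail=$((space - minfree))  # what may be used while keeping MINFREE free

    # not enough room for even the minimum snapshot: caller should skip the LV
    if [ "$minsnap" -gt "$avail" ] || [ "$avail" -le 0 ]; then
        return 1
    fi

    [ "$snap" -lt "$minsnap" ] && snap="$minsnap"  # raise to the minimum
    [ "$snap" -gt "$avail" ] && snap="$avail"      # cap at the available space

    echo "$snap"
}
```

With the defaults (MINSNAP=256, MINFREE=0), a 100000 MB volume in a VG with 5000 MB free gets a 256 MB snapshot, since 100000 / 500 = 200 is below the minimum.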

 
02-11-2008, 05:19 PM
Andreas Dilger

forced fsck (again?)

On Jan 31, 2008 16:27 +0000, Matt Bernstein wrote:
> Please forgive my late noticing of this. The idea is good, and will work
> fine in 99% of cases.
>
> I'd love to snapshot (for rsync as well as fsck) my large filesystems,
> which have external journals which in turn are in a different VG.
>
> I suspect that if I were to naïvely run your script, really interesting
> things would be likely to happen

Well, the LVM snapshot code interacts with the filesystem and ext3 locks
the whole filesystem and flushes the entire journal before the snapshot
is done. This means that the journal is "clean" when the snapshot is done
(needs_recovery flag is cleared in the ext3 superblock).

The problem of course is that after the snapshot is done and the filesystem
unfrozen the journal will continue to be used.

> So.. I'd love to atomically make two snapshots, but I guess that is Hard
> (or would at least require a very coarse lock). I suppose in the meantime I
> could "tune2fs -O ^has_journal" the snapshot volume, but I'm too scared
> even to do that.

That would be exactly the right thing to do, because there isn't any data
in the journal at all related to the snapshot. Making this automatic in
some fashion is more desirable of course.

> So.. maybe I could request that you either include a Big Fat Disclaimer, or
> code based on the following (untested, you can probably do better)?
>
> if (tune2fs -l /dev/${VG}/${VOLUME}|egrep -q "Journal device")
> then
> echo "Cowardly refusing to play with external journals."
> echo "There be dragons!"
> exit 1
> fi

Definitely a good idea until someone does a bit of testing with this
and understands the interaction of the snapshot filesystem with the
still-in-use journal device.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
