11-19-2011, 05:10 AM
Kevin O'Gorman

Bash script clobbers something vital (lucid)

I've been tweaking a backup script. It's not going all that well.
The end result is that something really bad happens to the drive that
stores the backups. Fortunately it's not permanent, but it does
require a reboot (!).

The script makes gzipped tar, ntfsclone and dd backups to a new
directory on an external drive. The problem starts after all of the
backup files have been written, when the script is about to take
md5sums of the new backup files and shut down.
From then on, every attempt to access the backup drive fails with an
I/O error. Even attempts to unmount the drive fail -- even in a new shell.
The backup drive is a new (1 month old) Seagate 2TB SATA in an
external dock connected by USB.
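A minimal sketch of the final checksum step, for readers who want to reproduce it. The function name `write_checksums`, the single-directory layout and the `MD5SUMS` file name are assumptions for illustration; Kevin's actual scripts were attached later in the thread and are not reproduced in this archive.

```bash
# Hypothetical sketch: record MD5 checksums of every regular file directly
# inside a backup directory, in a file named MD5SUMS in that directory.
write_checksums() {
    local dir=$1
    (
        cd "$dir" || exit 1
        # Skip MD5SUMS and the temporary MD5SUMS.tmp being written below.
        find . -maxdepth 1 -type f ! -name 'MD5SUMS*' -print0 \
            | sort -z \
            | xargs -0 -r md5sum > MD5SUMS.tmp &&
            mv MD5SUMS.tmp MD5SUMS
    )
}
```

Writing to a temporary file and renaming it means a half-written list is never left under the final name if the drive fails mid-run.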

The shell logic seems okay. If I reboot the system and fsck the
backup drive, everything works again. If I comment out all of the
commands that worked and rerun the script, it computes the md5sums
without trouble.

Since the system is showing damage that no shell command could cause
(the drive is unusable until a reboot), and for other reasons, I don't
think the script is buggy. Even if it were, it should not be able to
cause this sort of problem.

Has anybody seen this? I've used Linux since ~ 1992, and I never have.

It's going to be a bear to make this reproducible. It happens every
time on this laptop, but only after backing up my partitions and data,
and that takes 4 hours (133 GB after compression). I don't even know
how to make a sensible bug report out of this.

--
Kevin O'Gorman, PhD

--
ubuntu-users mailing list
ubuntu-users@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-users
 
11-19-2011, 05:18 AM
Karl Auer

On Fri, 2011-11-18 at 22:10 -0800, Kevin O'Gorman wrote:
> I've been tweaking a backup script.

Post the script.

> The shell logic seems okay. If I reboot the system and fsck the
> backup drive, all seems working. If I comment out all of the commands
> that worked, running the shell starts computing the md5sums okay.

Are you saying that the drive is dead until you reboot, then the drive
becomes mountable again and no data has been lost?

Regards, K.

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~
Karl Auer (kauer@biplane.com.au) +61-2-64957160 (h)
http://www.biplane.com.au/kauer/ +61-428-957160 (mob)

GPG fingerprint: DA41 51B1 1481 16E1 F7E2 B2E9 3007 14ED 5736 F687
Old fingerprint: B386 7819 B227 2961 8301 C5A9 2EBC 754B CD97 0156
 
11-19-2011, 07:47 AM
Colin Law

On 19 November 2011 06:10, Kevin O'Gorman <kogorman@gmail.com> wrote:
> I've been tweaking a backup script. It's not going all that well.
> The end result is that something really bad happens to the drive that
> stores the backups. Fortunately it's not permanent, but it does
> require a reboot (!).
>
> The script makes gzipped tar, ntfsclone and dd backups to a new
> directory on an external drive. All of the backup files have been
> made when a problem starts with the backup drive.
> The script is about to take md5sums of all the new backup files and shut down.
> The problem is that all attempts to access the backup drive result in
> failure, reported as an IO error. Even attempts to unmount the drive
> fail -- even in a new shell.
> The backup drive is a new (1 month old) Seagate 2TB SATA in an
> external dock connected by USB.

Have a look in syslog and see what the first errors there are. There is
no need to make it fail again for this; you can go back to the
previous log, assuming you know when it happened. Also, if you run
the disc utilities, do you see any errors noted in the SMART data?
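One way to follow this advice is to pull out the first storage- or USB-related kernel messages from the current and rotated logs. The helper name and the message patterns below are illustrative assumptions (kernel wording varies between versions). For the SMART side, `smartctl -a /dev/sdX` from smartmontools is the usual tool; drives behind a USB dock may also need `-d sat`.

```bash
# Hypothetical helper: print the first kernel messages that look like disk or
# USB trouble. Pass log files oldest first, e.g. syslog.2.gz syslog.1 syslog.
# The patterns are common examples of kernel wording, not an exhaustive list.
first_disk_errors() {
    local max=${MAX_LINES:-20}
    local pattern='I/O error|EXT4-fs error|reset (high|full|super)[ -]?speed USB device|rejecting I/O to offline device'
    local f
    for f in "$@"; do
        zcat -f -- "$f"    # -f passes uncompressed files through unchanged
    done | grep -i -E -m "$max" -- "$pattern"
}
```

The first match is usually the interesting one: a USB reset shortly before the I/O errors would point at the dock or cable rather than the script.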

Are you doing anything else when it fails (plugging/unplugging
additional usb devices for example)?

Colin

--
gplus.to/clanlaw

 
11-19-2011, 01:36 PM
J

On Sat, Nov 19, 2011 at 03:47, Colin Law <clanlaw@googlemail.com> wrote:

> Have a look in syslog and see what are the first errors that appear
> there. No need to get it to fail again for this, you can go back to
> the previous log, assuming you know when it happened. Also if you run
> the disc utilities do you see any errors noted in the SMART data?
>
> Are you doing anything else when it fails (plugging/unplugging
> additional usb devices for example)?
>
> Colin

Additionally, when the script gets to the point where it starts trying
to md5sum things, have all the other operations truly finished? 133 GB
is a lot of data to move, especially over USB. Are you positive that
all the data has finished being written to the disk at that point?
Perhaps adding something ridiculous like 'sleep 30m' before you
start the md5sums would point this out. Since the backup takes 4+
hours, an extra 30 minutes isn't going to hurt in the grand scheme of
things. Anyway, it's a long shot, but worth considering. Of course,
30m is just an arbitrary time; adjust as you see fit.
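Rather than an arbitrary sleep, one way to test this theory is to watch how much data the kernel still has queued for writing. The sketch below polls the `Dirty:` and `Writeback:` counters in `/proc/meminfo`. These counters are system-wide, not specific to the backup drive, and the function name, threshold, poll interval and timeout are all assumptions.

```bash
# Hypothetical helper: wait until the kernel's dirty + writeback data falls
# to THRESHOLD_KB or below, or give up after TIMEOUT_S seconds.
# Counters come from /proc/meminfo and cover the whole system, in kB.
wait_for_writeback() {
    local threshold_kb=${1:-1024} timeout_s=${2:-1800} waited=0 pending
    while :; do
        pending=$(awk '/^(Dirty|Writeback):/ { sum += $2 } END { print sum + 0 }' /proc/meminfo)
        if (( pending <= threshold_kb )); then
            return 0
        fi
        if (( waited >= timeout_s )); then
            echo "still ${pending} kB pending after ${waited}s" >&2
            return 1
        fi
        sleep 5
        (( waited += 5 ))
    done
}
```

Logging how long this takes just before the checksums start would show whether a large write backlog is present when the failure begins.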

 
11-19-2011, 02:34 PM
Colin Law

On 19 November 2011 14:36, J <dreadpiratejeff@gmail.com> wrote:
> On Sat, Nov 19, 2011 at 03:47, Colin Law <clanlaw@googlemail.com> wrote:
>
>> Have a look in syslog and see what are the first errors that appear
>> there. No need to get it to fail again for this, you can go back to
>> the previous log, assuming you know when it happened. Also if you run
>> the disc utilities do you see any errors noted in the SMART data?
>>
>> Are you doing anything else when it fails (plugging/unplugging
>> additional usb devices for example)?
>>
>> Colin
>
> Additionally, when the script gets to the point where it starts trying
> to md5sum things, have all the other operations truly finished? 133 GB
> is a lot of data to move, especially over USB. Are you positive that
> all the data has finished being written to the disk at that point?
> Perhaps adding something ridiculous like 'sleep 30m' before you
> start the md5sums would point this out. Since the backup takes 4+
> hours, an extra 30 minutes isn't going to hurt in the grand scheme of
> things. Anyway, it's a long shot, but worth considering. Of course,
> 30m is just an arbitrary time; adjust as you see fit.

Are you suggesting that if the data has not been completely written
then trying to read it back will cause problems? That does not make any
sense to me. If the data is still in the cache then it will be read
back from there anyway.

This does suggest a possible issue, however. Kevin, does the script do
anything after completing the writes to ensure that the data being
checksummed is read back from the device rather than the cache? Perhaps
it is accidentally unmounting the device.
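If the goal is for the checksums to reflect what actually reached the disk, the cached copy has to be bypassed without unmounting anything. A sketch, assuming GNU dd's `iflag=nocache` (GNU coreutils 8.11 or later, newer than the version in lucid). As root, `sync; echo 3 > /proc/sys/vm/drop_caches` is a blunter, system-wide alternative. The function name is hypothetical, and dropping the cache is only advisory.

```bash
# Hypothetical helper: checksum a file after asking the kernel to drop its
# cached pages, so that most of the data is re-read from the device.
md5_from_disk() {
    local f=$1
    # Dirty pages cannot be discarded, so flush them to the device first.
    sync
    # count=0 with iflag=nocache advises the kernel to drop the cache for the
    # whole file; nothing is copied. Failures are ignored (advice only).
    dd if="$f" iflag=nocache count=0 2>/dev/null || true
    md5sum -- "$f"
}
```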

Colin

 
11-19-2011, 09:21 PM
Kevin O'Gorman

On Fri, Nov 18, 2011 at 10:18 PM, Karl Auer <kauer@biplane.com.au> wrote:
> On Fri, 2011-11-18 at 22:10 -0800, Kevin O'Gorman wrote:
>> I've been tweaking a backup script.
>
> Post the script.

Attached. It's in three parts, all of them living in /root/scripts:
ball.sh -- this is what I use on the command-line to back up
everything on the local machine. The problem happens in the middle
of this script. The version attached has been modified slightly --
the 'bkdf' call used to be after bksumit, but that caused the output
of 'df' to be excluded from the md5sums.

bkfuncts.sh -- a file to be sourced. It creates some variables and
defines some functions.

bkdropkick.sh -- a script to be sourced. It defines the backup
tasks for the host "dropkick". I have other versions for other hosts.

NOTE that the scripts check mount points but do not mount or unmount
anything. Moreover, all of the commands invoked are plain vanilla
administrative commands. I have not coded anything in C, nor used any
non- utilities. It's pretty much done with dd, df, blkid, tar,
ntfsclone, gzip and md5sum.
NOTE: you can do what you like with these scripts, but be aware they
may provoke weird behavior on your system. They do for me.
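The thread doesn't show how the scripts check mount points. As a hedged illustration, such a check might use the `mountpoint` utility and also confirm that the filesystem is still mounted read-write, since ext4 can be remounted read-only after errors (depending on its `errors=` setting). The function name is hypothetical.

```bash
# Hypothetical check: succeed only if DIR is a mount point that is
# currently mounted read-write.
require_rw_mount() {
    local dir=$1 opts
    if ! mountpoint -q "$dir"; then
        echo "$dir is not a mount point" >&2
        return 1
    fi
    # Take the last /proc/mounts entry for DIR (later mounts shadow earlier
    # ones). Paths with spaces are octal-escaped there and are not handled.
    opts=$(awk -v d="$dir" '$2 == d { o = $4 } END { print o }' /proc/mounts)
    case ",$opts," in
        *,rw,*) return 0 ;;
        *) echo "$dir is not mounted read-write (options: $opts)" >&2
           return 1 ;;
    esac
}
```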

>> The shell logic seems okay. If I reboot the system and fsck the
>> backup drive, all seems working. If I comment out all of the commands
>> that worked, running the shell starts computing the md5sums okay.
>
> Are you saying that the drive is dead until you reboot, then the drive
> becomes mountable again and no data has been lost?

Yes, except that I have not actually verified that the data is okay.
The backup filesystem is ext4; fsck says it replayed the journal okay
and has no other remarks to make.
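After the reboot and fsck the page cache is empty, so re-checking the files against a saved checksum list really does re-read the data from the drive. A sketch, assuming a list named `MD5SUMS` (a hypothetical name) in the backup directory with paths relative to it; the function name is also hypothetical.

```bash
# Hypothetical helper: re-read every file listed in DIR/MD5SUMS and compare
# it with the recorded checksum. Prints only failures; non-zero on any.
verify_backup() {
    local dir=$1
    ( cd "$dir" && md5sum -c --quiet MD5SUMS )
}
```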

> Regards, K.
>


--
Kevin O'Gorman, PhD
 
11-19-2011, 10:58 PM
Karl Auer

On Sat, 2011-11-19 at 14:21 -0800, Kevin O'Gorman wrote:
> > Post the script.
>
> Attached. It's in three parts

At first blush, I'd say you need to check the inputs more carefully -
when you are playing around with fdisk and dd, it's essential that the
parameters are correct. So in bkfuncts.sh, I'd be wrapping some serious
error checking around those exported variables, especially drive and
loc. It may not have anything to do with the current problem, but it
will probably save you somewhere down the track.
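A sketch of the kind of checking meant here. The thread never says what `drive` and `loc` contain, so this assumes `drive` is a device node that dd reads from and `loc` is the directory the backup files are written to; the function name is hypothetical and the checks should be adjusted to the real meanings in bkfuncts.sh.

```bash
# Hypothetical validation, assuming: drive = device node imaged with dd,
# loc = destination directory for the backup files.
check_backup_vars() {
    if [[ -z ${drive:-} ]]; then
        echo "drive is empty" >&2; return 1
    fi
    if [[ ! -b $drive ]]; then
        echo "drive=$drive is not a block device" >&2; return 1
    fi
    if [[ -z ${loc:-} || ! -d $loc || ! -w $loc ]]; then
        echo "loc=${loc:-} is not a writable directory" >&2; return 1
    fi
    # Refuse to write the backup onto the device being backed up. This is a
    # simple prefix match (/dev/sda vs /dev/sda1); symlinked names such as
    # /dev/disk/by-uuid/... are not resolved.
    local loc_dev
    loc_dev=$(df -P -- "$loc" | awk 'NR == 2 { print $1 }')
    if [[ $loc_dev == "$drive"* ]]; then
        echo "loc=$loc is on $drive itself" >&2; return 1
    fi
}
```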

> everything on the local machine. The problem happens in the middle
> of this script.

Locating exactly where a bug happens is pretty much the first step to
fixing it. If the symptom is that the drive is no longer readable, then
set up a telltale file and check at likely points in your scripts that
it still exists. If you suddenly can't find it or read it, the failure
has happened between that point and the last point where you could see
it. That narrows down the debug space.
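The telltale idea could look something like the following. The file and function names are hypothetical, and the probe write is an addition to the suggestion: a plain read may be served from the page cache even after the device has failed, while creating a file fails once the filesystem has been remounted read-only (for example via `errors=remount-ro`).

```bash
# Hypothetical telltale file name.
TELLTALE=.bk-telltale

# Create the telltale at the start of the run.
plant_telltale() {
    date > "$1/$TELLTALE"
}

# Call between steps, e.g. check_telltale "$loc" "after ntfsclone of sda2".
# Reads the telltale, then creates and removes a small probe file.
check_telltale() {
    local dir=$1 step=$2 probe="$1/.bk-probe.$$"
    if cat -- "$dir/$TELLTALE" > /dev/null 2>&1 &&
       : > "$probe" 2>/dev/null && rm -f -- "$probe"; then
        echo "$(date +%T) telltale OK after: $step" >&2
    else
        echo "$(date +%T) telltale FAILED after: $step" >&2
        return 1
    fi
}
```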

If you can reduce the size of the backup while you debug, it will
speed up your debugging. Can you set up a virtual machine with small
disks and run all this stuff on it? If it doesn't happen in the VM,
that's interesting information too.

> If I comment out all of the commands
> that worked, running the shell starts computing the md5sums okay.

If you know what commands worked, presumably you know which command
didn't... do you know which command failed?

Regards, K.

 
11-19-2011, 11:13 PM
Kevin O'Gorman

On Sat, Nov 19, 2011 at 7:34 AM, Colin Law <clanlaw@googlemail.com> wrote:
> On 19 November 2011 14:36, J <dreadpiratejeff@gmail.com> wrote:
>> On Sat, Nov 19, 2011 at 03:47, Colin Law <clanlaw@googlemail.com> wrote:
> Are you suggesting that if the data has not been completely written
> then trying to read it back will give issues? That does not make any
> sense to me. If the data is still available in the cache then it will
> be read back from there anyway.
>
> This does suggest a possible issue however. Kevin, does the script do
> something after completing the writes to ensure that the md5sum is
> read back from the device rather than the cache? Perhaps it is
> accidentally unmounting the device.

The script does not do anything like that. It does not mount or
dismount anything. Given a choice, I'd checksum the cache I guess,
but I have not been much concerned about it. The md5sum is there to
validate the files if I am going to go back and use them. You may
notice that the script sets them all with the "immutable" attribute,
but I like to test that with the checksum. It's also there because I
have inadvertently deleted some backups in the past, and I don't ever
want to do that again by accident.

--
Kevin O'Gorman, PhD

 
11-20-2011, 12:02 AM
Kevin O'Gorman

On Sat, Nov 19, 2011 at 6:36 AM, J <dreadpiratejeff@gmail.com> wrote:
> On Sat, Nov 19, 2011 at 03:47, Colin Law <clanlaw@googlemail.com> wrote:
>
>> Have a look in syslog and see what are the first errors that appear
>> there. No need to get it to fail again for this, you can go back to
>> the previous log, assuming you know when it happened. Also if you run
>> the disc utilities do you see any errors noted in the SMART data?
>>
>> Are you doing anything else when it fails (plugging/unplugging
>> additional usb devices for example)?
>>
>> Colin
>
> Additionally, when the script gets to the point where it starts trying
> to md5sum things, have all the other operations truly finished? 133 GB
> is a lot of data to move, especially over USB. Are you positive that
> all the data has finished being written to the disk at that point?
> Perhaps adding something ridiculous like 'sleep 30m' before you
> start the md5sums would point this out. Since the backup takes 4+
> hours, an extra 30 minutes isn't going to hurt in the grand scheme of
> things. Anyway, it's a long shot, but worth considering. Of course,
> 30m is just an arbitrary time; adjust as you see fit.

That's a job for sync(1), not sleep, but I'd regard either one as a
workaround at best. As I noted elsewhere, I may have to resort to
sync but if I do, I'll be submitting a bug as well.
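If sync is the tool, it can at least be timed, which shows whether a large backlog of unwritten data exists when the checksums start. A sketch; `sync -f` (flush just the filesystem containing a path) needs GNU coreutils 8.24 or later, so on lucid the fallback to a plain `sync` is what would run. The function name is hypothetical.

```bash
# Hypothetical helper: flush dirty data before checksumming and report how
# long the flush took. Tries a per-filesystem sync first (coreutils >= 8.24),
# falling back to a system-wide sync on older versions.
flush_backup_fs() {
    local dir=$1 start=$SECONDS
    sync -f -- "$dir" 2>/dev/null || sync
    echo "flush of $dir took $(( SECONDS - start ))s" >&2
}
```

A flush that takes minutes right before the failure would support the idea that the trouble is tied to a burst of delayed writes rather than to the checksum reads themselves.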

--
Kevin O'Gorman, PhD

 
11-20-2011, 12:04 AM
Kevin O'Gorman

On Sat, Nov 19, 2011 at 4:13 PM, Kevin O'Gorman <kogorman@gmail.com> wrote:
> On Sat, Nov 19, 2011 at 7:34 AM, Colin Law <clanlaw@googlemail.com> wrote:
>> On 19 November 2011 14:36, J <dreadpiratejeff@gmail.com> wrote:
>>> On Sat, Nov 19, 2011 at 03:47, Colin Law <clanlaw@googlemail.com> wrote:
>> Are you suggesting that if the data has not been completely written
>> then trying to read it back will give issues? That does not make any
>> sense to me. If the data is still available in the cache then it will
>> be read back from there anyway.
>>
>> This does suggest a possible issue however. Kevin, does the script do
>> something after completing the writes to ensure that the md5sum is
>> read back from the device rather than the cache? Perhaps it is
>> accidentally unmounting the device.
>
> The script does not do anything like that. It does not mount or
> dismount anything. Given a choice, I'd checksum the cache I guess,
> but I have not been much concerned about it. The md5sum is there to
> validate the files if I am going to go back and use them. You may
> notice that the script sets them all with the "immutable" attribute,
> but I like to test that with the checksum. It's also there because I
> have inadvertently deleted some backups in the past, and I don't ever
> want to do that again by accident.

Hmm. Pondering this just after sending it, I wonder if the problem
could be caused by the "chattr" command running while the cache is
still dirty. It's the first plausible explanation I've come up with,
given what I know about OSes. I'll try sticking a few "sync" commands
in there.
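The experiment of flushing before making each file immutable might be sketched as below. This only tests the hypothesis; nothing in the thread confirms that running chattr on dirty data is the cause. `seal_backup_file` and its arguments are hypothetical, and `chattr +i` needs root and a filesystem that supports the immutable flag, such as ext4.

```bash
# Hypothetical "seal" step for one finished backup file:
# 1. flush dirty data, 2. append its checksum to SUMFILE, 3. make it immutable.
# The sync-before-chattr ordering is the hypothesis being tested, not a known fix.
seal_backup_file() {
    local f=$1 sumfile=$2
    sync || return 1
    md5sum -- "$f" >> "$sumfile" || return 1
    chattr +i "$f"
}
```

If the I/O errors stop appearing once the flush happens first, that narrows the bug report down to a specific sequence of operations.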


--
Kevin O'Gorman, PhD

 
