 
Old 12-14-2010, 07:15 AM
Yuwen Dai
 
compare two directory trees

Dear all,

I burn DVDs/CDs from ISO files. To verify that a burn is correct, I wrote a script that works like this:
1. mount the DVD and the ISO file on two mount points
2. calculate every file's md5sum in each directory, then save and sort the results in two separate files
3. compare the two files.

The method works, but it is too time-consuming because of the md5sum calculation. Do you have any suggestions to improve the efficiency?

Best regards,
Yuwen
 
Old 12-14-2010, 07:37 AM
Juha Tuuna
 

On 14.12.2010 10:15, Yuwen Dai wrote:
> Dear all,
>
> I burn DVD/CDs from ISO files. In order to verify the burning is correct, I
> wrote a script working like this:
> 1. mount the DVD and ISO files onto two mount points
> 2. calculate every file's md5sum in each directory, and save and sort them in
> two separate files
> 3. compare the above two files.
>
> The above method does work, but too time consuming because of the md5sum
> calculating. Do you have any suggestion to improve the efficiency?
>
> Best regards,
> Yuwen

Hi, my guess is that there is none (other than a faster hash algorithm) unless
you use a program that first compares file sizes and calculates the hash only
if the file names and sizes match. Of course, this does not speed things up
if the data in both trees is equal. I personally use fdupes for this
purpose.
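
A minimal sketch of the size-first idea, assuming Python 3 and two mounted trees containing only regular files. The function names are made up for illustration, and this is not how fdupes itself is implemented. Contents are hashed only when a file exists on both sides with the same size.

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def list_files(root):
    """Map each file's path relative to root to its size in bytes."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            sizes[os.path.relpath(full, root)] = os.path.getsize(full)
    return sizes


def compare_trees(left_root, right_root):
    """Return a sorted list of relative paths that differ between two trees."""
    left = list_files(left_root)
    right = list_files(right_root)
    # Files present on only one side differ by definition.
    differing = set(left) ^ set(right)
    for rel in set(left) & set(right):
        if left[rel] != right[rel]:
            # Different sizes: no need to read the contents at all.
            differing.add(rel)
        elif file_md5(os.path.join(left_root, rel)) != file_md5(
            os.path.join(right_root, rel)
        ):
            differing.add(rel)
    return sorted(differing)
```

As noted above, when the burn is good every file still has to be read and hashed, so this only saves time when something is actually wrong.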

But in your case I'd calculate the hash from the ISO image file and the DVD
device so there's no overhead from traversing the directory tree(s).
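
A rough sketch of hashing the image and the device directly, assuming Python 3 and read permission on the device. The helper names are hypothetical. Because a read from an optical device may run past the end of the image (padding or read-ahead sectors), the sketch hashes only as many bytes from the device as the ISO file contains.

```python
import hashlib
import os


def md5_of_prefix(path, length, chunk_size=1 << 20):
    """Hash the first `length` bytes of a file or block device."""
    digest = hashlib.md5()
    remaining = length
    with open(path, "rb") as handle:
        while remaining > 0:
            chunk = handle.read(min(chunk_size, remaining))
            if not chunk:
                break  # the source is shorter than expected
            digest.update(chunk)
            remaining -= len(chunk)
    if remaining:
        raise ValueError(f"{path} ended {remaining} bytes early")
    return digest.hexdigest()


def disc_matches_image(iso_path, device_path):
    """True if the device starts with exactly the bytes of the ISO file."""
    iso_size = os.path.getsize(iso_path)
    return md5_of_prefix(iso_path, iso_size) == md5_of_prefix(device_path, iso_size)
```

A call might look like `disc_matches_image("image.iso", "/dev/sr0")`, where both paths are placeholders for your own image and drive.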

--
Juha Tuuna


--
To UNSUBSCRIBE, email to debian-user-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/4D072CBF.8040205@iki.fi
 
Old 12-14-2010, 08:02 AM
"Boyd Stephen Smith Jr."
 

In <AANLkTi=mj9Tsyk=OohqAbp8bat96k6FM5ZfBGyTG_8Cc@mail.gmail.com>, Yuwen Dai
wrote:
>I burn DVD/CDs from ISO files. In order to verify the burning is correct,
>I wrote a script working like this:
>1. mount the DVD and ISO files onto two mount points
>2. calculate every file's md5sum in each directory, and save and sort them
>in two separate files
>3. compare the above two files.
>
>The above method does work, but too time consuming because of the md5sum
>calculating. Do you have any suggestion to improve the efficiency?

Don't waste CPU time on MD5. Don't waste CPU time performing filesystem
operations. Compare the two images byte-by-byte using something like diff.
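
A minimal sketch of such a byte-by-byte comparison, assuming Python 3. The function name and the `limit` parameter are made up for illustration. It behaves roughly like the standard `cmp` utility, reporting the offset of the first mismatch, and `limit` lets you stop at the size of the ISO file in case the device returns extra trailing data.

```python
def first_difference(path_a, path_b, limit=None, chunk_size=1 << 20):
    """Return the offset of the first differing byte, or None if equal.

    If `limit` is given, only the first `limit` bytes are compared.
    """
    offset = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while limit is None or offset < limit:
            want = chunk_size if limit is None else min(chunk_size, limit - offset)
            block_a = a.read(want)
            block_b = b.read(want)
            if block_a != block_b:
                # Locate the exact byte within the mismatching chunk.
                for i, (x, y) in enumerate(zip(block_a, block_b)):
                    if x != y:
                        return offset + i
                # No byte differs, so one input simply ended first.
                return offset + min(len(block_a), len(block_b))
            if not block_a:
                return None  # both inputs ended together
            offset += len(block_a)
    return None
```

No hash is computed here, so the CPU cost is close to a plain memory compare and the run time is dominated by reading the disc.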

What takes the most time though is the I/O. You could use statistical
techniques to reduce the amount of I/O you perform at the cost of some
accuracy. However, you are probably better off reducing I/O time by getting
faster drives, particularly the optical drive.
--
Boyd Stephen Smith Jr. ,= ,-_-. =.
bss@iguanasuicide.net ((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy `-'(. .)`-'
http://iguanasuicide.net/ \_/
 
Old 12-14-2010, 08:22 AM
shawn wilson
 

> I burn DVD/CDs from ISO files. In order to verify the burning is correct, I wrote a script working like this:
> 1. mount the DVD and ISO files onto two mount points
> 2. calculate every file's md5sum in each directory, and save and sort them in two separate files
> 3. compare the above two files.
>
> The above method does work, but too time consuming because of the md5sum calculating. Do you have any suggestion to improve the efficiency?

The way I see it, you've got a few choices:
1. Compare the output of ls -lR. This is pretty quick, but it won't catch burn errors.
2. dd the CD and diff or checksum the result.
3. Copy the files back to disk and run the checksum there (possibly quicker, because the disk shouldn't have time to slow down).
4. The way you're doing it.

I don't think there are any other options. Though I'd question your motives here. If you're thinking of cheap backups, I wouldn't recommend this over tape or just buying lots of the cheapest ($/GB) hard drives you can find.
 
Old 12-14-2010, 09:46 AM
Camaleón
 

On Tue, 14 Dec 2010 16:15:56 +0800, Yuwen Dai wrote:

> I burn DVD/CDs from ISO files. In order to verify the burning is
> correct, I wrote a script working like this:
> 1. mount the DVD and ISO files onto two mount points
> 2. calculate every file's md5sum in each directory, and save and sort
> them in two separate files
> 3. compare the above two files.
>
> The above method does work, but too time consuming because of the md5sum
> calculating. Do you have any suggestion to improve the efficiency?

Months ago someone asked a similar question, so maybe this helps:

http://lists.debian.org/debian-user/2010/07/thrd4.html#01713

Greetings,

--
Camaleón


Archive: http://lists.debian.org/pan.2010.12.14.10.46.22@gmail.com
 
Old 12-14-2010, 12:14 PM
Liam O'Toole
 

On 2010-12-14, Yuwen Dai <yuwend@gmail.com> wrote:
> Dear all,
>
> I burn DVD/CDs from ISO files. In order to verify the burning is correct,
> I wrote a script working like this:
> 1. mount the DVD and ISO files onto two mount points
> 2. calculate every file's md5sum in each directory, and save and sort them
> in two separate files
> 3. compare the above two files.
>
> The above method does work, but too time consuming because of the md5sum
> calculating. Do you have any suggestion to improve the efficiency?

Some installation CDs have a menu option which offers to verify the CD
prior to installation. It might be worth investigating what technique
they use and adapting it for your purpose.

Liam

--
Liam O'Toole
Cork, Ireland



Archive: http://lists.debian.org/slrnigerea.qfe.liam.p.otoole@dipsy.tubbynet
 
Old 12-14-2010, 03:40 PM
shawn wilson
 

> Some installation CDs have a menu option which offers to verify the CD
> prior to installation. It might be worth investigating what technique
> they use and adapting it for your purpose.

IIRC, they compute a checksum and compare it with a known checksum.
 
Old 12-14-2010, 08:22 PM
David Christensen
 

Yuwen Dai wrote:
> I burn DVD/CDs from ISO files. In order to verify the burning is correct,
> ...
> The above method does work, but too time consuming because of the md5sum
> calculating. Do you have any suggestion to improve the efficiency?


This tool recursively compares file names, mtimes, and sizes in two
directory trees, and reports any differences found:

http://search.cpan.org/~dpchrist/Dpchrist-Directory-1.018/perl-bin/dirdiff


Assuming you have a working Perl cpan installation, you can install the
Dpchrist::Directory module and dirdiff script as follows:

cpan Dpchrist::Directory
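
For readers without a Perl setup, the same kind of metadata-only check (names, sizes, mtimes) can be sketched in Python 3. This is an illustration with made-up function names, not the dirdiff implementation. It never reads file contents, so it is fast but cannot detect a file whose bytes were corrupted during the burn.

```python
import os


def tree_metadata(root):
    """Map relative path -> (size, whole-second mtime) for files under root."""
    meta = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            info = os.stat(full)
            meta[os.path.relpath(full, root)] = (info.st_size, int(info.st_mtime))
    return meta


def metadata_differences(left_root, right_root):
    """Return (relative path, reason) pairs for every metadata mismatch."""
    left = tree_metadata(left_root)
    right = tree_metadata(right_root)
    report = []
    for rel in sorted(set(left) | set(right)):
        if rel not in right:
            report.append((rel, "only in left"))
        elif rel not in left:
            report.append((rel, "only in right"))
        elif left[rel][0] != right[rel][0]:
            report.append((rel, "size differs"))
        elif left[rel][1] != right[rel][1]:
            report.append((rel, "mtime differs"))
    return report
```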


HTH,

David



Archive: http://lists.debian.org/4D07DFFA.5090002@holgerdanske.com
 
Old 12-15-2010, 01:11 AM
shawn wilson
 

On Dec 14, 2010 8:45 PM, "Yuwen Dai" <yuwend@gmail.com> wrote:

>> The way I see it, you've got a few choices:
>> 1. Compare ls -lR which won't catch burn issues. But pretty quick.
>> 2. dd the cd and diff or checksum that.
>> 3. Copy the files back to disk and run the checksum there (possibly quicker because the disk shouldn't have time to slow down).
>> 4. The way you're doing it
>>
>> I don't think there are any other options. Though, I'd question your motives here. If you're thinking cheap backup I wouldn't recommend it over tape or just buying tons of the cheapest ($/gig) hdd you can find.
>
> Not for backing up. I just want to ensure that the burning is OK.

Door #2 sounds most promising.
 
Old 12-16-2010, 01:13 AM
T o n g
 

On Tue, 14 Dec 2010 03:02:11 -0600, Boyd Stephen Smith Jr. wrote:

>>I burn DVD/CDs from ISO files. In order to verify the burning is
>>correct,

There are two ways to verify if the CD/DVD burning is correct.

- verify it as a whole
- verify each individual file

If you are satisfied with the first level, check out isomd5sum,
which is what Red Hat used to verify its released CDs (implantiso).

isomd5sum is a set of utilities for implanting an MD5 checksum in an
ISO (or any block device), then verifying the checksum later. isomd5sum
is not simply an MD5 of the entire ISO; it checksums the data inside a
standard ISO9660 image and writes block checksum information to an ISO9660
header, which carries over when the CD is burned.

Else,

>> I wrote a script working like this:
>>1. mount the DVD and ISO files onto two mount points
>>2. calculate every file's md5sum in each directory, and save and sort
>>them in two separate files
>>3. compare the above two files.
>>
>>The above method does work, but too time consuming because of the md5sum
>>calculating. Do you have any suggestion to improve the efficiency?

This is the only option you have if you want to verify each individual
file, i.e., check file by file.

> Don't waste CPU time on MD5. Don't waste CPU time performing filesystem
> operations. Compare the two images byte-by-byte using something like
> diff.

Well, IMHO, you shouldn't waste CPU time on MD5 calculation, but a
byte-by-byte comparison is not a good choice either, because CDs/DVDs tend
to deteriorate over time. Even if a disc is OK when freshly burned, that
does not mean it will stay so, because of wear and tear. Moreover, by
the time you want to compare again, you may find that your source is gone!

My solution: I use CRC32 and put the checksum file on the disc. I came up
with this solution when I was facing another weird situation -- a CD burned
on a particular burner could not be reliably read on other burners.
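
A minimal sketch of the checksum-file idea, assuming Python 3 and its standard `zlib.crc32`. The manifest format and function names are invented for illustration and are not those of the checksum program mentioned in this post. One function records a CRC32 per file before burning; the other re-reads the files later and reports any that no longer match.

```python
import os
import zlib


def file_crc32(path, chunk_size=1 << 20):
    """Return the CRC32 of a file as an unsigned 32-bit integer."""
    crc = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def write_manifest(root, manifest_path):
    """Record one 'crc32-hex  relative/path' line per file under root."""
    lines = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            lines.append(f"{file_crc32(full):08x}  {rel}")
    with open(manifest_path, "w", encoding="utf-8") as out:
        out.write("\n".join(sorted(lines)) + "\n")


def verify_manifest(root, manifest_path):
    """Return relative paths that are missing or whose CRC32 has changed."""
    failed = []
    with open(manifest_path, encoding="utf-8") as manifest:
        for line in manifest:
            line = line.rstrip("\n")
            if not line:
                continue
            expected, rel = line.split("  ", 1)
            full = os.path.join(root, rel)
            if not os.path.isfile(full) or f"{file_crc32(full):08x}" != expected:
                failed.append(rel)
    return failed
```

Because the manifest travels with the disc, the check still works after the original ISO is gone.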

The checksum program that I wrote can do both CRC32 and MD5, but
statistics tell me that CRC32 is more than enough for this case:

The probability of a corrupted string going undetected by an n-bit checksum
is 1/(2^n). I.e., a 32-bit CRC has a probability of 1/(2^32), which is about
2.3E-10 (less than one in a billion).
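
The arithmetic can be checked in a few lines of Python 3. This is only a worked example of the 1/(2^n) figure, which assumes that a corrupted file's checksum is effectively a uniformly random n-bit value; the function name is made up.

```python
def undetected_probability(bits):
    """Chance that random corruption yields the same `bits`-bit checksum."""
    return 1 / 2 ** bits


p32 = undetected_probability(32)
print(f"{p32:.2e}")                          # 2.33e-10
print(round(1 / p32))                        # 4294967296, about 1 in 4.3 billion
print(f"{undetected_probability(128):.2e}")  # 2.94e-39 for a 128-bit hash like MD5
```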

The new fast 32-bit CRC algorithm is an order of magnitude faster than
current HD I/O. I.e., using 32-bit CRC, the only bottleneck you have is
your HD speed.

FYI, I've been using my program for nearly 10 years, but only recently
found time to put it up on the internet (and only enough time to get it
started):

http://savannah.nongnu.org/p/checksum

HTH

--
Tong (remove underscore(s) to reply)
http://xpt.sourceforge.net/techdocs/
http://xpt.sourceforge.net/tools/


Archive: http://lists.debian.org/iebsjp$m14$1@dough.gmane.org
 
