02-24-2010, 12:31 AM
Marko Vojinovic
 
Recursive comparing of files

Hi folks! :-)

I have the following task: there are two directories on the disk, say a/ and
b/, with various subdirectories and files inside. I need to find and erase all
*duplicate* files, and after that all empty directories. The files may reside in
different directories, may have different names, but if they have identical
*contents*, the file from the b/ branch should be deleted.
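
The final cleanup step of this task, removing directories left empty once the duplicates are gone, could be sketched roughly as below. This is only an illustration in Python, not something from the thread: the function name `remove_empty_dirs` is hypothetical, and the sketch assumes the top-level directory itself should be kept.

```python
import os

def remove_empty_dirs(root):
    """Delete empty directories below root, deepest first; return removed paths."""
    removed = []
    # topdown=False visits children before their parents, so a directory
    # that only becomes empty after its subdirectories go is caught too.
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        if dirpath == root:
            continue  # assumption: keep the top-level directory itself
        if not os.listdir(dirpath):  # re-check on disk, not the cached listing
            os.rmdir(dirpath)
            removed.append(dirpath)
    return removed
```

Calling `remove_empty_dirs("b")` after deleting the duplicates would prune b/ and report which directories it removed.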

Now, the directories that I have are rather large and I wouldn't want to go
hunt for duplicates manually. Is there some tool that can at least identify
and list duplicate files in some directory structure?

I could think of an algorithm like:

1) list all files in all subdirectories of a/ along with their file size
2) do the same thing for files in b/
3) sort and compare lists, look for pairs of files with identical size
4) test each pair to see if the file content is the same, and if yes, list them
in the output
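
The four steps above can be sketched roughly as follows. This is a minimal Python illustration rather than an existing tool: the names `files_by_size` and `duplicates_in_b` are hypothetical, symbolic links are skipped by assumption, and the sketch only lists pairs without deleting anything.

```python
import filecmp
import os
from collections import defaultdict

def files_by_size(root):
    """Map file size -> list of paths for every regular file under root."""
    sizes = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Assumption: ignore symlinks and anything that is not a plain file.
            if os.path.isfile(path) and not os.path.islink(path):
                sizes[os.path.getsize(path)].append(path)
    return sizes

def duplicates_in_b(dir_a, dir_b):
    """Return sorted (path_in_a, path_in_b) pairs with identical contents."""
    sizes_a = files_by_size(dir_a)  # step 1
    sizes_b = files_by_size(dir_b)  # step 2
    pairs = []
    for size, paths_b in sizes_b.items():  # step 3: match on size first
        for path_b in paths_b:
            for path_a in sizes_a.get(size, []):
                # step 4: shallow=False forces a comparison of the contents
                if filecmp.cmp(path_a, path_b, shallow=False):
                    pairs.append((path_a, path_b))
                    break  # one match in a/ is enough to flag the b/ file
    return sorted(pairs)
```

Each returned pair names a file in a/ and the file in b/ that duplicates it, so the second element of every pair is a candidate for deletion.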

I could probably write a bash script to do this, but I guess this problem
is common and there are already tools available that would do it for me.
Any suggestions?

Thanks, :-)
Marko

--
users mailing list
users@lists.fedoraproject.org
To unsubscribe or change subscription options:
https://admin.fedoraproject.org/mailman/listinfo/users
Guidelines: http://fedoraproject.org/wiki/Mailing_list_guidelines
 
02-24-2010, 12:35 AM
Mike Park

Hi there,

Try 'fslint'; it does hash-sum comparisons on files in different
directories. Although the front-end 'fslint-gui' isn't exactly built for
automation, it at least lets you do big sweeps of dupe-deletion.


--Mike

 
02-24-2010, 01:05 AM
"Mick M."

This is what I use:
http://duplicatefilessearcher.net/






 
02-24-2010, 01:38 AM
Antonio Olivares

There is a tool called fdupes. Read more about it here:

http://www.cyberciti.biz/faq/linux-unix-finds-duplicate-files-in-given-directories/

<quote>
You need to use a tool called fdupes. It will search the given path for duplicate files. Such files are found by comparing file sizes and MD5 signatures, followed by a byte-by-byte comparison. fdupes is a nice tool to get rid of duplicate files.
</quote>
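
The size, MD5, then byte-by-byte sequence described in the quote can be illustrated with a short Python sketch. This is not fdupes' actual code, only an approximation of the idea under my own assumptions: the names `md5_of` and `find_duplicate_groups` are hypothetical, and symbolic links are skipped.

```python
import filecmp
import hashlib
import os
from collections import defaultdict

def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_duplicate_groups(root):
    """Return sorted groups (lists of paths) of files with identical contents."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[md5_of(path)].append(path)
        for candidates in by_hash.values():
            # Final byte-by-byte check guards against hash collisions.
            remaining = sorted(candidates)
            while len(remaining) > 1:
                first = remaining[0]
                same = [first] + [p for p in remaining[1:]
                                  if filecmp.cmp(first, p, shallow=False)]
                if len(same) > 1:
                    groups.append(same)
                remaining = [p for p in remaining if p not in same]
    return sorted(groups)
```

Hashing only files that share a size avoids reading most of the tree, and the last comparison means a group is reported only when the bytes really match.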

Regards,

Antonio



 
02-25-2010, 09:38 AM
Marko Vojinovic

On Wednesday 24 February 2010 01:31:00 Marko Vojinovic wrote:
> I have the following task: there are two directories on the disk, say a/
> and b/, with various subdirectories and files inside. I need to find and
> erase all *duplicate* files, and after that all empty directories.

Folks, thanks for all suggestions! I also found yet another one of these
tools, called komparator.

Basically, if you prefer the command line, fdupes is the way to go. If you
prefer a GUI, you can use fslint, komparator or any other.

They are all full-featured, and for my purposes each of them did the
job quite well.

Thanks a lot! :-)
Marko

 
02-25-2010, 01:52 PM
Don Quixote de la Mancha

I realize that this isn't what you need, but I understand that
there are some tools available for Windows that can go through
collections of JPEG photographs and eliminate the duplicates based
not on the precise data, but on the appearance of the scene in the
image.

That is, if you had two visually-identical JPEGs of President Obama,
one encoded with JPEG high quality and the other with JPEG low
quality, the bits of the two files would be in no way correlated.
But these tools would be able to tell that they were both photos of
the same scene, and delete the low-quality version.
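
The post does not name the tools or say how they work. One common way to compare images by appearance rather than by bytes is a perceptual "average hash", sketched below in Python. Everything here is an assumption for illustration: the function names are hypothetical, and the images are taken to be already decoded and downscaled to the same small grayscale grid (decoding JPEGs would need an imaging library, which is left out).

```python
def average_hash(pixels):
    """Return a tuple of bits: 1 where a pixel is brighter than the image mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)

def hamming_distance(hash_a, hash_b):
    """Count the positions where two equal-length hashes differ."""
    return sum(a != b for a, b in zip(hash_a, hash_b))

# Tiny 2x2 grayscale grids standing in for downscaled photos (made-up values).
original = [[10, 200], [220, 30]]
degraded = [[14, 190], [215, 40]]   # same scene, slightly altered pixel values
different = [[200, 10], [30, 220]]  # a different scene

near = hamming_distance(average_hash(original), average_hash(degraded))
far = hamming_distance(average_hash(original), average_hash(different))
```

A small distance suggests the same scene even though the pixel values, and hence the file bytes, differ; the threshold that counts as "small" would have to be chosen by experiment.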

This sort of tool is very useful for people who like to... collect....
photographs...

I don't know of any tool like that for Linux though, or even for Mac
OS X. I can't recall the names of any of the tools, just that they
were only available for Windows.

Don Quixote
--
Don Quixote de la Mancha
quixote@dulcineatech.com
http://www.dulcineatech.com

Dulcinea Technologies Corporation: Software of Elegance and Beauty.
 
