07-26-2008, 10:06 AM
Alan Milnes

Replace duplicates with symlinks

I've just invested in a 1 TB external HDD and am consolidating various
backups onto it. What I'd like to do is identify and remove all
duplicate files on it, replacing each with a symlink to the original.

I've googled and found fslint and fdupes. The latter looks promising, as
you can export the findings to a text file. It produces a line like this:

# rm thisisaduplicate.txt

however it doesn't tell me where the original is.

Is there an app / utility / script that does this? Even a tutorial that
would help me write my own script?

Thanks

Alan
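
On the "where is the original" problem: by default fdupes lists each set of identical files on consecutive lines, with a blank line between sets, so any member of a set can serve as the original. Below is a minimal Python sketch that reads such a listing and prints which link would point where. It assumes that listing format, it assumes (arbitrarily) that the first path in each set is the one to keep, and the function names and sample paths are made up for illustration. Paths containing newlines would break it.

```python
def parse_duplicate_groups(lines):
    """Split a listing into groups of paths; a blank line ends a group."""
    groups, current = [], []
    for line in lines:
        path = line.rstrip("\n")
        if path:
            current.append(path)
        elif current:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


def plan_symlinks(groups):
    """Yield (original, duplicate) pairs.

    Assumption: the first path in each group is treated as the original.
    """
    for group in groups:
        original, *duplicates = group
        for duplicate in duplicates:
            yield original, duplicate


# Hypothetical listing in the blank-line-separated shape described above.
sample = """/mnt/usb/a/photo.jpg
/mnt/usb/b/photo.jpg

/mnt/usb/a/notes.txt
/mnt/usb/b/notes.txt
/mnt/usb/c/notes.txt
"""

for original, duplicate in plan_symlinks(parse_duplicate_groups(sample.splitlines())):
    print(f"{duplicate} -> {original}")
```

Nothing is deleted or linked here; the script only prints the plan, which is worth reviewing before touching a backup drive.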

--
ubuntu-users mailing list
ubuntu-users@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-users
 
07-26-2008, 11:45 AM
Mumia W.

Replace duplicates with symlinks

On 07/26/2008 05:06 AM, Alan Milnes wrote:
> I've just invested in a 1Tb external HDD and am consolidating various
> backups onto it. What I'd like to do is identify and remove all
> duplicate files on it but replace them with symlinks to the original.
> [...]

Look at "faubackup" also. It's in the Ubuntu repositories.



07-26-2008, 07:06 PM
Alan Milnes

Replace duplicates with symlinks

Alan Milnes wrote:
> I've just invested in a 1Tb external HDD and am consolidating various
> backups onto it. What I'd like to do is identify and remove all
> duplicate files on it but replace them with symlinks to the original.
>
> I've googled and found fslint and fdupes, the latter looks promising as
> you can export the findings to a text file - it produces a line like this:-

OK, the answer is FSlint, which wins the prize for the biggest discrepancy
between usefulness and documentation! Here's how I did it.

1) sudo apt-get install fslint

2) Apps -> System Tools -> FSlint

3) Removed my home directory from the search path and added my USB drive.

4) Selected 'duplicates' then 'find'

5) Once it had finished searching, clicked on 'merge' *without*
selecting any of the results. This is really counter-intuitive: it runs
the merge process against the files that are not selected.

That's it.

Am now running it on my 84% full 1 TB drive - not sure how long it will
take!
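
For anyone who would rather script the merge step than use the GUI, the replacement itself could look roughly like this in Python. This is not how FSlint does it internally, only a sketch: `replace_with_symlink` and the `.tmp-link` suffix are hypothetical names, and it assumes the two files have already been confirmed identical.

```python
import os
import tempfile


def replace_with_symlink(original, duplicate):
    """Replace the file at `duplicate` with a symlink to `original`."""
    target = os.path.abspath(original)
    temp_link = duplicate + ".tmp-link"  # hypothetical temporary name
    # Create the link under a temporary name, then rename it over the
    # duplicate, so the duplicate is never missing if a step fails.
    os.symlink(target, temp_link)
    os.replace(temp_link, duplicate)


# Demonstration in a throwaway directory, not on real backups.
with tempfile.TemporaryDirectory() as root:
    original = os.path.join(root, "original.txt")
    duplicate = os.path.join(root, "copy.txt")
    for path in (original, duplicate):
        with open(path, "w") as handle:
            handle.write("same content\n")
    replace_with_symlink(original, duplicate)
    print(os.path.islink(duplicate), os.readlink(duplicate))
```

The link target here is an absolute path, so the links would dangle if the drive were later mounted somewhere else. Passing `os.path.relpath(target, os.path.dirname(duplicate))` to `os.symlink` instead would give relative links that survive a change of mount point.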

Alan



07-28-2008, 06:30 PM
Dave Woyciesjes

Replace duplicates with symlinks

Alan Milnes wrote:
> Alan Milnes wrote:
>> I've just invested in a 1Tb external HDD and am consolidating various
>> backups onto it. What I'd like to do is identify and remove all
>> duplicate files on it but replace them with symlinks to the original.
>>
>> I've googled and found fslint and fdupes, the latter looks promising as
>> you can export the findings to a text file - it produces a line like this:-
>
> OK the answer is FSlint which wins the prize for the biggest discrepancy
> between usefulness and documentation!! Here's how I did it.
>
> 1) sudo apt-get install fslint
>
> 2) Apps -> System Tools -> FSlint
>
> 3) Removed my home directory from the search path and added my USB drive.
>
> 4) Selected 'duplicates' then 'find'
>
> 5) Once it had finished searching clicked on 'merge', *without*
> selecting any of the results. It's really counter-intuitive, it runs
> the merge process against those files not selected.
>
> That's it.
>
> Am now running it on my 84% full 1Tb drive - not sure how long it will
> take!!

Very interesting.... I wonder how it compares the files and decides
which to delete? By name alone? File size? Time stamp? And does it move
the files to the Trash, or somewhere else? More research is needed....

--
--- Dave Woyciesjes
--- ICQ# 905818
--- AIM - woyciesjes

"From there to here,
From here to there,
Funny things
are everywhere."
--- Dr. Seuss

07-28-2008, 07:19 PM
Smoot Carl-Mitchell

Replace duplicates with symlinks

On Mon, 2008-07-28 at 14:30 -0400, Dave Woyciesjes wrote:

> Very interesting.... I wonder how it compares the files and decides
> which to delete? By name alone? File size? Time stamp? And does it move
> the files to the Trash, or somewhere else? More research is needed....

Interesting tool. However, you need to be very careful about merging
files with it. It compares files by checksumming the contents. Files
that are identical at a point in time may not be appropriate to
replace with either hardlinks or symlinks. The tool found a lot of
identical files in my .evolution directory which would not be
appropriate to symlink.
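
To make "checksumming the contents" concrete, here is a rough Python sketch of content-based duplicate detection. It is not FSlint's actual code: the choice of SHA-256, the size pre-filter, and the function names are all assumptions made for illustration.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return lists of paths under `root` whose contents hash the same."""
    # Files of different sizes cannot be identical, so group by size first
    # and only hash files that share a size with at least one other file.
    by_size = defaultdict(list)
    for directory, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(directory, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    by_digest = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for path in paths:
            by_digest[(size, file_digest(path))].append(path)

    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

File names and timestamps play no part in this approach, which is why files such as the ones in .evolution show up as duplicates even though they belong to different folders.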
--
Smoot Carl-Mitchell
System/Network Architect
smoot@tic.com
+1 480 922 7313
cell: +1 602 421 9005

07-28-2008, 07:27 PM
Dave Woyciesjes

Replace duplicates with symlinks

Smoot Carl-Mitchell wrote:
> On Mon, 2008-07-28 at 14:30 -0400, Dave Woyciesjes wrote:
>
>> Very interesting.... I wonder how it compares the files and decides
>> which to delete? By name alone? File size? Time stamp? And does it move
>> the files to the Trash, or somewhere else? More research is needed....
>
> Interesting tool. However, you need to be very careful about merging
> files with it. It compares files by checksumming the contents. Files
> that are identical at a point in time may not be appropriate to
> replace with either hardlinks or symlinks. The tool found a lot of
> identical files in my .evolution directory which would not be
> appropriate to symlink.

Hmmm, so it will only merge 2 files if they have identical contents?
e.g. 2 text.txt files will be merged, unless one has an extra letter?
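
A quick way to see the effect of one extra letter on a checksum, sketched in Python. SHA-256 is used here as an assumption; the thread does not say which checksum FSlint computes.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Hex digest of the given bytes (SHA-256 chosen for illustration)."""
    return hashlib.sha256(data).hexdigest()


first = b"some text\n"
second = b"some text\n"
third = b"some text!\n"  # one extra character

print(checksum(first) == checksum(second))  # True: identical contents
print(checksum(first) == checksum(third))   # False: contents differ
```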

07-28-2008, 09:07 PM
Smoot Carl-Mitchell

Replace duplicates with symlinks

On Mon, 2008-07-28 at 15:27 -0400, Dave Woyciesjes wrote:
> Smoot Carl-Mitchell wrote:
> > On Mon, 2008-07-28 at 14:30 -0400, Dave Woyciesjes wrote:
> >
> >> Very interesting.... I wonder how it compares the files and decides
> >> which to delete? By name alone? File size? Time stamp? And does it move
> >> the files to the Trash, or somewhere else? More research is needed....
> >
> > Interesting tool. However, you need to be very careful about merging
> > files with it. It compares files by checksumming the contents. Files
> > that are identical at a point in time may not be appropriate to
> > replace with either hardlinks or symlinks. The tool found a lot of
> > identical files in my .evolution directory which would not be
> > appropriate to symlink.
>
> Hmmm, so it will only merge 2 files if they have identical contents?
> e.g. 2 text.txt files will be merged, unless one has an extra letter?

I do not see how it could work in any other way.
07-28-2008, 10:05 PM
Dave Woyciesjes

Replace duplicates with symlinks

Smoot Carl-Mitchell wrote:
> On Mon, 2008-07-28 at 15:27 -0400, Dave Woyciesjes wrote:
>> Smoot Carl-Mitchell wrote:
>>> On Mon, 2008-07-28 at 14:30 -0400, Dave Woyciesjes wrote:
>>>
>>>> Very interesting.... I wonder how it compares the files and decides
>>>> which to delete? By name alone? File size? Time stamp? And does it move
>>>> the files to the Trash, or somewhere else? More research is needed....

>>> Interesting tool. However, you need to be very careful about merging
>>> files with it. It compares files by checksumming the contents. Files
>>> that are identical at a point in time may not be appropriate to
>>> replace with either hardlinks or symlinks. The tool found a lot of
>>> identical files in my .evolution directory which would not be
>>> appropriate to symlink.

>> Hmmm, so it will only merge 2 files if they have identical contents?
>> e.g. 2 text.txt files will be merged, unless one has an extra letter?

> I do not see how it could work in any other way.

Well, I asked because (aside from not having had a spare minute to look
and test myself) it could simply have looked for duplicate names and
gone by that.
But now that you've discovered and shared that it goes by a checksum of
the contents, I like it even more.
I will definitely have to make time to test.

07-28-2008, 10:09 PM
Brian McKee

Replace duplicates with symlinks

On Mon, Jul 28, 2008 at 3:19 PM, Smoot Carl-Mitchell <smoot@tic.com> wrote:
> On Mon, 2008-07-28 at 14:30 -0400, Dave Woyciesjes wrote:
>
>> Very interesting.... I wonder how it compares the files and decides
>> which to delete? By name alone? File size? Time stamp? And does it move
>> the files to the Trash, or somewhere else? More research is needed....
>
> Interesting tool. However, you need to be very careful about merging
> files with it. It compares files by checksumming the contents. Files
> that are identical at a point in time may not be appropriate to
> replace with either hardlinks or symlinks. The tool found a lot of
> identical files in my .evolution directory which would not be
> appropriate to symlink.

That's interesting - you have more than one file with the same
checksum that are not identical?
I thought that was mathematically very unlikely...
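
For anyone who does not want to rely on "very unlikely", a checksum match can be confirmed with a byte-for-byte comparison before anything is merged. A minimal Python sketch using the standard `filecmp` module; the helper name and the sample files are made up for illustration.

```python
import filecmp
import os
import tempfile


def confirmed_identical(path_a, path_b):
    """Compare contents directly instead of trusting a checksum match."""
    # shallow=False makes filecmp read and compare the file contents rather
    # than stopping at matching os.stat() signatures.
    return filecmp.cmp(path_a, path_b, shallow=False)


# Demonstration with throwaway files of equal size.
with tempfile.TemporaryDirectory() as root:
    paths = {}
    for name, data in (("a.txt", b"same\n"), ("b.txt", b"same\n"), ("c.txt", b"diff\n")):
        paths[name] = os.path.join(root, name)
        with open(paths[name], "wb") as handle:
            handle.write(data)
    print(confirmed_identical(paths["a.txt"], paths["b.txt"]))  # True
    print(confirmed_identical(paths["a.txt"], paths["c.txt"]))  # False
```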

Do you have some examples?

And I assume we are talking hard links not soft links?

Brian

07-28-2008, 10:37 PM
Smoot Carl-Mitchell

Replace duplicates with symlinks

On Mon, 2008-07-28 at 18:09 -0400, Brian McKee wrote:
> On Mon, Jul 28, 2008 at 3:19 PM, Smoot Carl-Mitchell <smoot@tic.com> wrote:
> > On Mon, 2008-07-28 at 14:30 -0400, Dave Woyciesjes wrote:
> >
> >> Very interesting.... I wonder how it compares the files and decides
> >> which to delete? By name alone? File size? Time stamp? And does it move
> >> the files to the Trash, or somewhere else? More research is needed....
> >
> > Interesting tool. However, you need to be very careful about merging
> > files with it. It compares files by checksumming the contents. Files
> > that are identical at a point in time may not be appropriate to
> > replace with either hardlinks or symlinks. The tool found a lot of
> > identical files in my .evolution directory which would not be
> > appropriate to symlink.
>
> That's interesting - you have more than one file with the same
> checksum that are not identical?
> I thought that was mathematically very unlikely...

You misunderstood what I said. The Evolution files in question are the
*.cmeta files, which often have identical content. However, you do not
want to symlink or hardlink them: if you change one, you do not want the
other cmeta files to change.
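
The effect described above is easy to reproduce. The Python sketch below creates one file, gives it a hard link and a symlink, edits it in place through the hard link, and then reads all three names back. The file names are hypothetical stand-ins; nothing here touches real Evolution data.

```python
import os
import tempfile


def edit_through_links(root):
    """Create a file plus a hard link and a symlink to it, edit via the
    hard link, and return what each of the three names now contains."""
    original = os.path.join(root, "folder1.cmeta")  # hypothetical names
    hard = os.path.join(root, "folder2.cmeta")
    soft = os.path.join(root, "folder3.cmeta")
    with open(original, "w") as handle:
        handle.write("initial\n")
    os.link(original, hard)      # second name for the same data
    os.symlink(original, soft)   # pointer to the first name
    with open(hard, "a") as handle:
        handle.write("changed\n")  # in-place edit through one name
    contents = {}
    for path in (original, hard, soft):
        with open(path) as handle:
            contents[os.path.basename(path)] = handle.read()
    return contents


with tempfile.TemporaryDirectory() as root:
    for name, text in edit_through_links(root).items():
        print(name, repr(text))
```

All three names report the appended line, which is exactly the behaviour you do not want for files that merely happen to be identical right now.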
