Old 08-27-2012, 06:32 PM
Brian Mathis
 
Deduplication data for CentOS?

On Mon, Aug 27, 2012 at 7:55 AM, Rainer Traut <tr.ml@gmx.de> wrote:
> Hi list,
>
> is there any working solution for deduplication of data for centos?
> We are trying to find a solution for our backup server which runs a bash
> script invoking xdelta(3). But having this functionality in fs is much
> more friendly...
>
> We have looked into lessfs, sdfs and ddar.
> Are these filesystems ready to use (on centos)?
> ddar is something different, I know.
>
> Thx
> Rainer


This is something I have been thinking about peripherally for a while
now. What are your impressions of SDFS (OpenDedupe)? I had been
hoping it would be pretty good. Any issues with it on CentOS?
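
For context, the xdelta-based scheme mentioned above usually boils down to keeping one full copy and storing only a binary delta per backup run; a minimal sketch (the paths are made up for illustration):

```shell
#!/bin/sh
# Delta-based backup step with xdelta3 (hypothetical paths).
# Keep one full copy, then store only the binary diff for each new run.
set -e

FULL=/backup/mail.nsf.full              # last full copy
CUR=/data/mail.nsf                      # current file
DELTA=/backup/mail.nsf.$(date +%F).xd3  # today's delta

# Encode: produce a delta of CUR against FULL
xdelta3 -e -s "$FULL" "$CUR" "$DELTA"

# Restore later: apply the delta back onto the full copy
# xdelta3 -d -s "$FULL" "$DELTA" /restore/mail.nsf
```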


❧ Brian Mathis
_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
 
Old 08-27-2012, 08:55 PM
Adam Tauno Williams
 
Deduplication data for CentOS?

On Mon, 2012-08-27 at 14:32 -0400, Brian Mathis wrote:
> On Mon, Aug 27, 2012 at 7:55 AM, Rainer Traut <tr.ml@gmx.de> wrote:
> > We have looked into lessfs, sdfs and ddar.
> > Are these filesystems ready to use (on centos)?
> > ddar is something different, I know.
> This is something I have been thinking about peripherally for a while
> now. What are your impressions of SDFS (OpenDedupe)? I had been
> hoping it would be pretty good. Any issues with it on CentOS?

I've used it for backups; it works reliably. It is memory hungry
however [sort of the nature of block-level deduplication].
<http://www.wmmi.net/documents/OpenDedup.pdf>
 
Old 08-28-2012, 07:58 AM
Rainer Traut
 
Deduplication data for CentOS?

On 27.08.2012 16:04, Janne Snabb wrote:
> On 08/27/2012 07:23 PM, Rainer Traut wrote:
>
>> Yeah I know it has this feature, but is there a working zfs
>> implementation for linux?
>
> I have heard some positive feedback about http://zfsonlinux.org/ but I
> have not had time to test myself yet. It probably depends on your
> intended usage. It is a new in-kernel ZFS implementation (different from
> the old FUSE implementation).
>
> RHEL 6.2 x86_64 is listed as one of the supported OSes, so it probably
> works fine with CentOS too.
>
> There is some positive and negative feedback in the following links:
>
> https://groups.google.com/a/zfsonlinux.org/group/zfs-discuss/browse_thread/thread/5a739039623f8fb1
>
> http://pingd.org/2012/installing-zfs-raid-z-on-centos-6-2-with-ssd-caching.html
>
> Please share your results if you do any testing.

The website looks promising. They are using a thing called SPL, the
Sun/Solaris Porting Layer, to be able to use the Solaris ZFS code.
But OpenSolaris no longer exists, does it? Does that mean they have to
stay with the ZFS code from when it was still open?
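
For what it's worth, if zfsonlinux holds up on CentOS, dedup there is just a per-dataset property; a minimal sketch (the device and dataset names are made up, and note the dedup table is what eats RAM):

```shell
# Create a pool and turn on deduplication for a backup dataset
# (hypothetical device and names; the dedup table lives in RAM).
zpool create backup /dev/sdb
zfs create backup/domino
zfs set dedup=on backup/domino

# Later, check the achieved dedup ratio (DEDUP column):
zpool list backup
```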
 
Old 08-28-2012, 08:03 AM
Rainer Traut
 
Deduplication data for CentOS?

On 27.08.2012 18:04, Les Mikesell wrote:
> On Mon, Aug 27, 2012 at 6:55 AM, Rainer Traut <tr.ml@gmx.de> wrote:
>>
>> is there any working solution for deduplication of data for centos?
>> We are trying to find a solution for our backup server which runs a bash
>> script invoking xdelta(3). But having this functionality in fs is much
>> more friendly...
>>
>
> Below forwarded on behalf of mroth:
>
> Les,
>
> A favor, please? Could you post this for me? Spamhaus is bouncing me
> again, this time because *they* have a bug (see below). I tried asking
> Karanbir, but I guess he's not online yet....
>
> Thanks in advance.
>
> John R Pierce wrote:
>> On 08/27/12 4:55 AM, Rainer Traut wrote:
>>> is there any working solution for deduplication of data for centos? We
> are trying to find a solution for our backup server which runs a bash
> script invoking xdelta(3). But having this functionality in fs is much
> more friendly...
>>

> I've tried, twice, to suggest that a workaround that doesn't involve a
> new, and possibly experimental f/s would be to use rsync with hard links,
> which is what we do. There's no way we have enough disk space for 5 weeks
> of terabytes of data....

Rsync is of no use to us. We mainly have big Domino .nsf files which
change only slightly, so rsync would not be able to make many hard links.
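
(For readers following along: the rsync-with-hard-links rotation works via --link-dest, as in this sketch with made-up paths. It only pools files that are bit-identical between runs, which is exactly why it saves nothing on large files that change slightly.)

```shell
#!/bin/sh
# Hard-link rotating backup with rsync (hypothetical paths).
# Files unchanged since yesterday's run become hard links, not new data.
TODAY=/backup/$(date +%F)
YESTERDAY=/backup/$(date -d yesterday +%F)

rsync -a --link-dest="$YESTERDAY" /data/ "$TODAY"/
```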
 
Old 08-28-2012, 08:14 AM
Rainer Traut
 
Deduplication data for CentOS?

On 27.08.2012 22:55, Adam Tauno Williams wrote:
> On Mon, 2012-08-27 at 14:32 -0400, Brian Mathis wrote:
>> On Mon, Aug 27, 2012 at 7:55 AM, Rainer Traut <tr.ml@gmx.de> wrote:
>>> We have looked into lessfs, sdfs and ddar.
>>> Are these filesystems ready to use (on centos)?
> > > ddar is something different, I know.
>> This is something I have been thinking about peripherally for a while
>> now. What are your impressions of SDFS (OpenDedupe)? I had been
>> hoping it would be pretty good. Any issues with it on CentOS?
>
> I've used it for backups; it works reliably. It is memory hungry
> however [sort of the nature of block-level deduplication].
> <http://www.wmmi.net/documents/OpenDedup.pdf>

I have read the PDF and one thing strikes me:

  --io-chunk-size <SIZE in kB; use 4 for VMDKs, defaults to 128>

and later:

  ● Memory
    ● 2GB allocation OK for:
      ● 200GB @ 4KB chunks
      ● 6TB @ 128KB chunks
  ...
  32TB of data at 128KB chunks requires 8GB of RAM.
  1TB @ 4KB requires the same 8GB.

We are using ESXi 5 in a SAN environment, currently with a 2TB backup volume.
You are right, 16GB of RAM is still a lot...
And why a 4KB chunk size for VMDKs?
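
Back-of-the-envelope, those slide figures work out to roughly 32 bytes of dedup table per chunk (1TB @ 4KB = 8GB and 32TB @ 128KB = 8GB both give that ratio). A small sketch for a 2TB volume, assuming that ratio holds (it is derived from the slides, not an official formula):

```shell
# Rough dedup-table RAM estimate, assuming ~32 bytes of table per chunk
# (a ratio derived from the OpenDedup slides, not an official formula).
estimate_ram_mb() {
    data_gb=$1
    chunk_kb=$2
    chunks=$(( data_gb * 1024 * 1024 / chunk_kb ))
    echo $(( chunks * 32 / 1024 / 1024 ))
}

estimate_ram_mb 2048 128   # 2TB at 128KB chunks -> 512 MB
estimate_ram_mb 2048 4     # 2TB at 4KB chunks   -> 16384 MB (16GB)
```

That 16GB figure for 4KB chunks on 2TB matches the concern above.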

 
Old 08-28-2012, 08:48 AM
Fajar Priyanto
 
Deduplication data for CentOS?

Sorry for the top posting.
Dedup is just hype. After a while, the table that manages the deduped data
becomes too big. Don't use it for the long term.

Sent from Samsung Galaxy ^^
 
Old 08-28-2012, 08:58 AM
John R Pierce
 
Deduplication data for CentOS?

On 08/28/12 12:58 AM, Rainer Traut wrote:
> The website looks promising. They are using a thing called SPL, the
> Sun/Solaris Porting Layer, to be able to use the Solaris ZFS code.
> But OpenSolaris no longer exists, does it? Does that mean they have to
> stay with the ZFS code from when it was still open?

OpenSolaris spawned illumos (the kernel) and OpenIndiana (a complete OS
based on illumos and OpenSolaris), as well as some other illumos-based
distributions like Nexenta.





--
john r pierce N 37, W 122
santa cruz ca mid-left coast

 
Old 08-28-2012, 09:00 AM
John R Pierce
 
Deduplication data for CentOS?

On 08/28/12 1:03 AM, Rainer Traut wrote:
> Rsync is of no use to us. We mainly have big Domino .nsf files which
> change only slightly, so rsync would not be able to make many hard links.

So you need block-level dedup? Good luck with that. I've never seen a
scheme yet that wasn't full of issues or didn't have really bad performance.



--
john r pierce N 37, W 122
santa cruz ca mid-left coast

 
Old 08-28-2012, 10:17 AM
Leon Fauster
 
Deduplication data for CentOS?

On 28.08.2012 at 10:03, Rainer Traut wrote:
> Rsync is of no use to us. We mainly have big Domino .nsf files which
> change only slightly, so rsync would not be able to make many hard links.


Can this approach ensure the consistency of these "database" files?

--
LF



 
Old 08-28-2012, 06:41 PM
Les Mikesell
 
Deduplication data for CentOS?

On Tue, Aug 28, 2012 at 3:03 AM, Rainer Traut <tr.ml@gmx.de> wrote:
>>
> Rsync is of no use to us. We mainly have big Domino .nsf files which
> change only slightly, so rsync would not be able to make many hard links.

Rdiff-backup might work for this since it stores deltas. Are you
doing something to snapshot the filesystem during the copy, or are
these just growing logs where consistency doesn't matter?

I'd probably look at FreeBSD with ZFS on a machine with a boatload of
RAM if I needed dedup in the filesystem right now. Or put together
some scripts that would copy and split the large files into chunks in a
directory and let BackupPC take it from there.
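
The copy-and-split idea can be sketched like this (paths and chunk size are made up; since Domino files change in place, fixed-size chunks stay aligned between runs, so a file-level pool like BackupPC's can match the unchanged pieces):

```shell
#!/bin/sh
# Split a large, slightly-changing file into fixed-size chunks so a
# file-level deduplicator (e.g. BackupPC's pool) can match the
# unchanged pieces between runs.  Paths are hypothetical.
chunk_file() {
    src=$1
    dest=$2
    mkdir -p "$dest"
    split -b 1M "$src" "$dest/chunk."
}

# e.g.: chunk_file /data/mail.nsf /backup/chunks/mail.nsf.d
```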


--
Les Mikesell
lesmikesell@gmail.com
 
