Old 08-28-2012, 07:04 PM
John R Pierce
 
Default Deduplication data for CentOS?

On 08/28/12 11:41 AM, Les Mikesell wrote:
> On Tue, Aug 28, 2012 at 3:03 AM, Rainer Traut<tr.ml@gmx.de> wrote:
>> Rsync is of no use for us. We have mainly big Domino .nsf files which
>> only change slightly. So rsync would not be able to make many hardlinks.
> Rdiff-backup might work for this since it stores deltas. Are you
> doing something to snapshot the filesystem during the copy or are
> these just growing logs where consistency doesn't matter?

NSF files are a proprietary database format used by Lotus Notes and
Domino, very complex, there's a pile of versions, and they are totally
opaque. Pretty sure that if they are being accessed or updated while
being copied, the copy is invalid, so yes, some form of snapshotting is
required.

Commercial backup software uses Domino/Notes APIs to do incremental
backups; for example:
http://www.symantec.com/business/support/index?page=content&id=TECH46513



--
john r pierce N 37, W 122
santa cruz ca mid-left coast

_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
 
Old 08-28-2012, 07:26 PM
Les Mikesell
 
Default Deduplication data for CentOS?

On Tue, Aug 28, 2012 at 2:04 PM, John R Pierce <pierce@hogranch.com> wrote:
> On 08/28/12 11:41 AM, Les Mikesell wrote:
>> On Tue, Aug 28, 2012 at 3:03 AM, Rainer Traut<tr.ml@gmx.de> wrote:
>>> Rsync is of no use for us. We have mainly big Domino .nsf files which
>>> only change slightly. So rsync would not be able to make many hardlinks.
>> Rdiff-backup might work for this since it stores deltas. Are you
>> doing something to snapshot the filesystem during the copy or are
>> these just growing logs where consistency doesn't matter?
>
> NSF files are a proprietary database format used by Lotus Notes and
> Domino, very complex, there's a pile of versions, and they are totally
> opaque. Pretty sure that if they are being accessed or updated while
> being copied the copy is invalid, so yes, some form of snapshotting is
> required.
>
> commercial backup software uses Domino/Notes APIs to do incremental
> backups, for example
> http://www.symantec.com/business/support/index?page=content&id=TECH46513

If there is a command-line way to generate an incremental backup file,
BackupPC could run it via ssh as a pre-backup command.
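BackupPC does have a hook for exactly that: the $Conf{DumpPreUserCmd} setting in its Perl-syntax config file. A sketch (the remote script name is a placeholder, not a real tool):

```
# config.pl fragment -- /usr/local/bin/domino-prebackup is hypothetical
$Conf{DumpPreUserCmd} = '$sshPath -q -x -l root $host /usr/local/bin/domino-prebackup';
```

BackupPC substitutes $sshPath and $host itself and runs the command before each dump; the remote script would be responsible for producing the incremental backup file.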

--
Les Mikesell
lesmikesell@gmail.com
 
Old 08-29-2012, 09:43 AM
Rainer Traut
 
Default Deduplication data for CentOS?

Am 28.08.2012 21:26, schrieb Les Mikesell:
> On Tue, Aug 28, 2012 at 2:04 PM, John R Pierce <pierce@hogranch.com> wrote:
>> On 08/28/12 11:41 AM, Les Mikesell wrote:
>>> On Tue, Aug 28, 2012 at 3:03 AM, Rainer Traut<tr.ml@gmx.de> wrote:
>>>>> Rsync is of no use for us. We have mainly big Domino .nsf files which
>>>>> only change slightly. So rsync would not be able to make many hardlinks.
>>> Rdiff-backup might work for this since it stores deltas. Are you
>>> doing something to snapshot the filesystem during the copy or are
>>> these just growing logs where consistency doesn't matter?
>>
>> NSF files are a proprietary database format used by Lotus Notes and
>> Domino, very complex, there's a pile of versions, and they are totally
>> opaque. Pretty sure that if they are being accessed or updated while
>> being copied the copy is invalid, so yes, some form of snapshotting is
>> required.
>>
>> commercial backup software uses Domino/Notes APIs to do incremental
>> backups, for example
>> http://www.symantec.com/business/support/index?page=content&id=TECH46513
>
> If there is a command-line way to generate an incremental backup file,
> backuppc could run it via ssh as a pre-backup command.
>

Yes, there is commercial software to do incremental backups, but I do not
know of any command-line way to do this. Does anyone?

Les is right: I stop the server, take the snapshot, start the server, and
then run xdelta on the snapshot's NSF files.
That minimal downtime is acceptable and has been agreed to by the customer.
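For illustration, assuming the Domino data lives on an LVM volume, the cycle described above might look like this (volume names, mount points, and the init script are all placeholders, not Rainer's actual setup):

```shell
#!/bin/sh
# Sketch of the stop / snapshot / restart / xdelta cycle described above.
# All names here are hypothetical; adapt to your own volume layout.
set -e

/etc/init.d/domino stop                         # short, agreed downtime
lvcreate -s -L 5G -n nsf-snap /dev/vg0/domino   # point-in-time snapshot
/etc/init.d/domino start                        # server is back up

mkdir -p /mnt/nsf-snap
mount -o ro /dev/vg0/nsf-snap /mnt/nsf-snap

# Store only the delta between the last full copy and today's snapshot.
for f in /mnt/nsf-snap/*.nsf; do
    name=$(basename "$f")
    xdelta3 -e -s "/backup/full/$name" "$f" "/backup/deltas/$name.$(date +%F).xd3"
done

umount /mnt/nsf-snap
lvremove -f /dev/vg0/nsf-snap
```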

 
Old 08-29-2012, 11:45 AM
John R Pierce
 
Default Deduplication data for CentOS?

On 08/29/12 2:43 AM, Rainer Traut wrote:
> Yes, there is commercial software to do incremental backups but I do not
> know of commandline options to do this. Maybe anyone?
>
> Les is right, I stop the server, take the snapshot, start the server and
> do the xdelta on the snapshot NSF files.
> Having that minimal downtime is ok and acknowledged by the customer.

I found some more material on an IBM site about the API (it has to be
called from software, not from the command line) for generating and
keeping track of transaction log files, which the backup software then
archives. Nothing about de-dup, though.



--
john r pierce N 37, W 122
santa cruz ca mid-left coast

 
Old 09-12-2012, 04:01 AM
Bob Hepple
 
Default Deduplication data for CentOS?

Rainer Traut <tr.ml@...> writes:

>
> Hi list,
>
> is there any working solution for deduplication of data for centos?
> We are trying to find a solution for our backup server which runs a bash
> script invoking xdelta(3). But having this functionality in fs is much
> more friendly...
>
> We have looked into lessfs, sdfs and ddar.
> Are these filesystems ready to use (on centos)?
> ddar is something different, I know.
>
> Thx
> Rainer
>


Not sure if it's already been mentioned, but storeBackup uses rsync and
hardlinks to minimise storage - and it breaks up big files and backs up
the fragments separately. May help ...
http://www.nongnu.org/storebackup/en/node2.html
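The fragment idea can be illustrated with a toy chunk pool: split each file into fixed-size chunks and store a chunk only once, keyed by its hash. The chunk size and pool layout here are invented for the demo, not storeBackup's actual on-disk format:

```shell
# Toy fragment-level dedup: identical chunks from any file share one
# pool entry. 4 KiB chunks chosen arbitrarily for the demo.
pool=$(mktemp -d)
work=$(mktemp -d)

store_chunks() {
    # $1 = file to back up; each unique chunk is stored once under its hash
    split -b 4096 "$1" "$work/chunk."
    for c in "$work"/chunk.*; do
        h=$(sha256sum "$c" | cut -d' ' -f1)
        [ -e "$pool/$h" ] || cp "$c" "$pool/$h"
        rm "$c"
    done
}

# Two 8 KiB files with identical (all-zero) content: one unique chunk.
dd if=/dev/zero of="$work/a.nsf" bs=4096 count=2 2>/dev/null
cp "$work/a.nsf" "$work/b.nsf"
store_chunks "$work/a.nsf"
store_chunks "$work/b.nsf"
ls "$pool" | wc -l    # 1 unique chunk stored for 16 KiB of input
```

A big .nsf file that changes slightly would share most of its chunks with yesterday's copy, which is exactly the case rsync's whole-file hardlinking misses.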

 
Old 09-13-2012, 05:06 PM
Ryan Palamara
 
Default Deduplication data for CentOS?

The better option for ZFS would be to get an SSD and move the dedupe table onto that drive instead of keeping it in RAM, because it can become massive.

Thank you,

Ryan Palamara
ZAIS Group, LLC
2 Bridge Avenue, Suite 322
Red Bank, New Jersey 07701
Phone: (732) 450-7444
Ryan.palamara@zaisgroup.com


-----Original Message-----
From: centos-bounces@centos.org [mailto:centos-bounces@centos.org] On Behalf Of Dean Jones
Sent: Monday, August 27, 2012 11:45 AM
To: CentOS mailing list
Subject: Re: [CentOS] Deduplication data for CentOS?

Deduplication with ZFS takes a lot of RAM.

I would not yet trust any of the Linux ZFS projects for data that I
wanted to keep long term.

On Mon, Aug 27, 2012 at 8:26 AM, Les Mikesell <lesmikesell@gmail.com> wrote:
> On Mon, Aug 27, 2012 at 9:23 AM, John R Pierce <pierce@hogranch.com> wrote:
>> On 08/27/12 4:55 AM, Rainer Traut wrote:
>>> is there any working solution for deduplication of data for centos?
>>> We are trying to find a solution for our backup server which runs a bash
>>> script invoking xdelta(3). But having this functionality in fs is much
>>> more friendly...
>>
>> BackupPC does exactly this. It's not a generalized solution to
>> deduplication of a file system; instead, it's a backup system, designed
>> to back up multiple targets, that implements deduplication on the backup
>> tree it maintains.
>
> Not _exactly_, but maybe close enough and it is very easy to install
> and try. Backuppc will use rsync for transfers and thus only uses
> bandwidth for the differences, but it uses hardlinks to files to dedup
> the storage. It will find and link duplicate content even from
> different sources, but the complete file must be identical. It does
> not store deltas, so large files that change even slightly between
> backups end up stored as complete copies (with optional compression).
>
> --
> Les Mikesell
> lesmikesell@gmail.com
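For what it's worth, the whole-file hard-link mechanism Les describes can be sketched in a few lines. This is a toy, not BackupPC's actual pool layout or hashing scheme:

```shell
# Toy whole-file dedup: hash each file and hard-link exact duplicates
# to a single pooled copy. Paths are made up for the demo.
pool=$(mktemp -d)
tree=$(mktemp -d)

dedup_file() {
    h=$(sha256sum "$1" | cut -d' ' -f1)
    if [ -e "$pool/$h" ]; then
        ln -f "$pool/$h" "$1"     # duplicate: relink to the pooled copy
    else
        ln "$1" "$pool/$h"        # first sighting: add it to the pool
    fi
}

printf 'same payload\n' > "$tree/host1.txt"
printf 'same payload\n' > "$tree/host2.txt"
dedup_file "$tree/host1.txt"
dedup_file "$tree/host2.txt"
stat -c %h "$tree/host2.txt"      # 3: the pool copy plus both tree entries
```

This is also why a large file that differs by one byte dedups to nothing here: the whole-file hash changes, which is the limitation Les points out.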
________________________________


This e-mail message is intended only for the named recipient(s) above. It may contain confidential information. If you are not the intended recipient you are hereby notified that any dissemination, distribution or copying of this e-mail and any attachment(s) is strictly prohibited. If you have received this e-mail in error, please immediately notify the sender by replying to this e-mail and delete the message and any attachment(s) from your system. Thank you.

This is not an offer (or solicitation of an offer) to buy/sell the securities/instruments mentioned or an official confirmation. This is not research and is not from ZAIS Group but it may refer to a research analyst/research report. Unless indicated, these views are the author's and may differ from those of ZAIS Group research or others in the Firm. We do not represent this is accurate or complete and we may not update this. Past performance is not indicative of future returns.

IRS CIRCULAR 230 NOTICE:.

To comply with requirements imposed by the IRS, we inform you that any U.S. federal tax advice contained herein (including any attachments), unless specifically stated otherwise, is not intended or written to be used, and cannot be used, for the purpose of (i) avoiding penalties under the Internal Revenue Code or (ii) promoting, marketing or recommending any transaction or matter addressed herein to another party. Each taxpayer should seek advice based on the taxpayer's particular circumstances from an independent tax advisor.

"ZAIS", "ZAIS Group" and "ZAIS Solutions" are trademarks of ZAIS Group, LLC.
 
Old 09-13-2012, 07:08 PM
Les Mikesell
 
Default Deduplication data for CentOS?

On Thu, Sep 13, 2012 at 12:06 PM, Ryan Palamara
<Ryan.Palamara@zaisgroup.com> wrote:
> The better option for ZFS would be to get a SSD and move the dedupe table onto that drive instead of having it in RAM, because it can become massive.

What's 'massive' in dollars these days?

--
Les Mikesell
lesmikesell@gmail.com
 
Old 09-13-2012, 07:47 PM
Ryan Palamara
 
Default Deduplication data for CentOS?

It depends on the size of the data that you are storing and on the block size. Here is a good primer on it: http://constantin.glez.de/blog/2011/07/zfs-dedupe-or-not-dedupe

As a quick estimate, about 5 GB of SSD per 1 TB of storage. However, I believe that you would need even more RAM, since only a quarter of the RAM will be used for the dedupe table with ZFS.
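As a sanity check on those numbers, assuming the roughly 320 bytes per dedup-table entry cited in that primer, 1 TiB of data in best-case 128 KiB records already needs about 2.5 GiB of table; real pools average much smaller blocks, which is where the ~5 GB per TB rule of thumb comes from:

```shell
# Back-of-envelope ZFS DDT sizing. The ~320 bytes/entry figure is the
# commonly cited estimate; treat all of this as illustrative only.
SIZE_BYTES=$((1024 * 1024 * 1024 * 1024))   # 1 TiB of pool data
RECORDSIZE=$((128 * 1024))                  # 128 KiB records (best case)
ENTRY_BYTES=320                             # approx. size of one DDT entry

BLOCKS=$((SIZE_BYTES / RECORDSIZE))         # 8388608 blocks
DDT_BYTES=$((BLOCKS * ENTRY_BYTES))
echo "DDT: $((DDT_BYTES / 1024 / 1024)) MiB"   # prints "DDT: 2560 MiB"
```

Halving the average block size doubles the table, so pools full of small files land well above this floor.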

Thank you,

Ryan Palamara
ZAIS Group, LLC
2 Bridge Avenue, Suite 322
Red Bank, New Jersey 07701
Phone: (732) 450-7444
Ryan.palamara@zaisgroup.com


-----Original Message-----
From: centos-bounces@centos.org [mailto:centos-bounces@centos.org] On Behalf Of Les Mikesell
Sent: Thursday, September 13, 2012 3:09 PM
To: CentOS mailing list
Subject: Re: [CentOS] Deduplication data for CentOS?

On Thu, Sep 13, 2012 at 12:06 PM, Ryan Palamara <Ryan.Palamara@zaisgroup.com> wrote:
> The better option for ZFS would be to get a SSD and move the dedupe table onto that drive instead of having it in RAM, because it can become massive.

What's 'massive' in dollars these days?

--
Les Mikesell
lesmikesell@gmail.com
 
Old 10-01-2012, 08:35 PM
joel billy
 
Default Deduplication data for CentOS?

At our shop we have used quadstor - http://www.quadstor.com - with a good
amount of success, though our use is specifically for VMware environments
over a SAN. However, it is possible (I have tried this a couple of times)
to use the quadstor virtual disks as a local block device, format it with
ext4 or btrfs etc., and get the benefits of deduplication, compression,
etc. Yes, btrfs deduplication is possible :-) - I have tried it.
You might need to check the memory requirements for NAS/local
filesystems. We use 8 GB in our SAN box and so far things are fine.

- jb

Rainer Traut <tr.ml@...> writes:

>
> Hi list,
>
> is there any working solution for deduplication of data for centos?
> We are trying to find a solution for our backup server which runs a bash
> script invoking xdelta(3). But having this functionality in fs is much
> more friendly...
>
> We have looked into lessfs, sdfs and ddar.
> Are these filesystems ready to use (on centos)?
> ddar is something different, I know.
>
> Thx
> Rainer
>
 
