 
10-01-2008, 11:48 AM
Sven Joachim

updatedb for very large filesystems

On 2008-10-01 13:15 +0200, Mag Gam wrote:

> I was wondering if its possible to run updatedb on a very large
> filesystem (6 TB). Has anyone done this before?

I don't have such luxurious filesystems, but it should certainly be
possible. It's just a matter of time (the number of files is what
really counts, not the size of the filesystem).

> I plan on running this
> on a weekly basis, but I was wondering if updatedb was faster than a
> simple 'find'. Are there any optimizations in 'updatedb' ?

The implementation of updatedb in the mlocate package has an important
optimization: it reuses old entries for directories whose mtime hasn't
changed since the last run, so the second and subsequent runs should be
considerably faster than the first. Note, however, that mlocate is not
available in Etch.

The findutils version may be too slow to be run on a weekly basis on
systems with many millions of files.
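The mtime trick is easy to demonstrate outside updatedb. A minimal
sketch, assuming a POSIX find; changed_dirs is a hypothetical helper
name, and the database path in the usage comment is the one mlocate
uses on Debian:

```shell
# changed_dirs DIR STAMP -- print only the directories under DIR whose
# contents changed after STAMP was written.  Everything omitted here is
# what an mlocate-style updatedb can serve from the old database.
changed_dirs() {
    find "$1" -type d -newer "$2" -print
}

# Example: which directories would a fresh run actually rescan?
# changed_dirs /home /var/lib/mlocate/mlocate.db
```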

Sven


--
To UNSUBSCRIBE, email to debian-user-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
 
10-01-2008, 11:53 AM
Mag Gam

updatedb for very large filesystems

Thanks, Sven. Is it possible to get the file owner and file size with
mlocate/updatedb?

I would like to generate granular reports like that...


On Wed, Oct 1, 2008 at 7:48 AM, Sven Joachim <svenjoac@gmx.de> wrote:
> [...]


 
10-01-2008, 01:38 PM
Johann Spies

updatedb for very large filesystems

On Wed, Oct 01, 2008 at 07:53:05AM -0400, Mag Gam wrote:
> Thanks Sven. Is it possible to get file user owner and file size with
> the mlocate/updatedb ?
>
> I would like to get granular reports like that...

What about something like "ls -la `locate .bashrc`", replacing .bashrc
with whatever you are looking for?
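A hedged refinement: the backquote version breaks on filenames that
contain spaces. If the installed locate supports -0 (GNU findutils and
mlocate both do; worth verifying on Etch), NUL-terminating the list
keeps such names intact:

```shell
# NUL-delimit the matches so whitespace in filenames survives the trip
# to ls; GNU xargs's -r skips running ls when there are no matches.
locate -0 .bashrc | xargs -0 -r ls -la
```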

Regards
Johann
--
Johann Spies Telefoon: 021-808 4036
Informasietegnologie, Universiteit van Stellenbosch

"Yet if any man suffer as a Christian, let him not be
ashamed; but let him glorify God on this behalf."
I Peter 4:16


 
10-02-2008, 09:28 AM
James Youngman

updatedb for very large filesystems

On Wed, Oct 1, 2008 at 12:15 PM, Mag Gam <magawake@gmail.com> wrote:
> I was wondering if its possible to run updatedb on a very large
> filesystem (6 TB). Has anyone done this before? I plan on running this
> on a weekly basis, but I was wondering if updatedb was faster than a
> simple 'find'. Are there any optimizations in 'updatedb' ?

With findutils you can update several parts of the directory tree in
parallel, or update various parts on different schedules.

Here's an example in which three local directory trees are scanned in
parallel, a fourth is scanned remotely on another server, and the
results are combined with a canned list of files from a part of the
filesystem that never changes:

find /usr -print0 > /var/tmp/usr.files0 &
find /var -print0 > /var/tmp/var.files0 &
find /home -print0 > /var/tmp/home.files0 &
ssh nfs-server 'find /srv -print0' > /var/tmp/srv.files0 &
wait

sort -f -z /var/tmp/archived-stuff.files.0 /var/tmp/usr.files0 \
  /var/tmp/var.files0 /var/tmp/home.files0 /var/tmp/srv.files0 |
  /usr/lib/locate/frcode -0 > /var/tmp/locatedb.new
rm -f /var/tmp/usr.files0 /var/tmp/var.files0 /var/tmp/home.files0 \
  /var/tmp/srv.files0

cp /var/cache/locate/locatedb /var/cache/locate/locatedb.old
mv /var/tmp/locatedb.new /var/cache/locate/locatedb
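The different-schedules point could look like this in cron form (a
sketch only; the times and the /etc/cron.d placement are placeholders,
and the sort/frcode merge above would still run after the slowest scan
finishes):

```
# Hypothetical /etc/cron.d fragment: rescan the busy tree nightly,
# the mostly-static trees only early on Sunday mornings.
30 2 * * *   root  find /home -print0 > /var/tmp/home.files0
15 3 * * 0   root  find /usr -print0 > /var/tmp/usr.files0
20 3 * * 0   root  find /var -print0 > /var/tmp/var.files0
```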


 
10-02-2008, 10:18 AM
Ron Johnson

updatedb for very large filesystems

On 10/02/08 04:28, James Youngman wrote:
> With findutils you can update several parts of the directory tree in
> parallel, or update various parts on a different time schedule.
>
> find /usr -print0 > /var/tmp/usr.files0 &
> find /var -print0 > /var/tmp/var.files0 &
> find /home -print0 > /var/tmp/home.files0 &
> ssh nfs-server 'find /srv -print0' > /var/tmp/srv.files0 &
> wait

Since find is so disk-intensive, isn't this only of benefit if /usr,
/var and /home are on different devices?


--
Ron Johnson, Jr.
Jefferson LA USA

"Do not bite at the bait of pleasure till you know there is no
hook beneath it." -- Thomas Jefferson


 
10-02-2008, 10:52 AM
Mag Gam

updatedb for very large filesystems

Well, I am more interested in searching a large networked filesystem.



On Thu, Oct 2, 2008 at 6:18 AM, Ron Johnson <ron.l.johnson@cox.net> wrote:
> Since find is so disk-intensive, isn't this only of benefit if /usr,
> /var and /home are on different devices?


 
10-02-2008, 11:31 AM
Michael Mohn

updatedb for very large filesystems

On 02.10.2008, at 12:52, Mag Gam wrote:
> Well, I am more interested in searching a large networked filesystem.

If you are looking for a search engine, I would recommend regain, which
requires Java and a web service, but it does a good job:
http://regain.sourceforge.net/index.php

bye,
Michael.
 
10-02-2008, 01:29 PM
James Youngman

updatedb for very large filesystems

On Thu, Oct 2, 2008 at 11:18 AM, Ron Johnson <ron.l.johnson@cox.net> wrote:
> Since find is so disk-intensive, isn't this only of benefit if /usr,
> /var and /home are on different devices?

Yes. Disk-head-movement optimisation will not be implemented in
findutils for another six weeks or so.

James.


 
10-08-2008, 02:22 AM
Mag Gam

updatedb for very large filesystems

Great, thanks. Basically I have 500+ directories, each of which has
over 9,000 files. I was wondering if there is a trick I can use.
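One possible trick, sketched under stated assumptions (build_filelists
is a made-up name, and /data stands in for wherever the 500 directories
live): give each top-level directory its own background find, in the
spirit of James's example upthread, then merge the NUL-separated lists.
As Ron notes, the parallelism only pays off when the directories sit on
different devices.

```shell
# build_filelists ROOT OUT -- scan each top-level directory under ROOT
# in its own background job, then merge the per-directory file lists
# into OUT/all.list0 (NUL-separated, ready for frcode -0).
build_filelists() {
    root=$1
    out=$2
    for d in "$root"/*/; do
        find "$d" -print0 > "$out/$(basename "$d").files0" &
    done
    wait
    cat "$out"/*.files0 | sort -z > "$out/all.list0"
}

# Example: build_filelists /data /var/tmp/lists
```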

TIA


On Thu, Oct 2, 2008 at 9:29 AM, James Youngman <jay@gnu.org> wrote:
> Yes. Disk-head-movement optimisation will not be implemented in
> findutils for another six weeks or so.


 
10-08-2008, 02:29 AM
Ron Johnson

updatedb for very large filesystems

I *think* that James Youngman was being sarcastic. If I'm wrong,
then so much the better.


On 10/07/08 21:22, Mag Gam wrote:
> Great, thanks. Basically I have 500+ directories, each of which has
> over 9,000 files. I was wondering if there is a trick I can use.
>
> On Thu, Oct 2, 2008 at 9:29 AM, James Youngman <jay@gnu.org> wrote:
>> Yes. Disk-head-movement optimisation will not be implemented in
>> findutils for another six weeks or so.



 
