Hadoop in Debian, was: Hardware trouble ries.debian.org
Joerg Jaspert:
<SNIP>
> The only trouble this setup has is that you have a pretty huge expensive
> machine always on and running, but not actually doing stuff for
> 99.999999999999% of the time.
</SNIP>
Hadoop is now in Debian: http://packages.qa.debian.org/h/hadoop.html
Hadoop is an open-source implementation of the Google File System (HDFS),
MapReduce and BigTable (HBase, not yet packaged).
The idea behind Google's infrastructure, and therefore Hadoop, is to have many
cheap commodity servers that together form a powerful cluster. Each node of the
cluster is redundant and can be replaced without downtime.
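To make the MapReduce idea above concrete, here is a minimal single-process
sketch in Python. It is not Hadoop's API: the function names and the sample
(package, section) records are invented for illustration, and a real cluster
would spread the map and reduce steps over many nodes.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record and yield its (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def section_mapper(record):
    """Hypothetical mapper: emit (section, 1) for a (package, section) record."""
    _package, section = record
    yield section, 1


def count_reducer(_key, values):
    """Hypothetical reducer: sum the counts emitted for one section."""
    return sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence inside one process."""
    pairs = map_phase(records, section_mapper)
    return reduce_phase(shuffle(pairs), count_reducer)


if __name__ == "__main__":
    sample = [("hadoop", "java"), ("dak", "devel"), ("ant", "java")]
    print(run_job(sample))
```

Because the mapper looks at one record at a time and the reducer at one key at
a time, either step could in principle run on any node that holds the data.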
I believe, but can't know for sure, that everything FTP-Master does could be
implemented on top of Hadoop.
However, it would certainly mean a lot of work, and many hardcore sysadmins
will feel very uncomfortable using Java, the language Hadoop is written in.
I'm planning to give a presentation on Hadoop at DebConf in Bosnia, and
maybe then we can discuss whether Hadoop should have a place in Debian's
infrastructure. For now I'm happy if somebody has become curious. :-)
http://en.wikipedia.org/wiki/Hadoop
Best regards,
Thomas Koch, http://www.koch.ro
--
To UNSUBSCRIBE, email to debian-devel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/201003311442.09336.thomas@koch.ro
03-31-2010, 04:20 PM
Obey Arthur Liu
On Wed, Mar 31, 2010 at 2:42 PM, Thomas Koch <thomas@koch.ro> wrote:
> Joerg Jaspert:
> <SNIP>
>> The only trouble this setup has is that you have a pretty huge expensive
>> machine always on and running, but not actually doing stuff for
>> 99.999999999999% of the time.
> </SNIP>
>
> Hadoop is now in Debian: http://packages.qa.debian.org/h/hadoop.html
> Hadoop is an open-source implementation of the Google File System (HDFS),
> MapReduce and BigTable (HBase, not yet packaged).
>
> The idea behind Google's infrastructure, and therefore Hadoop, is to have many
> cheap commodity servers that together form a powerful cluster. Each node of the
> cluster is redundant and can be replaced without downtime.
>
> I believe, but can't know for sure, that everything FTP-Master does could be
> implemented on top of Hadoop.
> However, it would certainly mean a lot of work, and many hardcore sysadmins
> will feel very uncomfortable using Java, the language Hadoop is written in.
Isn't there /some/ Python/Jython support?
Would you co-mentor such a project as part of a Summer of Code project?
Do you know someone who would?
It need not be ftpmaster. There is probably other critical Debian
infrastructure that could use this.
> I'm planning to give a presentation on Hadoop at DebConf in Bosnia, and
> maybe then we can discuss whether Hadoop should have a place in Debian's
> infrastructure. For now I'm happy if somebody has become curious. :-)
>
> http://en.wikipedia.org/wiki/Hadoop
>
> Best regards,
>
> Thomas Koch, http://www.koch.ro
Cheers
Arthur
Archive: http://lists.debian.org/g2vc09ddae71003310920icba0397au58fa97f4ebfb788a@mail.gmail.com
03-31-2010, 10:41 PM
Stephen Gran
This one time, at band camp, Obey Arthur Liu said:
> On Wed, Mar 31, 2010 at 2:42 PM, Thomas Koch <thomas@koch.ro> wrote:
> >
> > I believe, but can't know for sure, that everything FTP-Master does could be
> > implemented on top of Hadoop.
> > However, it would certainly mean a lot of work, and many hardcore sysadmins
> > will feel very uncomfortable using Java, the language Hadoop is written in.
>
> Isn't there /some/ Python/Jython support?
>
> Would you co-mentor such a project as part of a Summer of Code project?
> Do you know someone who would?
> It need not be ftpmaster. There is probably other critical Debian
> infrastructure that could use this.
Hadoop is not a POSIX file system, as far as I'm aware. As ftp-master
makes heavy use of things like file locks and hard links, I doubt hadoop
would work without a significant rewrite of the software.
It would probably be helpful to take a look at the dak codebase before
coming up with other solutions to this - any sort of clustering has to
take the software that actually runs the archive into account.
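For readers unfamiliar with the two POSIX features mentioned above, the
following sketch shows a hard link and an advisory file lock on a local Linux
file system. It is not taken from dak; the file names are hypothetical and the
function only illustrates the kind of primitives a replacement storage layer
would have to provide or work around.

```python
import fcntl
import os
import tempfile


def hardlink_and_lock_demo():
    """Return (same_inode, link_count, second_lock_refused) for a temp dir."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file names; they do not reflect dak's real layout.
        pool_file = os.path.join(tmp, "pool.deb")
        suite_file = os.path.join(tmp, "suite.deb")
        with open(pool_file, "w") as fh:
            fh.write("package contents")

        # A hard link gives the same inode a second name without copying data.
        os.link(pool_file, suite_file)
        same_inode = os.stat(pool_file).st_ino == os.stat(suite_file).st_ino
        link_count = os.stat(pool_file).st_nlink

        # An exclusive advisory lock: a second open file description that
        # asks for the same lock without blocking is refused.
        lock_path = os.path.join(tmp, "archive.lock")
        with open(lock_path, "a") as holder:
            fcntl.flock(holder, fcntl.LOCK_EX)
            with open(lock_path, "a") as contender:
                try:
                    fcntl.flock(contender, fcntl.LOCK_EX | fcntl.LOCK_NB)
                    second_lock_refused = False
                except BlockingIOError:
                    second_lock_refused = True
            fcntl.flock(holder, fcntl.LOCK_UN)

        return same_inode, link_count, second_lock_refused


if __name__ == "__main__":
    print(hardlink_and_lock_demo())
```

Code written against these calls assumes a file system that implements them,
which is why the storage back end cannot be swapped without looking at the
software that uses it.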
Cheers,
--
-----------------------------------------------------------------
| ,'`. Stephen Gran |
| : :' : sgran@debian.org |
| `. `' Debian user, admin, and developer |
| `- http://www.debian.org |
-----------------------------------------------------------------
Archive: http://lists.debian.org/20100331224108.GA9763@varinia.lobefin.net
04-01-2010, 10:40 AM
Bernd Eckenfels
In article <20100331224108.GA9763@varinia.lobefin.net> you wrote:
> Hadoop is not a POSIX file system, as far as I'm aware. As ftp-master
> makes heavy use of things like file locks and hard links, I doubt hadoop
> would work without a significant rewrite of the software.
And HDFS is optimized for very large files only. You would have to build a
filesystem inside it for the typical FTP case, or maybe use HBase; I'm not
sure whether it can store large enough blobs.
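One common explanation for the large-file bias is that the HDFS NameNode
tracks every file and every block as a separate metadata object. The sketch
below counts those objects for a list of file sizes. The 64 MiB block size is
an assumption matching the default of Hadoop releases from that period, and
the 150 bytes per object is a frequently quoted rule of thumb, not a
measurement; both are plain parameters so they can be changed.

```python
import math

ASSUMED_BLOCK_SIZE = 64 * 1024 * 1024   # assumption: 64 MiB default block
ASSUMED_BYTES_PER_OBJECT = 150          # assumption: rough rule of thumb


def namenode_objects(file_sizes, block_size=ASSUMED_BLOCK_SIZE):
    """Count one object per file plus one per block the file occupies."""
    files = len(file_sizes)
    blocks = sum(math.ceil(size / block_size) for size in file_sizes)
    return files + blocks


def estimated_metadata_bytes(file_sizes,
                             block_size=ASSUMED_BLOCK_SIZE,
                             bytes_per_object=ASSUMED_BYTES_PER_OBJECT):
    """Very rough metadata estimate under the assumptions above."""
    return namenode_objects(file_sizes, block_size) * bytes_per_object


if __name__ == "__main__":
    many_small = [100 * 1024] * 1000      # 1000 files of 100 KiB each
    one_large = [1000 * 100 * 1024]       # the same bytes in a single file
    print(namenode_objects(many_small), namenode_objects(one_large))
```

Under these assumptions the same amount of data costs far more metadata
objects when it is split into many small files, which is the situation a
package archive would be in.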
Gruss
Bernd
Archive: http://lists.debian.org/201004011040.o31AewYr097932@neskaya.eckenfels.net