Andreas Tille 02-11-2011 01:38 PM

Use language determination tool for SPAM prevention (Was: Spell checker as reasonable SPAM prevention tool)
 
On Fri, Feb 11, 2011 at 02:27:03PM +0000, brian m. carlson wrote:
>
> I've been thinking about this some as well for my personal domain.
> Debian has tools that can determine the language of a document
> (libtextcat and friends).

So this is even better.

> Emails that are 70% or more composed of
> languages that I have no hope of speaking or understanding (i.e.,
> everything but English, Spanish, French, and Portuguese) would be
> rejected. I chose 70% as the threshold because sometimes Debian lists
> get mails from users in both English and another language (in hopes of
> being understood) and I wouldn't want to penalize those users. I
> haven't implemented this, but I might at some point.

Publishing the implementation would be cool.
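
A minimal sketch of the 70% rule quoted above, in Python. It does not use libtextcat: guess_language is a hypothetical stand-in that counts a few common words per language, and the names ACCEPTED, THRESHOLD, foreign_fraction and should_reject are assumptions made for illustration only. Paragraphs are weighted by word count, and paragraphs whose language cannot be guessed are ignored.

```python
import re

# Hypothetical stand-in for a real detector such as libtextcat:
# a handful of very common words per language. Far too small for real use.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "in", "that", "this", "not", "with"},
    "es": {"el", "la", "los", "y", "es", "que", "en", "no", "con", "una"},
    "fr": {"le", "les", "et", "est", "que", "dans", "pas", "avec", "une", "des"},
    "pt": {"o", "os", "e", "é", "que", "em", "não", "com", "uma", "do"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "mit", "ein", "eine", "zu"},
}

ACCEPTED = {"en", "es", "fr", "pt"}  # languages the recipient can read
THRESHOLD = 0.70                     # reject at 70% or more foreign text


def words(text):
    """Lower-cased runs of letters (Unicode aware, no digits or underscores)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def guess_language(paragraph):
    """Return the language with the most stopword hits, or None without hits."""
    tokens = words(paragraph)
    scores = {lang: sum(t in sw for t in tokens) for lang, sw in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def foreign_fraction(body, accepted=ACCEPTED):
    """Share of words that sit in paragraphs written in a non-accepted language."""
    total = foreign = 0
    for para in re.split(r"\n\s*\n", body):
        lang = guess_language(para)
        if lang is None:
            continue  # undetermined paragraphs (e.g. signatures) are ignored
        n = len(words(para))
        total += n
        if lang not in accepted:
            foreign += n
    return foreign / total if total else 0.0


def should_reject(body, accepted=ACCEPTED, threshold=THRESHOLD):
    return foreign_fraction(body, accepted) >= threshold
```

With this sketch a mail that repeats its text in English and in another language of similar length stays well below the threshold, which is the case the 70% figure is meant to protect.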

> Obviously, this would have to be adjusted per-list;

This is obvious, which is why I did not mention it. Each list has a
default language, so the filtering certainly needs to be configurable
per list - but that should be easy enough if we get it implemented at
all.
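
Per-list configuration could look roughly like the following Python sketch. Everything here is an assumption for illustration: the ListPolicy class, the policy table, the choice of English as the default, and the decision to also accept English on debian-user-german. The language shares are expected to come from whatever detection tool is used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ListPolicy:
    accepted: frozenset      # language codes accepted on this list
    threshold: float = 0.70  # reject at this share of foreign text or more


# Assumed default: lists without an entry accept English only.
DEFAULT_POLICY = ListPolicy(frozenset({"en"}))

# Hypothetical per-list overrides; the values are illustrative.
LIST_POLICIES = {
    "debian-user-german": ListPolicy(frozenset({"de", "en"})),
}


def policy_for(list_name):
    """Look up the policy of a list, falling back to the default."""
    return LIST_POLICIES.get(list_name, DEFAULT_POLICY)


def reject_for_list(list_name, lang_shares):
    """lang_shares maps a language code to its share of the mail's text."""
    policy = policy_for(list_name)
    foreign = sum(
        share for lang, share in lang_shares.items()
        if lang not in policy.accepted
    )
    return foreign >= policy.threshold
```

A mail that is 90% German would then be rejected on a list using the default policy but accepted on debian-user-german.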

> we wouldn't want to
> reject German-language emails to debian-user-german. I also think
> language testing is better than spell checking for English because
> honestly English has a lot of pretty irregular and bizarre spellings; I
> say this as someone whose native language is English and who spells
> fairly decently. A spell checker might catch more legitimate emails
> than we'd like.

My point about the spell checker was only to detect the language - it
may well be that we have better tools than a spell checker for detecting
the language an e-mail is written in, which would probably make the
suggestion easier to implement.

Kind regards

Andreas.

--
http://fam-tille.de


Archive: http://lists.debian.org/20110211143843.GG9160@an3as.eu

