02-03-2010, 03:56 AM
Rajagopal Swaminathan

Unicode related query

Greetings,

I am able to get an English word list from <file> by using the following command:

cat <file> | tr -sc A-Za-z '\012'

My question is how to specify Unicode characters as well as ASCII in the
tr command. Specifically, the text file contains 3-byte sequences
starting with \xe0.

I am able to see the character using:

echo -e '\xe0\xa5\xbf'

What regex incantation would make tr give the results I want?

I am new to Unicode.

Regards,

Rajagopal
_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
 
02-03-2010, 04:33 AM
"Joseph L. Casale"

Unicode related query

>I am able to get an English word list from <file> by using the following command:
>
>cat <file> | tr -sc A-Za-z '\012'
>
>My question is how to specify Unicode characters as well as ASCII in the
>tr command. Specifically, the text file contains 3-byte sequences
>starting with \xe0.
>
>I am able to see the character using:
>
>echo -e '\xe0\xa5\xbf'
>
>What regex incantation would make tr give the results I want?
>
>I am new to Unicode.

You don't say much about what bounds the words. Spaces? Give more info, but
http://www.regular-expressions.info/unicode.html leads to some Perl solutions.
 
02-03-2010, 04:41 AM
Rajagopal Swaminathan

Unicode related query

Greetings,

On Wed, Feb 3, 2010 at 11:03 AM, Joseph L. Casale
<jcasale@activenetwerx.com> wrote:
>
> You don't say much as to what bounds the words, spaces? Give more info, but
> http://www.regular-expressions.info/unicode.html leads to some Perl solutions.

Thanks for the quick reply.

I have started perusing it.

Perl is currently Martian to me. I hope to gain fluency in it in
the very near future.

The Unicode strings in question (with multi-byte code points) may be
delimited by commas, single quotes, spaces, etc. I am ready to discard all
characters except [:alpha:] and the Unicode strings.

Thanks again and Regards,

Rajagopal
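
One possible way to get this with tr alone, offered as a sketch and not as an answer given in the thread: treat the input as raw bytes. Every byte of a multi-byte UTF-8 sequence has its high bit set (octal 200 to 377), so adding that byte range to the set of kept characters preserves the Unicode strings along with the ASCII letters. The function name split_words and the sample input are made up for illustration, and the approach assumes the file is UTF-8 and that tr accepts octal escapes in ranges under the C locale (GNU tr does).

```shell
# split_words is a hypothetical helper name, not something from the thread.
# LC_ALL=C makes tr work on single bytes. The kept set is the ASCII letters
# plus every byte with the high bit set (\200-\377), which covers all bytes
# of any multi-byte UTF-8 sequence. Everything else becomes a newline, and
# -s squeezes runs of newlines into one.
split_words() {
    LC_ALL=C tr -sc 'A-Za-z\200-\377' '\012'
}

# Sample input: two English words around the 3-byte sequence e0 a5 bf,
# delimited by a comma, spaces, and single quotes (\047).
printf 'hello, \340\245\277 \047world\047\n' | split_words
```

On the sample line this prints three lines: hello, the character encoded as e0 a5 bf, and world. Note that this keeps every non-ASCII byte, including non-letter characters such as Unicode punctuation, so it is broader than a true "letters only" filter.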
 

All times are GMT.