02-14-2011, 08:11 PM
Russ Allbery

OT: Python

Ian Jackson <ijackson@chiark.greenend.org.uk> writes:
> Klaus Ethgen writes:

>> No, it is not. 00a3 is just not a utf-8 character, it is unicode. To
>> get a correct utf-8 character you need to print \x{c2a3} and then
>> isutf8 is happy.

> When LC_CTYPE=en_GB.utf-8, programs which attempt to print unicode
> characters to stdout should use UTF-8. That's what LC_CTYPE means.

Perl is specifically documented to not do this for backward compatibility
reasons. In Perl, which is the one I know best, you are required to
decode input and encode output if you want to have UTF-8 handling.

windlord:~> env LC_CTYPE=en_US.UTF-8 perl -e 'print "\x{00a3}\n"'
<glyph for mangled Unicode character>
windlord:~> env LC_CTYPE=en_US.UTF-8 perl -MEncode -e 'print encode("utf-8", "\x{00a3}\n")'
<proper Unicode pound sign>

See perlunicode(1). There are a variety of reasons for this that turn out
to be fairly good ones if you don't want to badly break a bunch of
existing Perl scripts that were dealing with, for example, binary data.

--
Russ Allbery (rra@debian.org) <http://www.eyrie.org/~eagle/>


--
To UNSUBSCRIBE, email to debian-devel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/87lj1ijp93.fsf@windlord.stanford.edu
 
02-15-2011, 11:09 PM
Vincent Lefevre

OT: Python

On 2011-02-14 13:11:04 -0800, Russ Allbery wrote:
> Perl is specifically documented to not do this for backward compatibility
> reasons. In Perl, which is the one I know best, you are required to
> decode input and encode output if you want to have UTF-8 handling.

Or better, use the -C option.

perl -C -e 'print "\x{00a3}\n"'

will "work" under both UTF-8 and ISO-8859-1. Or you can force UTF-8 with:

perl -CSD -e 'print "\x{00a3}\n"'

You can also do that globally with the PERL_UNICODE environment variable.

--
Vincent Lefèvre <vincent@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)


Archive: http://lists.debian.org/20110216000924.GM15920@prunille.vinc17.org
 

All times are GMT.