02-14-2011, 02:24 PM
Ian Jackson

OT: Python (was: Make Unicode bugs release critical?)

Jakub Wilk writes ("Re: OT: Python (was: Make Unicode bugs release critical?)"):
> * Klaus Ethgen <Klaus@Ethgen.de>, 2011-02-14, 14:37:
> >~> LC_CTYPE=en_GB.utf-8 perl -e 'print "\x{00a3}\n";'
> >~> LC_CTYPE=en_GB.utf-8 perl -e 'print "\x{00a3}\n";' | cat
>
> Let me try...
>
> $ LC_CTYPE=en_GB.utf-8 perl -e 'print "\x{00a3}\n";' | isutf8
> stdin: line 1, char 1, byte offset 1: invalid UTF-8 code

WTF. OK, Perl's out too.

We'll have to write everything in dash :-).

Ian.


--
To UNSUBSCRIBE, email to debian-devel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/19801.18743.486394.290910@chiark.greenend.org.uk
 
02-14-2011, 03:21 PM
Klaus Ethgen

OT: Python (was: Make Unicode bugs release critical?)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On Mon, 14 Feb 2011 at 16:24, Ian Jackson wrote:
> Jakub Wilk writes ("Re: OT: Python (was: Make Unicode bugs release critical?)"):
> > * Klaus Ethgen <Klaus@Ethgen.de>, 2011-02-14, 14:37:
> > >~> LC_CTYPE=en_GB.utf-8 perl -e 'print "\x{00a3}\n";'
> > >~> LC_CTYPE=en_GB.utf-8 perl -e 'print "\x{00a3}\n";' | cat
> >
> > Let me try...
> >
> > $ LC_CTYPE=en_GB.utf-8 perl -e 'print "\x{00a3}\n";' | isutf8
> > stdin: line 1, char 1, byte offset 1: invalid UTF-8 code
>
> WTF. OK, Perl's out too.

No, it is not. 00a3 is just not a UTF-8 character, it is Unicode. To get
a correct UTF-8 character you need to print \x{c2a3} and then isutf8 is
happy.
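
A minimal Python sketch of the distinction being argued here (not part of the original thread; the helper name `is_valid_utf8` is made up for illustration). U+00A3 is a code point, and which bytes reach stdout depends on the encoding applied on output. The lone byte 0xA3 is the kind of input isutf8 rejected above.

```python
# U+00A3 (POUND SIGN) is a Unicode code point, not a byte sequence.
pound = "\u00a3"

latin1_bytes = pound.encode("latin-1")  # one byte: 0xA3
utf8_bytes = pound.encode("utf-8")      # two bytes: 0xC2 0xA3

print(latin1_bytes.hex())  # a3
print(utf8_bytes.hex())    # c2a3


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8(latin1_bytes + b"\n"))  # False
print(is_valid_utf8(utf8_bytes + b"\n"))    # True
```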

> We'll have to write everything in dash :-).

lisp. :-)

But now we are getting completely off topic.

Regards
Klaus
- --
Klaus Ethgen http://www.ethgen.ch/
pub 2048R/D1A4EDE5 2000-02-26 Klaus Ethgen <Klaus@Ethgen.de>
Fingerprint: D7 67 71 C4 99 A6 D4 FE EA 40 30 57 3C 88 26 2B
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iQEVAwUBTVlWk5+OKpjRpO3lAQohXgf9FC839X5Pozj2LZUJKd +X9Bcy5F/q+zWg
cdPlFkRL2BSq05M4+V8anb6vP47JdMMJfgc1oszNWZkYOQkgZd Ty1GdCVF9o0jpD
xSlA7MVBt7ijTtfOlodzZiO6PyXPx7vo6AJGUufwb4KxekLR6v Kq9fzlTLvvD/mH
lPPbCuZrY90eWqRjFeLyXA6Cmx+cJG5jt8nAAOzBjWTuENNp+v TFx1Lad13que7T
AAXrQupjCpRwAxfN8cuYMMIAFw5FCOyTQNAZXaAeMV1UOslVVd XlffUDB6uqpNvC
JPPL9PhughLVWtSxsm74emFCVkBQ75xTGMJTbCUCfMmdwTj3mD 7uLw==
=J1JB
-----END PGP SIGNATURE-----


Archive: http://lists.debian.org/20110214162139.GF6167@ikki.ethgen.ch
 
02-14-2011, 03:43 PM
Ian Jackson

OT: Python (was: Make Unicode bugs release critical?)

Klaus Ethgen writes ("Re: OT: Python (was: Make Unicode bugs release critical?)"):
> No, it is not. 00a3 is just not a UTF-8 character, it is Unicode. To get
> a correct UTF-8 character you need to print \x{c2a3} and then isutf8 is
> happy.

When LC_CTYPE=en_GB.utf-8, programs which attempt to print Unicode
characters to stdout should use UTF-8. That's what LC_CTYPE means.
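
The rule stated above can be sketched in Python as follows. This is an illustration under stated assumptions, not how any particular program implements it: `codeset_from_locale` and `encode_for_locale` are hypothetical helpers, and falling back to ASCII when the locale name carries no codeset is an assumption made for the sketch. A real program would normally ask the C library (setlocale plus nl_langinfo(CODESET)) rather than parse the name itself.

```python
import codecs


def codeset_from_locale(name: str, default: str = "ascii") -> str:
    """Hypothetical helper: pull the codeset out of a POSIX locale name.

    Locale names have the form language[_territory][.codeset][@modifier].
    """
    name = name.split("@", 1)[0]       # drop any @modifier
    if "." not in name:
        return default                 # assumption: no codeset means ASCII
    codeset = name.split(".", 1)[1]
    return codecs.lookup(codeset).name  # normalise spelling, e.g. 'UTF8'


def encode_for_locale(text: str, locale_name: str) -> bytes:
    """Encode text the way the locale's codeset says it should be written."""
    return text.encode(codeset_from_locale(locale_name))


print(encode_for_locale("\u00a3\n", "en_GB.utf-8").hex())       # c2a30a
print(encode_for_locale("\u00a3\n", "en_GB.ISO-8859-1").hex())  # a30a
```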

Ian.


Archive: http://lists.debian.org/19801.23455.536473.211939@chiark.greenend.org.uk
 
02-14-2011, 04:36 PM
Konstantin Khomoutov

OT: Python (was: Make Unicode bugs release critical?)

On Mon, 14 Feb 2011 16:43:11 +0000
Ian Jackson <ijackson@chiark.greenend.org.uk> wrote:

> Klaus Ethgen writes ("Re: OT: Python (was: Make Unicode bugs release
> critical?)"):
> > No, it is not. 00a3 is just not a UTF-8 character, it is Unicode.
> > To get a correct UTF-8 character you need to print \x{c2a3} and
> > then isutf8 is happy.
>
> When LC_CTYPE=en_GB.utf-8, programs which attempt to print Unicode
> characters to stdout should use UTF-8. That's what LC_CTYPE means.

By the way,

$ LC_CTYPE=en_GB.utf-8 echo 'puts \x00a3\n'|tclsh|isutf8
$
$ LC_CTYPE=en_GB.utf-8 echo 'puts \x00a3\n'|tclsh|xxd -p
c2a30a0a
$

But RMS told the world not to use Tcl.


Archive: http://lists.debian.org/20110214203601.715df57c.kostix@domain007.com
 
02-15-2011, 11:01 PM
Vincent Lefevre

OT: Python (was: Make Unicode bugs release critical?)

On 2011-02-14 16:43:11 +0000, Ian Jackson wrote:
> When LC_CTYPE=en_GB.utf-8, programs which attempt to print Unicode
> characters to stdout should use UTF-8. That's what LC_CTYPE means.

So, "cat", "grep", etc. are all broken.

--
Vincent Lefèvre <vincent@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)


Archive: http://lists.debian.org/20110216000107.GL15920@prunille.vinc17.org
 
02-15-2011, 11:34 PM
Adam Borowski

OT: Python (was: Make Unicode bugs release critical?)

On Wed, Feb 16, 2011 at 01:01:07AM +0100, Vincent Lefevre wrote:
> On 2011-02-14 16:43:11 +0000, Ian Jackson wrote:
> > When LC_CTYPE=en_GB.utf-8, programs which attempt to print Unicode
> > characters to stdout should use UTF-8. That's what LC_CTYPE means.
>
> So, "cat", "grep", etc. are all broken.

How come?

"cat" will, for any valid UTF-8 character on input, print a valid UTF-8
character on output. For any valid ISO-8859-1 character on input, it will
print a valid ISO-8859-1 character on output.

"grep" on the other hand has to actually understand the encoding -- and it
does. Try this:
$ echo "ą"|LC_CTYPE=C grep --color=always .
Will be mangled.
$ echo "ą"|LC_CTYPE=en_US.utf-8 grep --color=always .
Will be handled correctly.
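
The difference between the two grep runs can be approximated in Python (a sketch only; Python's `re` module stands in for grep here, which is an assumption of the illustration rather than a description of grep's internals). Matching `.` against raw bytes splits the two-byte character, while matching against decoded text treats it as one unit.

```python
import re

text = "\u0105\n"            # "ą" plus the newline that echo appends
raw = text.encode("utf-8")   # b'\xc4\x85\n'

# Byte-oriented matching (comparable to LC_CTYPE=C): '.' matches one byte,
# so the two-byte character is split into two separate matches.
byte_matches = re.findall(rb".", raw)

# Character-oriented matching (comparable to a UTF-8 locale): decode first,
# then '.' matches the whole character once.
char_matches = re.findall(r".", raw.decode("utf-8"))

print(byte_matches)  # [b'\xc4', b'\x85']
print(char_matches)  # ['ą']
```

Wrapping each byte-level match in colour escapes separately is what breaks the character apart on the terminal; "cat" never hits this because it copies the bytes through without interpreting them.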

--
1KB // Microsoft corollary to Hanlon's razor:
// Never attribute to stupidity what can be
// adequately explained by malice.


Archive: http://lists.debian.org/20110216003451.GA14646@angband.pl
 
02-15-2011, 11:45 PM
Vincent Lefevre

OT: Python (was: Make Unicode bugs release critical?)

On 2011-02-16 01:34:51 +0100, Adam Borowski wrote:
> On Wed, Feb 16, 2011 at 01:01:07AM +0100, Vincent Lefevre wrote:
> > On 2011-02-14 16:43:11 +0000, Ian Jackson wrote:
> > > When LC_CTYPE=en_GB.utf-8, programs which attempt to print Unicode
> > > characters to stdout should use UTF-8. That's what LC_CTYPE means.
> >
> > So, "cat", "grep", etc. are all broken.
>
> How come?
>
> "cat" will, for any valid UTF-8 character on input, print a valid UTF-8
> character on output. For any valid ISO-8859-1 character on input, it will
> print a valid ISO-8859-1 character on output.

I was just commenting on what Ian said. If there is a valid reason why
"cat" may not produce UTF-8 in UTF-8 locales, the same reason applies
to "perl" or any other software.

--
Vincent Lefèvre <vincent@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)


Archive: http://lists.debian.org/20110216004529.GN15920@prunille.vinc17.org
 
