02-14-2011, 08:35 AM
Josselin Mouette

Make Unicode bugs release critical? (was: RFA: all my packages)

On Friday 11 February 2011 at 19:33 +0100, Axel Beckert wrote:
> Kicking out good and unique software, only because of missing or
> incomplete UTF-8 support, will surely lower Debian's quality more than
> missing or broken UTF-8 support in very few packages. And it would
> make those users (and devs) angry who need that software independently
> of working UTF-8 support or not.

Kicking out software with incomplete UTF-8 support sounds unfair.

Kicking out software that doesn’t work at all in UTF-8 locales and
requires the user to set a broken locale, OTOH, sounds like a sanitary
emergency.

--
.'`.
: :' : “You would need to ask a lawyer if you don't know
`. `' that a handshake of course makes a valid contract.”
`- -- J???rg Schilling


--
To UNSUBSCRIBE, email to debian-devel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/1297676104.3044.218.camel@meh
 
02-14-2011, 11:42 AM
Ian Jackson

Make Unicode bugs release critical? (was: RFA: all my packages)

Josselin Mouette writes ("Re: Make Unicode bugs release critical? (was: Re: RFA: all my packages)"):
> Kicking out software that doesn?t work at all in UTF-8 locales and
> requires the user to set a broken locale, OTOH, sounds like a sanitary
> emergency.

Excellent, I look forward to the removal of python. I always hated
that language anyway.

$ LC_CTYPE=en_GB.utf-8 python -c 'print u"\u00a3"'
<unicode pound sign>
$

But

$ LC_CTYPE=en_GB.utf-8 python -c 'print u"\u00a3"' | cat
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 0: ordinal not in range(128)
$

Ian.


Archive: http://lists.debian.org/19801.8997.829350.140559@chiark.greenend.org.uk
 
02-14-2011, 12:14 PM
Jakub Wilk

Make Unicode bugs release critical? (was: RFA: all my packages)

* Ian Jackson <ijackson@chiark.greenend.org.uk>, 2011-02-14, 12:42:
> > Kicking out software that doesn?t work at all in UTF-8 locales and
> > requires the user to set a broken locale, OTOH, sounds like a sanitary
> > emergency.
>
> Excellent, I look forward to the removal of python. I always hated
> that language anyway.
>
> $ LC_CTYPE=en_GB.utf-8 python -c 'print u"\u00a3"'
> <unicode pound sign>
> $
>
> But
>
> $ LC_CTYPE=en_GB.utf-8 python -c 'print u"\u00a3"' | cat
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 0: ordinal not in range(128)
> $


This is the expected behaviour. Incidentally, it has nothing to do with
UTF-8. You'll get the same result if you use a locale with a legacy
encoding.
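A rough model of the behaviour described here, written so that it also runs on Python 3. Python 2 takes sys.stdout.encoding from the locale only when stdout is a terminal; when stdout is a pipe the encoding is unset and printing a unicode object falls back to the 'ascii' default codec. The helper name python2_print_encoding is hypothetical and merely imitates that choice; it is not an interpreter API.

```python
def python2_print_encoding(stdout_is_tty, locale_encoding):
    """Return the codec Python 2 would use to print a unicode object.

    Assumption: this only models the documented fallback (locale encoding
    on a terminal, 'ascii' otherwise); it does not inspect the real stdout.
    """
    return locale_encoding if stdout_is_tty else "ascii"


POUND = u"\u00a3"

# Try one UTF-8 locale and one legacy-encoding locale, on a tty and on a pipe.
for locale_encoding in ("utf-8", "iso-8859-1"):
    for is_tty in (True, False):
        codec = python2_print_encoding(is_tty, locale_encoding)
        try:
            outcome = repr(POUND.encode(codec))
        except UnicodeEncodeError as exc:
            outcome = "UnicodeEncodeError (%s codec)" % exc.encoding
        print(locale_encoding, "tty" if is_tty else "pipe", outcome)
```

In this model both locales succeed on a terminal and both fail on a pipe, which is the sense in which the failure is unrelated to UTF-8.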


--
Jakub Wilk


Archive: http://lists.debian.org/20110214131425.GA4744@jwilk.net
 
02-14-2011, 02:46 PM
Josselin Mouette

Make Unicode bugs release critical? (was: RFA: all my packages)

On Monday 14 February 2011 at 12:42 +0000, Ian Jackson wrote:
> Josselin Mouette writes ("Re: Make Unicode bugs release critical? (was: Re: RFA: all my packages)"):
> > Kicking out software that doesn?t work at all in UTF-8 locales and
> > requires the user to set a broken locale, OTOH, sounds like a sanitary
> > emergency.
>
> Excellent, I look forward to the removal of python. I always hated
> that language anyway.

From your reply I look more forward to the removal of vm, since it broke
the Unicode in my original email.

> $ LC_CTYPE=en_GB.utf-8 python -c 'print u"\u00a3"' | cat
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 0: ordinal not in range(128)
> $


You must specify the encoding of your data in your bitstreams. I agree
this is inconvenient (and one of the things I dislike in Python), but it
is:
1. completely independent of the locale (UTF8 or not)
2. easy to work with once you understand how encodings in Python
work
3. much better in Python 3.
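One way to read "specify the encoding of your data in your bitstreams" is to encode text explicitly at the point where it is written to a byte stream. The following is a minimal sketch under that reading; the function name write_encoded and its parameters are hypothetical, and an in-memory io.BytesIO stands in for a real pipe.

```python
import io


def write_encoded(binary_stream, text, encoding="utf-8", errors="strict"):
    """Encode text with an explicitly chosen codec and write the bytes."""
    data = text.encode(encoding, errors)
    binary_stream.write(data)
    return data


buf = io.BytesIO()
# Explicit UTF-8: the pound sign becomes two bytes.
write_encoded(buf, u"\u00a3\n", encoding="utf-8")
# Explicit ASCII with errors="replace": no exception, the character is
# replaced by "?" instead.
write_encoded(buf, u"\u00a3\n", encoding="ascii", errors="replace")
print(repr(buf.getvalue()))
```

Because the codec is named at the call site, the result does not depend on whether the destination is a terminal or a pipe.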

--
.'`.
: :' : “You would need to ask a lawyer if you don't know
`. `' that a handshake of course makes a valid contract.”
`- -- J???rg Schilling


Archive: http://lists.debian.org/1297698390.8791.72.camel@meh
 
02-14-2011, 03:01 PM
Henrique de Moraes Holschuh

Make Unicode bugs release critical? (was: RFA: all my packages)

On Mon, 14 Feb 2011, Josselin Mouette wrote:
> You must specify the encoding of your data in your bitstreams. I agree
> this is inconvenient (and one of the things I dislike in Python), but it
> is:
> 1. completely independent of the locale (UTF8 or not)
> 2. easy to work with once you understand how encodings in Python
> work
> 3. much better in Python 3.

As long as python 3 is compiled to use UCS-4 as the internal
representation, you mean. Are our packages set to use UCS-4?

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh


Archive: http://lists.debian.org/20110214160108.GA7545@khazad-dum.debian.net
 
02-14-2011, 03:20 PM
"brian m. carlson"

Make Unicode bugs release critical? (was: RFA: all my packages)

On Mon, Feb 14, 2011 at 02:01:08PM -0200, Henrique de Moraes Holschuh wrote:
> As long as python 3 is compiled to use UCS-4 as the internal
> representation, you mean. Are our packages set to use UCS-4?

At least for python 3.1, yes:

common_configure_args = \
        --prefix=/usr \
        --enable-ipv6 \
        --with-dbmliborder=bdb \
        --with-wide-unicode \
        --with-computed-gotos \
        --with-system-expat

The --with-wide-unicode option enables UCS-4. With very few exceptions, I
believe all the recent Debian python packages have been compiled this
way.
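Whether a given interpreter was built this way can also be checked at run time. This is a small sketch using sys.maxunicode, which is 0x10FFFF on wide (UCS-4) builds and 0xFFFF on narrow (UCS-2) builds; the function name is_wide_unicode_build is hypothetical. Note that from Python 3.3 onwards the distinction no longer exists and sys.maxunicode is always 0x10FFFF.

```python
import sys


def is_wide_unicode_build():
    """Return True if the interpreter can index any code point directly."""
    return sys.maxunicode > 0xFFFF


# A character outside the Basic Multilingual Plane: one element on a wide
# build, a two-element surrogate pair on a narrow build.
astral = u"\U0001F600"

print(hex(sys.maxunicode), is_wide_unicode_build(), len(astral))
```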

--
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187
 
02-14-2011, 03:39 PM
Ian Jackson

Make Unicode bugs release critical? (was: RFA: all my packages)

Josselin Mouette writes ("Re: Make Unicode bugs release critical? (was: Re: RFA: all my packages)"):
> On Monday 14 February 2011 at 12:42 +0000, Ian Jackson wrote:
> > Excellent, I look forward to the removal of python. I always hated
> > that language anyway.
>
> From your reply I look more forward to the removal of vm, since it broke
> the Unicode in my original email.

In fact I manually typed "<unicode pound sign>" and deliberately
avoided putting any non-ASCII in my email, to avoid things being even
more confused.

But you are making my argument for me: lots of software has
unicode handling bugs. If we make them all release critical we might
as well give up and go home.


Regarding the specifics, which we don't really need to go into too
much detail about:

> > $ LC_CTYPE=en_GB.utf-8 python -c 'print u"\u00a3"' | cat
> > Traceback (most recent call last):
> >   File "<string>", line 1, in <module>
> > UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 0: ordinal not in range(128)
> > $
>
> You must specify the encoding of your data in your bitstreams. I agree
> this is inconvenient (and one of the things I dislike in Python), but it
> is:
> 1. completely independent of the locale (UTF8 or not)
> 2. easy to work with once you understand how encodings in Python
> work

The fact that naive Python programs work (honouring LC_CTYPE as they
should) unless you pipe their output to something is clearly a bug.
The fact that it's a specification bug doesn't mean it's not a bug.

Non-naive programs contain something like the snippet below, which I
include so people who find this thread know that there is an answer.

> 3. much better in Python 3.

Yes, it's fixed in Python 3.

Ian.



# For fuck's sake!
# Python 2 workaround: encode unicode output with the locale's preferred
# encoding even when stdout is a pipe.
import codecs
import locale
import sys

def fix_stdout():
    sys.stdout = codecs.EncodedFile(sys.stdout, locale.getpreferredencoding())
    # Pass unicode input through untouched so only the encode step runs.
    def null_decode(input, errors='strict'):
        return input, len(input)
    sys.stdout.decode = null_decode
    # From
    #  http://ewx.livejournal.com/457086.html?thread=3016574
    # lightly modified.
    # See also Debian #415968.

fix_stdout()
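For readers on Python 3, a rough counterpart of the Python 2 workaround above can be sketched with io.TextIOWrapper. This is an illustration and not part of the original message: the name text_stdout is hypothetical, and an io.BytesIO object stands in for the real byte stream (in a program that would typically be sys.stdout.buffer).

```python
import io
import locale


def text_stdout(binary_stream, encoding=None):
    """Wrap a byte stream so that text written to it is encoded explicitly.

    If no encoding is given, fall back to the locale's preferred encoding,
    as the Python 2 snippet does.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    # newline="\n" disables platform-specific newline translation on write.
    return io.TextIOWrapper(binary_stream, encoding=encoding, newline="\n")


raw = io.BytesIO()
out = text_stdout(raw, encoding="utf-8")
out.write(u"\u00a3\n")
out.flush()
print(repr(raw.getvalue()))
```

The encoding is passed explicitly in the usage lines so that the result does not depend on the locale of the machine running the sketch.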


Archive: http://lists.debian.org/19801.23224.917305.259504@chiark.greenend.org.uk
 
