02-11-2011, 10:59 AM
Klaus Ethgen

Make Unicode bugs release critical?

Hi,

On Fri, 11 Feb 2011 at 10:37, Lars Wirzenius wrote:
> The first Unicode standard was published in 1991. That's twenty years
> ago. Any software that processes text at all and is incapable of dealing
> with UTF-8 should be considered with extreme suspicion. Making all such
> bugs be release critical (which includes the notion that release
> managers may ignore the bug in particular cases) sounds like a good way
> to get things under control.

I think you are mixing things together. First there is Unicode. There are
several definitions of Unicode (unicode-16, unicode-32, ...), but UTF-8
is not Unicode; it is just one implementation of Unicode and, in my eyes,
the most problematic one, as it has undefined states and is variable length.
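
As an illustration of the two properties named here (a small Python sketch, not part of the original message): UTF-8 uses between one and four bytes per code point, and some byte sequences are not valid UTF-8 at all, which a strict decoder rejects. Reading "undefined states" as "malformed byte sequences" is an assumption, and `is_valid_utf8` is a hypothetical helper name.

```python
# Variable length: each code point takes 1 to 4 bytes in UTF-8.
samples = ["A", "\u00e4", "\u20ac", "\U0001F600"]  # A, a-umlaut, euro sign, emoji
lengths = [len(ch.encode("utf-8")) for ch in samples]
print(lengths)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Malformed input: 0xFF never occurs in well-formed UTF-8, and a lone
# continuation byte (0x80) has no lead byte to belong to.
print(is_valid_utf8("\u00e4".encode("utf-8")))
print(is_valid_utf8(b"\xff"))
print(is_valid_utf8(b"\x80"))
```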

However, UTF-8 was created to allow using Unicode in non-Unicode
environments. To me that was always a pointless plan. The most annoying
results are the unreadable UTF-8 characters produced by buggy software
that cannot handle encodings correctly (and there is a lot of it around)
and by ignorant users who use UTF-8 in environments that are not
specified for multibyte charsets (IRC).

While there are places where UTF-8 makes perfect sense and is the best
solution, it is not the best solution for all the ignorance users (me
too ;-) have.

So requiring software to be UTF-8 capable is somewhat inconsistent.
Software has to be able to handle every encoding it is specified for.

Regards
Klaus
- --
Klaus Ethgen http://www.ethgen.ch/
pub 2048R/D1A4EDE5 2000-02-26 Klaus Ethgen <Klaus@Ethgen.de>
Fingerprint: D7 67 71 C4 99 A6 D4 FE EA 40 30 57 3C 88 26 2B


--
To UNSUBSCRIBE, email to debian-devel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/20110211115946.GA4031@ikki.ethgen.ch
 
02-11-2011, 11:20 AM
Torsten Werner

Make Unicode bugs release critical?

Andrey Rahmatullin wrote:
> On Fri, Feb 11, 2011 at 11:14:42AM +0100, Miroslav Kure wrote:
>>> However, I'm curious: is there a lot of software that is broken with
>>> Unicode, particularly with the UTF-8 encoding? I can't remember anything
>>> much in recent times.
>> Mostly it is just the old stuff like
>> - eterm, aterm
>> - elvis
>> - X tools from the basic package (xman, xmessage, xmore, ...)
>> - TeX without additional packages
> - tr(1)

grep, sed, awk, bash, ...

Torsten


Archive: http://lists.debian.org/4D552988.1020001@debian.org
 
02-11-2011, 11:39 AM
Andrey Rahmatullin

Make Unicode bugs release critical?

On Fri, Feb 11, 2011 at 01:20:24PM +0100, Torsten Werner wrote:
> >>> However, I'm curious: is there a lot of software that is broken with
> >>> Unicode, particularly with the UTF-8 encoding? I can't remember anything
> >>> much in recent times.
> >> Mostly it is just the old stuff like
> >> - eterm, aterm
> >> - elvis
> >> - X tools from the basic package (xman, xmessage, xmore, ...)
> >> - TeX without additional packages
> > - tr(1)
> grep, sed, awk, bash, ...
http://bugs.debian.org/495677

--
WBR, wRAR
 
02-11-2011, 11:46 AM
Norbert Preining

Make Unicode bugs release critical?

On Fri, 11 Feb 2011, Roger Leigh wrote:
> XeTeX and XeLaTeX allow native UTF-8 input. Should be made the
> default, IMO, given how obsolete and broken the "standard" TeX
> encodings are. Being able to write in actual text rather than

Please don't write rubbish if you don't know what you are talking about!!!

You apparently have no idea of the difference between input and font encoding.

LaTeX can easily use utf8 with the appropriate inputenc, as well
as dozens of other encodings. Not all of the world is using UTF-8.
UTF-8 is still tailored to Western Roman script, and thus very unpopular
in Japan, for example.

> sorts out the awful font support, so you can use standard
> freetype-registered fonts, again without the pain. Result: a
> document you can actually read in the editor!

Argg, PLEASE STOP THAT RUBBISH!!!!

I never use xetex, I write a lot in German (umlauts), Japanese,
Italian, ...

TeX is different, don't try to throw away working solutions of 20 years
because of your ignorance.

ARrggggggg. I love people blabbering like drunkards.

> IMO all those broken terminal emulators, editors and tools should
> be put in the bin. There are plenty of non-broken replacements, so
> why keep them around to bitrot even further? It's not like it's

So what is the replacement for TeX?
Yeah, I know, it is *luatex*, but we are FAAAAAR from being stable and
usable.

XeTeX is nice for certain things, but not for all. Have you tried to
set Tibetan text with XeTeX? The last time I tried it was a mess.
And with Khmer (the language and script of Cambodia) it is even worse.
Just because you only use ASCII characters, please don't make the
rest of the world laugh at you.

Best wishes

Norbert
(mumbling "throw away in the bin", "standard freetype", ...)

------------------------------------------------------------------------
Norbert Preining preining@{jaist.ac.jp, logic.at, debian.org}
JAIST, Japan TeX Live & Debian Developer
DSA: 0x09C5B094 fp: 14DF 2E6C 0307 BE6D AD76 A9C0 D2BF 4AA3 09C5 B094
------------------------------------------------------------------------
HAGNABY (n.)
Someone who looked a lot more attractive in the disco than they do in
your bed the next morning.
--- Douglas Adams, The Meaning of Liff


Archive: http://lists.debian.org/20110211124629.GA1816@gamma.logic.tuwien.ac.at
 
02-11-2011, 11:50 AM
Lars Wirzenius

Make Unicode bugs release critical?

On Fri, 2011-02-11 at 13:20 +0100, Torsten Werner wrote:
> grep, sed, awk, bash, ...

grep, sed, and awk, at least, seem to work acceptably for me with UTF-8.
The support can be improved, I'm sure.

--
Blog/wiki/website hosting with ikiwiki (free for free software):
http://www.branchable.com/


Archive: http://lists.debian.org/1297428633.3105.56.camel@havelock.lan
 
02-11-2011, 12:02 PM
Faidon Liambotis

Make Unicode bugs release critical?

On 02/11/11 14:20, Torsten Werner wrote:
> grep, sed, awk, bash, ...

?

$ echo αβγ | sed 's/./a/'
aβγ
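
The result above depends on the tool matching one character rather than one byte. A minimal Python sketch of the difference (an illustration only, not a description of how sed is implemented): replacing the first character of the decoded text gives the same result as the sed run, while replacing the first byte of the UTF-8 data leaves a stray continuation byte that no longer decodes.

```python
import re

text = "\u03b1\u03b2\u03b3"  # the string "αβγ"

# Character-aware: '.' matches one code point, as in the sed run above.
by_char = re.sub(r".", "a", text, count=1)
print(by_char)

# Byte-oriented: '.' matches a single byte, so only the first half of the
# two-byte sequence for α (0xCE 0xB1) is replaced.
raw = text.encode("utf-8")
by_byte = re.sub(rb".", b"a", raw, count=1)
print(by_byte)

# The byte-oriented result is no longer well-formed UTF-8.
try:
    by_byte.decode("utf-8")
    still_valid = True
except UnicodeDecodeError:
    still_valid = False
print(still_valid)
```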

Regards,
Φαίδων :-)


Archive: http://lists.debian.org/4D553373.5060409@debian.org
 
02-11-2011, 12:18 PM
Vincent Lefevre

Make Unicode bugs release critical?

On 2011-02-11 21:46:29 +0900, Norbert Preining wrote:
> On Fri, 11 Feb 2011, Roger Leigh wrote:
> > XeTeX and XeLaTeX allow native UTF-8 input. Should be made the
> > default, IMO, given how obsolete and broken the "standard" TeX
> > encodings are. Being able to write in actual text rather than
>
> Please don't write rubbish if you don't know what you are talking about!!!
>
> You apparently have no idea of the difference between input and font encoding.
>
> LaTeX can easily use utf8 with the appropriate inputenc,

Which one???

FYI, utf8 is very incomplete and utf8x is broken (bug 601365).

--
Vincent Lefèvre <vincent@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)


Archive: http://lists.debian.org/20110211131843.GH15920@prunille.vinc17.org
 
02-11-2011, 12:28 PM
Roger Leigh

Make Unicode bugs release critical?

On Fri, Feb 11, 2011 at 09:46:29PM +0900, Norbert Preining wrote:
> On Fri, 11 Feb 2011, Roger Leigh wrote:
> > XeTeX and XeLaTeX allow native UTF-8 input. Should be made the
> > default, IMO, given how obsolete and broken the "standard" TeX
> > encodings are. Being able to write in actual text rather than
>
> Please don't write rubbish if you don't know what you are talking about!!!

Um, no need to be rude. Please keep your reply to technical points;
if I've said something incorrect, by all means correct me, but
insults are a step too far. I haven't said anything that could justify
them, other than the fact that you disagree with my /opinion/.

> You apparently have no idea of the difference between input and font encoding.

I only mentioned UTF-8 with regard to input, so you are assuming
too much.

> LaTeX can easily use utf8 with the appropriate inputenc, as well
> as dozens of other encodings. Not all of the world is using UTF-8.
> UTF-8 is still tailored to Western Roman script, and thus very unpopular
> in Japan, for example.

The inputenc hack only gets you so far. I tried to go this way, and
ran into all sorts of issues with UTF-8 in macro definitions getting
scrambled and other sources of pain. With XeLaTeX I had no such
troubles. So IME inputenc was not a suitable solution for serious
UTF-8 work.

> > sorts out the awful font support, so you can use standard
> > freetype-registered fonts, again without the pain. Result: a
> > document you can actually read in the editor!
>
> Argg, PLEASE STOP THAT RUBBISH!!!!

What you are calling "rubbish" is not in any way false. It's given
me the ability to have nice legible UTF-8-encoded documents, with
excellent font support. There may be other ways. There may be
better ways. But it's not wrong.

[snip rant]

> > IMO all those broken terminal emulators, editors and tools should
> > be put in the bin. There are plenty of non-broken replacements, so
> > why keep them around to bitrot even further? It's not like it's
>
> So what is the replacement for TeX?
> Yeah, I know, it is *luatex*, but we are FAAAAAR from being stable and
> usable.

Well I thought the jury was still out on which was the better solution.
I really couldn't care less which "wins"; I'm using the solution which
works right now, and I'll happily adopt whatever is better down the
line.

> XeTeX is nice for certain things, but not for all. Have you tried to
> set Tibetan text with XeTeX? The last time I tried it was a mess.
> And with Khmer (the language and script of Cambodia) it is even worse.
> Just because you only use ASCII characters, please don't make the
> rest of the world laugh at you.

You are again making unwarranted assumptions. I might not be using it
for difficult-to-set languages, but I'm certainly not using ASCII
characters only, or I wouldn't be needing UTF-8 input.


Regards,
Roger

--
.'`. Roger Leigh
: :' : Debian GNU/Linux http://people.debian.org/~rleigh/
`. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/
`- GPG Public Key: 0x25BFB848 Please GPG sign your mail.
 
02-11-2011, 12:30 PM
Vincent Lefevre

Make Unicode bugs release critical?

On 2011-02-11 15:33:49 +0500, Andrey Rahmatullin wrote:
> On Fri, Feb 11, 2011 at 11:14:42AM +0100, Miroslav Kure wrote:
> > > However, I'm curious: is there a lot of software that is broken with
> > > Unicode, particularly with the UTF-8 encoding? I can't remember anything
> > > much in recent times.
> > Mostly it is just the old stuff like
> > - eterm, aterm
> > - elvis
> > - X tools from the basic package (xman, xmessage, xmore, ...)
> > - TeX without additional packages
> - tr(1)

"less" has problems with new Unicode characters (bug 597918).

--
Vincent Lefèvre <vincent@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)


Archive: http://lists.debian.org/20110211133024.GI15920@prunille.vinc17.org
 
02-11-2011, 12:36 PM
Adam Borowski

Make Unicode bugs release critical?

On Fri, Feb 11, 2011 at 12:59:46PM +0100, Klaus Ethgen wrote:
> On Fri, 11 Feb 2011 at 10:37, Lars Wirzenius wrote:
> > The first Unicode standard was published in 1991. That's twenty years
> > ago. Any software that processes text at all and is incapable of dealing
> > with UTF-8 should be considered with extreme suspicion. Making all such
> > bugs be release critical (which includes the notion that release
> > managers may ignore the bug in particular cases) sounds like a good way
> > to get things under control.
>
> I think you are mixing things together. First there is Unicode. There are
> several definitions of Unicode (unicode-16, unicode-32, ...), but UTF-8
> is not Unicode; it is just one implementation of Unicode and, in my eyes,
> the most problematic one, as it has undefined states and is variable length.

There is just one definition of Unicode; new versions merely add extra
characters, collating rules, etc.

There are several ways to represent Unicode as a stream of bytes. Only one
of them is fit for external storage, and that's UTF-8, since it doesn't break
the assumptions that hold for text files:
1. no null bytes
2. basic newlines, etc. are always newlines, never a part of a bigger
character (not true for some ancient multibyte encodings)
3. not affected by endianness or any other internal detail
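
The three properties above can be checked on a sample string. This is a minimal Python sketch, not part of the original message; the sample text is arbitrary, and the UTF-16 comparison is added here only for contrast.

```python
# Sample with ASCII, Latin-1, CJK and a character outside the BMP.
text = "na\u00efve \u65e5\u672c\u8a9e\nsecond line \U0001F600\n"

utf8 = text.encode("utf-8")
utf16_le = text.encode("utf-16-le")
utf16_be = text.encode("utf-16-be")

# 1. No null bytes in UTF-8 (unless the text itself contains U+0000),
#    whereas UTF-16 pads every ASCII character with a zero byte.
has_nul_utf8 = b"\x00" in utf8
has_nul_utf16 = b"\x00" in utf16_le

# 2. Every 0x0A byte in UTF-8 is a real newline: all bytes of a multi-byte
#    sequence are >= 0x80, so none can be mistaken for an ASCII character.
newline_bytes = utf8.count(b"\n")
newline_chars = text.count("\n")
# In UTF-16 a 0x0A byte can be half of another character, e.g. U+010A.
misleading_newline_utf16 = b"\n" in "\u010a".encode("utf-16-le")

# 3. UTF-8 has a single byte order; UTF-16 comes in two.
endian_dependent_utf16 = utf16_le != utf16_be

print(has_nul_utf8, has_nul_utf16)
print(newline_bytes, newline_chars, misleading_newline_utf16)
print(endian_dependent_utf16)
```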

Also, _all_ Unicode encodings are of variable length.

> However, UTF-8 was created to allow using Unicode in non-Unicode
> environments. To me that was always a pointless plan. The most annoying
> results are the unreadable UTF-8 characters produced by buggy software
> that cannot handle encodings correctly (and there is a lot of it around)
> and by ignorant users who use UTF-8 in environments that are not
> specified for multibyte charsets (IRC).

UTF-8 was never meant as merely a tool to "allow using unicode in
non-unicode environments".

UTF-32 is useful only as an internal representation, when you care about a
string of code points. Since a single character can consist of multiple
such code points, it doesn't give you much unless you have to pass every
code point through a function like wcwidth() -- i.e., you are implementing
something low-level which cares about properties of characters and their
parts. You should never place UTF-32 into external storage that is not
private to your program or that can possibly be moved.
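
To illustrate the point that one character can consist of several code points even in UTF-32, here is a small Python sketch (not part of the original message). Python's standard library has no wcwidth(), so `rough_width` is a hypothetical, deliberately simplified stand-in built on `unicodedata`; it ignores control characters and many other cases a real implementation handles.

```python
import unicodedata

# "é" written as a base letter plus a combining accent: one user-visible
# character, but two code points, hence two 32-bit units in UTF-32.
decomposed = "e\u0301"
code_points = len(decomposed)
utf32_units = len(decomposed.encode("utf-32-le")) // 4

# NFC normalization composes this particular pair into U+00E9, but not every
# combining sequence has a precomposed form.
composed = unicodedata.normalize("NFC", decomposed)


def rough_width(s: str) -> int:
    """Approximate terminal width: combining marks count 0, East Asian
    wide/fullwidth characters count 2, everything else counts 1."""
    width = 0
    for ch in s:
        if unicodedata.combining(ch):
            continue
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


print(code_points, utf32_units, len(composed))
print(rough_width(decomposed), rough_width("\u65e5\u672c"))
```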

UTF-16 is never, ever useful. It is a sad trap for win32 and Java
developers, due to a bad engineering decision suggested, as I was told, by
delegates from Microsoft and Sun, who wanted to "conserve disk space and
memory" by storing code points and a language tag separately -- i.e., exactly
the thing Unicode was supposed to rid us of. Even on day one, it was
known that you can't fit all characters into 16 bits, and the decision to
put all "rare characters" into a "private" area that needs out-of-band
information was pretty ridiculous. The end result is that you have an
encoding with all the downsides of UTF-8 but none of the advantages.
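
The "downsides of UTF-8" that UTF-16 shares can be made concrete. The Python sketch below (not part of the original message, and taking no position on the history described above) only shows that UTF-16 is itself variable length: a character outside the Basic Multilingual Plane needs two 16-bit units, a surrogate pair. `utf16_units` is a hypothetical helper name.

```python
def utf16_units(s: str) -> int:
    """Number of 16-bit code units needed to store s in UTF-16."""
    return len(s.encode("utf-16-le")) // 2


bmp = "\u20ac"         # EURO SIGN, inside the Basic Multilingual Plane
astral = "\U0001F600"  # GRINNING FACE, outside it

units_bmp = utf16_units(bmp)
units_astral = utf16_units(astral)

# Split the astral character into its two surrogate code units.
raw = astral.encode("utf-16-le")
high = int.from_bytes(raw[0:2], "little")
low = int.from_bytes(raw[2:4], "little")

# Plain ASCII text doubles in size compared with UTF-8.
ascii_utf8 = len("hello".encode("utf-8"))
ascii_utf16 = len("hello".encode("utf-16-le"))

print(units_bmp, units_astral)
print(hex(high), hex(low))
print(ascii_utf8, ascii_utf16)
```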

Since neither UTF-16 nor UTF-32 can be considered text, the decision all
UNIX systems made was to use UTF-8 in the libc's API in all Unicode locales.
Otherwise, you'd need separate APIs like FooBarA()/FooBarW() on Windows,
which cause no end of problems.

> So requiring software to be UTF-8 capable is somewhat inconsistent.
> Software has to be able to handle every encoding it is specified for.

No, there is only one encoding left, as long as you don't have to talk to
Windows. We can start purging away all the support for ancient charsets in
places that do not need to handle foreign data. Debian has used UTF-8 as
default for 5 releases already, and if you try to use an ancient locale, do
not expect good results since no one bothers fixing bugs there. And
maintaining unused code costs time and causes a risk of bugs, so good
riddance!

--
1KB // Microsoft corollary to Hanlon's razor:
// Never attribute to stupidity what can be
// adequately explained by malice.


Archive: http://lists.debian.org/20110211133612.GA2053@angband.pl
 
