Make Unicode bugs release critical?
Hi,

On Fri, 11 Feb 2011 at 10:37, Lars Wirzenius wrote:
> The first Unicode standard was published in 1991. That's twenty years
> ago. Any software that processes text at all and is incapable of dealing
> with UTF-8 should be considered with extreme suspicion. Making all such
> bugs be release critical (which includes the notion that release
> managers may ignore the bug in particular cases) sounds like a good way
> to get things under control.

I think you are mixing things together. First there is Unicode. There are
several definitions for Unicode (unicode-16, unicode-32, ...), but UTF-8
is not Unicode; it is just one implementation of Unicode, and in my eyes
the most problematic one, as it has undefined states and is variable
length.

However, UTF-8 was created to allow using Unicode in non-Unicode
environments. For me that was always a pointless plan. The unreadable
UTF-8 characters all around, buggy software that cannot handle encodings
correctly (and there is a lot of it around), and ignorant users who use
UTF-8 in environments that are not specified for multibyte charsets
(IRC) are the most annoying part.

While there are places where UTF-8 makes perfect sense and is the best
solution, it is not the best solution for everything that ignorant users
(me too ;-) have. So requiring software to be UTF-8 capable is somewhat
inconsistent. Software has to be capable of handling every encoding, as
long as it is specified for those encodings.

Regards
   Klaus
-- 
Klaus Ethgen                            http://www.ethgen.ch/
pub  2048R/D1A4EDE5 2000-02-26 Klaus Ethgen <Klaus@Ethgen.de>
Fingerprint: D7 67 71 C4 99 A6 D4 FE  EA 40 30 57 3C 88 26 2B

--
To UNSUBSCRIBE, email to debian-devel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/20110211115946.GA4031@ikki.ethgen.ch
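To make the terminology in this message concrete, here is a small Python sketch (not part of the thread) showing that UTF-8, UTF-16 and UTF-32 are different byte serializations of the same Unicode code points. The sample characters and the helper name `encoded_lengths` are arbitrary choices made for illustration.

```python
# Illustrative sketch: one Unicode code point, three encoding forms.
# The helper name and the sample characters are assumptions for this
# example, not something taken from the thread.

def encoded_lengths(ch: str) -> dict:
    """Return how many bytes each encoding form needs for one code point."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),  # "-le" avoids the 2-byte BOM
        "utf-32": len(ch.encode("utf-32-le")),  # "-le" avoids the 4-byte BOM
    }

# U+0041, U+00E4, U+20AC, U+1D11E
for ch in ["A", "\u00e4", "\u20ac", "\U0001D11E"]:
    print(f"U+{ord(ch):04X}", encoded_lengths(ch))
```

UTF-8 needs between one and four bytes here, and UTF-16 needs either two or four, so neither is fixed-width; only UTF-32 uses the same number of bytes for every code point.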
Make Unicode bugs release critical?
Andrey Rahmatullin wrote:
> On Fri, Feb 11, 2011 at 11:14:42AM +0100, Miroslav Kure wrote:
>>> However, I'm curious: is there a lot of software that is broken with
>>> Unicode, particularly with the UTF-8 encoding? I can't remember anything
>>> much in recent times.
>> Mostly it is just the old stuff like
>> - eterm, aterm
>> - elvis
>> - X tools from the basic package (xman, xmessage, xmore, ...)
>> - TeX without additional packages
> - tr(1)

grep, sed, awk, bash, ...

Torsten

Archive: http://lists.debian.org/4D552988.1020001@debian.org
Make Unicode bugs release critical?
On Fri, Feb 11, 2011 at 01:20:24PM +0100, Torsten Werner wrote:
> >>> However, I'm curious: is there a lot of software that is broken with
> >>> Unicode, particularly with the UTF-8 encoding? I can't remember anything
> >>> much in recent times.
> >> Mostly it is just the old stuff like
> >> - eterm, aterm
> >> - elvis
> >> - X tools from the basic package (xman, xmessage, xmore, ...)
> >> - TeX without additional packages
> > - tr(1)
> grep, sed, awk, bash, ...

http://bugs.debian.org/495677

-- 
WBR, wRAR
Make Unicode bugs release critical?
On Fri, 11 Feb 2011, Roger Leigh wrote:
> XeTeX and XeLaTeX allow native UTF-8 input. Should be made the
> default, IMO, given how obsolete and broken the "standard" TeX
> encodings are. Being able to write in actual text rather than

Please don't write rubbish if you don't know what you are talking about!!!

You apparently have no idea of the difference between input and font
encoding.

LaTeX can easily use utf8 with the appropriate inputenc, as well
as dozens of other encodings. Not all of the world is using UTF-8.
UTF-8 is still tailored to western roman script, thus very unpopular
in Japan for example.

> sorts out the awful font support, so you can use standard
> freetype-registered fonts, again without the pain. Result: a
> document you can actually read in the editor!

Argg, PLEASE STOP THAT RUBBISH!!!!

I never use xetex, and I write a lot in German (umlauts), Japanese,
Italian, ...

TeX is different; don't try to throw away working solutions of 20 years
because of your ignorance.

Arrggggggg. I love people blabbering like drunkards.

> IMO all those broken terminal emulators, editors and tools should
> be put in the bin. There are plenty of non-broken replacements, so
> why keep them around to bitrot even further? It's not like it's

So what is the replacement for tex?
Yeah I know, it is *luatex*, but we are FAAAAAR from being stable and
usable.

XeTeX is nice for certain things, but not for all. Have you tried to
set Tibetan text with XeTeX? The last time I tried, it was a mess.
And with Khmer (the language and script of Cambodia) it is even worse.
Just because you only use ASCII characters, please don't make the
rest of the world laugh at you.

Best wishes

Norbert (mumbling *throw away in the bin*, *standard freetype*, ...)
------------------------------------------------------------------------
Norbert Preining            preining@{jaist.ac.jp, logic.at, debian.org}
JAIST, Japan                                 TeX Live & Debian Developer
DSA: 0x09C5B094   fp: 14DF 2E6C 0307 BE6D AD76  A9C0 D2BF 4AA3 09C5 B094
------------------------------------------------------------------------
HAGNABY (n.)
Someone who looked a lot more attractive in the disco than they do in
your bed the next morning.
                        --- Douglas Adams, The Meaning of Liff

Archive: http://lists.debian.org/20110211124629.GA1816@gamma.logic.tuwien.ac.at
Make Unicode bugs release critical?
On Fri, 2011-02-11 at 13:20 +0100, Torsten Werner wrote:
> grep, sed, awk, bash, ...

grep, sed, and awk, at least, seem to work acceptably for me with UTF-8.
The support can be improved, I'm sure.

-- 
Blog/wiki/website hosting with ikiwiki (free for free software):
http://www.branchable.com/

Archive: http://lists.debian.org/1297428633.3105.56.camel@havelock.lan
Make Unicode bugs release critical?
On 02/11/11 14:20, Torsten Werner wrote:
> grep, sed, awk, bash, ...

?

$ echo αβγ | sed 's/./a/'
aβγ

Regards,
Φαίδων :-)

Archive: http://lists.debian.org/4D553373.5060409@debian.org
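The sed one-liner above works because `.` matched one whole character. As a hedged illustration of what a byte-oriented tool would do instead (for example, one that is unaware of the UTF-8 locale, which is an assumption about the failure mode under discussion), the following Python sketch runs the same substitution once on text and once on raw bytes.

```python
import re

text = "αβγ"

# Character-aware: "." matches one code point, as in the sed run above.
char_result = re.sub(".", "a", text, count=1)

# Byte-oriented: "." matches a single byte, as a tool that ignores the
# encoding would. Only the first byte of the two-byte "α" is replaced.
byte_result = re.sub(b".", b"a", text.encode("utf-8"), count=1)

print(char_result)   # aβγ
print(byte_result)   # b'a\xb1\xce\xb2\xce\xb3'

# The byte-oriented result is no longer valid UTF-8: the orphaned
# continuation byte 0xB1 cannot be decoded.
try:
    byte_result.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8 any more:", exc.reason)
```

This is the kind of breakage bug reports about UTF-8 support usually describe: output that looks fine for ASCII input but corrupts multibyte characters.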
Make Unicode bugs release critical?
On 2011-02-11 21:46:29 +0900, Norbert Preining wrote:
> On Fri, 11 Feb 2011, Roger Leigh wrote:
> > XeTeX and XeLaTeX allow native UTF-8 input. Should be made the
> > default, IMO, given how obsolete and broken the "standard" TeX
> > encodings are. Being able to write in actual text rather than
>
> Please don't write rubbish if you don't know what you are talking about!!!
>
> You apparently have no idea of the difference between input and font
> encoding.
>
> LaTeX can easily use utf8 with the appropriate inputenc,

Which one??? FYI, utf8 is very incomplete and utf8x is broken
(bug 601365).

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)

Archive: http://lists.debian.org/20110211131843.GH15920@prunille.vinc17.org
Make Unicode bugs release critical?
On Fri, Feb 11, 2011 at 09:46:29PM +0900, Norbert Preining wrote:
> On Fri, 11 Feb 2011, Roger Leigh wrote:
> > XeTeX and XeLaTeX allow native UTF-8 input. Should be made the
> > default, IMO, given how obsolete and broken the "standard" TeX
> > encodings are. Being able to write in actual text rather than
>
> Please don't write rubbish if you don't know what you are talking about!!!

Um, no need to be rude. Please keep your reply to technical points;
if I've said something incorrect, by all means correct me, but insults
are a step too far. I haven't said anything that could justify it, other
than the fact that you disagree with my /opinion/.

> You apparently have no idea of the difference between input and font
> encoding.

I only mentioned UTF-8 with regard to input, so you are assuming too
much.

> LaTeX can easily use utf8 with the appropriate inputenc, as well
> as dozens of other encodings. Not all of the world is using UTF-8.
> UTF-8 is still tailored to western roman script, thus very unpopular
> in Japan for example.

The inputenc hack only gets you so far. I tried to go this way, and
ran into all sorts of issues with UTF-8 in macro definitions getting
scrambled and other sources of pain. With XeLaTeX I had no such
troubles. So IME inputenc was not a suitable solution for serious
UTF-8 work.

> > sorts out the awful font support, so you can use standard
> > freetype-registered fonts, again without the pain. Result: a
> > document you can actually read in the editor!
>
> Argg, PLEASE STOP THAT RUBBISH!!!!

What you are calling "rubbish" is not in any way false. It's given me
the ability to have nice legible UTF-8-encoded documents, with
excellent font support. There may be other ways. There may be better
ways. But it's not wrong.

[snip rant]

> > IMO all those broken terminal emulators, editors and tools should
> > be put in the bin. There are plenty of non-broken replacements, so
> > why keep them around to bitrot even further? It's not like it's
>
> So what is the replacement for tex?
> Yeah I know, it is *luatex*, but we are FAAAAAR from being stable and
> usable.

Well, I thought the jury was still out on which was the better solution.
I really couldn't care less which "wins"; I'm using the solution which
works right now, and I'll happily adopt whatever is better down the
line.

> XeTeX is nice for certain things, but not for all. Have you tried to
> set Tibetan text with XeTeX? The last time I tried, it was a mess.
> And with Khmer (the language and script of Cambodia) it is even worse.
> Just because you only use ASCII characters, please don't make the
> rest of the world laugh at you.

You are again making unwarranted assumptions. I might not be using it
for difficult-to-set languages, but I'm certainly not using ASCII
characters only, or I wouldn't be needing UTF-8 input.

Regards,
Roger

-- 
  .'`.   Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.
Make Unicode bugs release critical?
On 2011-02-11 15:33:49 +0500, Andrey Rahmatullin wrote:
> On Fri, Feb 11, 2011 at 11:14:42AM +0100, Miroslav Kure wrote:
> > > However, I'm curious: is there a lot of software that is broken with
> > > Unicode, particularly with the UTF-8 encoding? I can't remember anything
> > > much in recent times.
> > Mostly it is just the old stuff like
> > - eterm, aterm
> > - elvis
> > - X tools from the basic package (xman, xmessage, xmore, ...)
> > - TeX without additional packages
> - tr(1)

"less" has problems with new Unicode characters (bug 597918).

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)

Archive: http://lists.debian.org/20110211133024.GI15920@prunille.vinc17.org
Make Unicode bugs release critical?
On Fri, Feb 11, 2011 at 12:59:46PM +0100, Klaus Ethgen wrote:
> On Fri, 11 Feb 2011 at 10:37, Lars Wirzenius wrote:
> > The first Unicode standard was published in 1991. That's twenty years
> > ago. Any software that processes text at all and is incapable of dealing
> > with UTF-8 should be considered with extreme suspicion. Making all such
> > bugs be release critical (which includes the notion that release
> > managers may ignore the bug in particular cases) sounds like a good way
> > to get things under control.
>
> I think you are mixing things together. First there is Unicode. There are
> several definitions for Unicode (unicode-16, unicode-32, ...), but UTF-8
> is not Unicode; it is just one implementation of Unicode, and in my eyes
> the most problematic one, as it has undefined states and is variable
> length.

There is just one definition of Unicode; new versions merely add extra
characters, collating rules, etc. There are several ways to represent
Unicode as a stream of bytes. Only one of them is fit for external
storage, and that's UTF-8, since it doesn't break the assumptions that
are true for text files:
1. no null bytes
2. basic newlines, etc. are always newlines, never a part of a bigger
   character (not true for some ancient multibyte encodings)
3. not affected by endianness or any other internal detail

Also, _all_ Unicode encodings are of variable length.

> However, UTF-8 was created to allow using Unicode in non-Unicode
> environments. For me that was always a pointless plan. The unreadable
> UTF-8 characters all around, buggy software that cannot handle encodings
> correctly (and there is a lot of it around), and ignorant users who use
> UTF-8 in environments that are not specified for multibyte charsets
> (IRC) are the most annoying part.

UTF-8 was never meant as merely a tool to "allow using Unicode in
non-Unicode environments".

UTF-32 is useful only as an internal representation, if you do care about
a string of code points. Since a single character can consist of multiple
such code points, it doesn't give you much unless you have to pass every
code point through a function like wcwidth() -- i.e., you are
implementing something low-level which cares about properties of
characters and their parts. You should never place UTF-32 into external
storage that is not private to your program or can possibly be moved.

UTF-16 is never, ever useful. It is a sad trap for win32 and Java
developers, due to a bad engineering decision suggested, as I was told,
by delegates from Microsoft and Sun, who wanted to "conserve disk space
and memory" by storing separately code points and a language tag -- i.e.,
exactly the thing Unicode was supposed to rid us of. Even on day one,
it was known that you can't fit all characters into 16 bits, and the
decision to put all "rare characters" into a "private" area that needs
out-of-band information was pretty ridiculous. The end result is that you
have an encoding with all the downsides of UTF-8 but none of the
advantages.

Since neither UTF-16 nor UTF-32 can be considered text, the decision all
UNIX systems made was to use UTF-8 in the libc's API in all Unicode
locales. Otherwise, you'd need separate APIs like FooBarA()/FooBarW() on
Windows, which cause no end of problems.

> So requiring software to be UTF-8 capable is somewhat inconsistent.
> Software has to be capable of handling every encoding, as long as it is
> specified for those encodings.

No, there is only one encoding left, as long as you don't have to talk to
Windows. We can start purging all the support for ancient charsets in
places that do not need to handle foreign data. Debian has used UTF-8 as
the default for 5 releases already, and if you try to use an ancient
locale, do not expect good results, since no one bothers fixing bugs
there. And maintaining unused code costs time and causes a risk of bugs,
so good riddance!

-- 
1KB		// Microsoft corollary to Hanlon's razor:
		//	Never attribute to stupidity what can be
		//	adequately explained by malice.

Archive: http://lists.debian.org/20110211133612.GA2053@angband.pl
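The three text-file properties listed in the message above, and the remark that every Unicode encoding is variable length, can be checked with a few lines of Python. This is only an illustrative sketch: the sample strings are arbitrary, and Shift_JIS is picked here as one example of an older multibyte encoding (the message does not name one).

```python
# Illustrative sketch; the sample strings and the choice of Shift_JIS are
# assumptions made for this example, not taken from the thread.
sample = "naïve 日本語\n"

utf8 = sample.encode("utf-8")
utf16_le = sample.encode("utf-16-le")
utf16_be = sample.encode("utf-16-be")
utf32_le = sample.encode("utf-32-le")

# 1. No null bytes: UTF-8 only produces 0x00 for U+0000 itself.
nul_free = {
    "utf-8": b"\x00" not in utf8,
    "utf-16": b"\x00" not in utf16_le,
    "utf-32": b"\x00" not in utf32_le,
}

# 2. ASCII bytes never occur inside a multibyte UTF-8 sequence.
#    In Shift_JIS, the katakana letter U+30BD contains the byte 0x5C,
#    which is the ASCII backslash.
katakana_so = "\u30bd"
backslash_in_sjis = b"\\" in katakana_so.encode("shift_jis")
backslash_in_utf8 = b"\\" in katakana_so.encode("utf-8")

# 3. Byte order: UTF-16 has two serializations, UTF-8 has only one.
endian_dependent = utf16_le != utf16_be

# Variable length at every level: one visible character can be several
# code points, and one code point can need two UTF-16 code units.
e_acute = "e\u0301"                      # "e" + COMBINING ACUTE ACCENT
code_points = len(e_acute)
g_clef_units = len("\U0001D11E".encode("utf-16-le")) // 2

print(nul_free)
print(backslash_in_sjis, backslash_in_utf8)
print(endian_dependent)
print(code_points, g_clef_units)
```

Code that scans bytes for a newline, a NUL terminator or a backslash keeps working on UTF-8 input, which is the practical reason the message gives for using it in external storage.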