Old 02-11-2011, 12:39 PM
Torsten Werner
 
Default Make Unicode bugs release critical?

On 11.02.2011 14:02, Faidon Liambotis wrote:
> $ echo αβγ | sed 's/./a/'
> aβγ

Okay. But...

$ echo αβγ | busybox sed 's/./a/'
a�βγ
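
The two outputs are consistent with `.` matching one whole character in the first case and a single byte in the second. Here is a minimal Python sketch of that difference. It only models the symptom and is not how either sed is implemented; the function names are made up for illustration.

```python
def replace_first_char(text: str, repl: str = "a") -> str:
    """Replace the first Unicode character (code point) of text."""
    return repl + text[1:]


def replace_first_byte(text: str, repl: bytes = b"a") -> str:
    """Replace only the first UTF-8 byte of text, then decode the rest."""
    raw = text.encode("utf-8")
    damaged = repl + raw[1:]
    # The orphaned continuation byte is no longer valid UTF-8, so it is
    # rendered as U+FFFD REPLACEMENT CHARACTER, much like a terminal does.
    return damaged.decode("utf-8", errors="replace")
```

Called on the three Greek letters from the example, the first function keeps beta and gamma intact, while the second leaves a stray byte behind that shows up as a replacement character.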




--
To UNSUBSCRIBE, email to debian-devel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/4D553BF6.9020604@debian.org
 
Old 02-11-2011, 12:43 PM
Norbert Preining
 
Default Make Unicode bugs release critical?

On Fri, 11 Feb 2011, Roger Leigh wrote:
> Um, no need to be rude.

Well, you started with "throw TeX into the bin!" (cum grano salis).
The only possible answer to that is mine. Or shutting up and ignoring
that kind of rant from your side.

> insults is a step too far. I haven't said anything that could justify
> it, other than the fact that you disagree with my /opinion/.

Very simple: replacing *tex* with *xetex* will break existing
documents. And that is a no-go. That is TeX world.
You are talking about WinWord world.

> > You have apparently no idea between input and font encoding.
>
> I only mentioned UTF-8 with regard to input, so you are assuming
> too much.

You mentioned *fontconfig* which is font encoding, and has nothing
whatsoever to do with inputenc. I don't assume too much.

> The inputenc hack only gets you so far. I tried to go this way, and

Agreed. Improvements are welcome, please help and fix the
shortcomings.

> > > sorts out the awful font support, so you can use standard
> > > freetype-registered fonts, again without the pain. Result: a
> > > document you can actually read in the editor!
> >
> > Argg, PLEASE STOP THAT RUBBISH!!!!
>
> What you are calling "rubbish" is not in any way false. It's given

It *IS* wrong.
You are stating that "using freetype-registered fonts makes a document
readable by the editor". Sorry, this is ridiculous.
- different fonts might register themselves under different names
to fontconfig
- fonts might not be available here or there and might not be embedded
in the PDF

DEK wrote his own font loading mechanism because he wanted to be sure
that documents *can* be typeset on any other machine as well, and that
works.
If you use xetex that might work, or might not work, or might work
but suddenly you are missing some characters
(there is for example a version of the Palatino fonts with Cyrillic
characters, and a version without Cyrillic characters; some systems
have these *enriched* fonts and don't embed them properly. Then,
suddenly, on the target system, characters disappear. Is THIS
the way you want to typeset documents?)

I repeat: RUBBISH.

> Well I thought the jury was still out on which was the better solution.

Most people I know in the TeX community are seeing the real future with
luatex.

Best wishes

Norbert
------------------------------------------------------------------------
Norbert Preining preining@{jaist.ac.jp, logic.at, debian.org}
JAIST, Japan TeX Live & Debian Developer
DSA: 0x09C5B094 fp: 14DF 2E6C 0307 BE6D AD76 A9C0 D2BF 4AA3 09C5 B094
------------------------------------------------------------------------
MARYTAVY (n.)
A person to whom, under dire injunctions of silence, you tell a secret
which you wish to be far more widely known.
--- Douglas Adams, The Meaning of Liff


--
Archive: http://lists.debian.org/20110211134338.GH1816@gamma.logic.tuwien.ac.at
 
Old 02-11-2011, 01:02 PM
Adam Borowski
 
Default Make Unicode bugs release critical?

On Fri, Feb 11, 2011 at 02:30:24PM +0100, Vincent Lefevre wrote:
> On 2011-02-11 15:33:49 +0500, Andrey Rahmatullin wrote:
> > On Fri, Feb 11, 2011 at 11:14:42AM +0100, Miroslav Kure wrote:
> > > > However, I'm curious: is there a lot of software that is broken with
> > > > Unicode, particularly with the UTF-8 encoding? I can't remember anything
> > > > much in recent times.
>
> "less" has problems with new Unicode characters (bug 597918).

Unicode 6.0 came out in October 2010, well after Squeeze's freeze, so you
can't expect support for new characters already. They are in no fonts
shipped with Squeeze, so not recognizing the characters as valid is not a
big problem.

Less shouldn't maintain a private copy of character properties if all that
data is already present in libc -- but guess what, wcwidth(0x1F4A9) and
iswprint() don't know them either.

So oh well, Squeeze won't display such vital characters as 🐈 "kitten"[1],
👻 "ghost", 👹 "japanese ogre" or 💩 "pile of shit". Gotta invest in a
crystal ball that will tell us what new characters will be.


[1]. To see my examples, you can grab:
http://angband.pl/debian/pool/main/t/ttf-ancient-fonts/ttf-ancient-fonts_2.52-1.0kb1_all.deb

(newer than the version in unstable, Gürkan Sengün's version is
404-compliant, let's poke him so we have _one_ Unicode 6.0 font in Debian).

--
1KB // Microsoft corollary to Hanlon's razor:
// Never attribute to stupidity what can be
// adequately explained by malice.


--
Archive: http://lists.debian.org/20110211140202.GB2053@angband.pl
 
Old 02-11-2011, 01:13 PM
Roger Leigh
 
Default Make Unicode bugs release critical?

On Fri, Feb 11, 2011 at 10:43:38PM +0900, Norbert Preining wrote:
> On Fri, 11 Feb 2011, Roger Leigh wrote:
> > Um, no need to be rude.
>
> Well, you started with "throw TeX into the bin!" (cum grano salis)
> The only possible answer to that is mine. Or shutting up and ignoring
> that kind of rants from your side.

Please read what I said carefully, rather than imagining slights.
I did not at any point state that TeX should be thrown in the bin;
that was with regard to "broken terminal emulators, editors and
tools". I fully believe we should remove obsolete tools which have
superior replacements. I did not include TeX in that category.

> > > You have apparently no idea between input and font encoding.
> >
> > I only mentioned UTF-8 with regard to input, so you are assuming
> > too much.
>
> You mentioned *fontconfig* which is font encoding, and has nothing
> whatsoever to do with inputenc. I don't assume too much.

No, I mentioned fontconfig because XeTeX allows use of system fonts
via fontconfig. That was completely separate from UTF-8 input.

> > > > sorts out the awful font support, so you can use standard
> > > > freetype-registered fonts, again without the pain. Result: a
> > > > document you can actually read in the editor!
> > >
> > > Argg, PLEASE STOP THAT RUBBISH!!!!
> >
> > What you are calling "rubbish" is not in any way false. It's given
>
> It *IS* wrong.
> You are stating that "using freetype-registered fonts makes a document
> readable by the editor". Sorry, this is ridiculous.
> - different fonts might register themselves under different names
> to fontconfig
> - fonts might not be available here or there and might not be embedded
> in the PDF

[...]

> I repeat: RUBBISH.

I didn't state any of those things. Please calm down, and please
read what I actually wrote, rather than what you thought I wrote.


Regards,
Roger

--
.'`. Roger Leigh
: :' : Debian GNU/Linux http://people.debian.org/~rleigh/
`. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/
`- GPG Public Key: 0x25BFB848 Please GPG sign your mail.
 
Old 02-11-2011, 01:17 PM
Norbert Preining
 
Default Make Unicode bugs release critical?

On Fri, 11 Feb 2011, Roger Leigh wrote:
> read what I actually wrote, rather than what you thought I wrote.

So *what* is your proposal, instead of discussing uselessly and wasting
bytes?

Is it:
ln -sf tex xetex

Best wishes

Norbert
------------------------------------------------------------------------
Norbert Preining preining@{jaist.ac.jp, logic.at, debian.org}
JAIST, Japan TeX Live & Debian Developer
DSA: 0x09C5B094 fp: 14DF 2E6C 0307 BE6D AD76 A9C0 D2BF 4AA3 09C5 B094
------------------------------------------------------------------------
LITTLE URSWICK (n.)
The member of any class who most inclines a teacher towards the view
that capital punishment should be introduced in schools.
--- Douglas Adams, The Meaning of Liff


--
Archive: http://lists.debian.org/20110211141749.GL1816@gamma.logic.tuwien.ac.at
 
Old 02-11-2011, 01:35 PM
Vincent Lefevre
 
Default Make Unicode bugs release critical?

On 2011-02-11 15:02:02 +0100, Adam Borowski wrote:
> On Fri, Feb 11, 2011 at 02:30:24PM +0100, Vincent Lefevre wrote:
> > On 2011-02-11 15:33:49 +0500, Andrey Rahmatullin wrote:
> > > On Fri, Feb 11, 2011 at 11:14:42AM +0100, Miroslav Kure wrote:
> > > > > However, I'm curious: is there a lot of software that is broken with
> > > > > Unicode, particularly with the UTF-8 encoding? I can't remember anything
> > > > > much in recent times.
> >
> > "less" has problems with new Unicode characters (bug 597918).
>
> Unicode 6.0 came out in October 2010,

The character mentioned in my bug report (U+1E9F LATIN SMALL LETTER DELTA)
appeared in Unicode 5.1.0 (March 2008).

> well after Squeeze's freeze, so you can't expect support for new
> characters already.

Well, March 2008 was more than 1 year before Squeeze's freeze.

> They are in no fonts shipped with Squeeze, so not recognizing the
> characters as valid is not a big problem.

Fonts containing the character in question are shipped with Squeeze:
the character appears correctly in xterm.

> Less shouldn't maintain a private copy of character properties if
> all that data is already present in libc

I agree.

> -- but guess what, wcwidth(0x1F4A9) and iswprint() don't know them
> either.

No problems with U+1E9F:

Property alnum : yes
Property alpha : yes
Property cntrl : no
Property digit : no
Property graph : yes
Property lower : yes
Property print : yes
Property punct : no
Property space : no
Property upper : no
Property xdigit: no
wcwidth = 1

So, if "less" were using libc, it wouldn't have any problem with
this character.
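
For anyone who wants to reproduce a property table like the one above without writing C, here is a rough Python sketch. Note the big assumption: it reads Python's bundled Unicode tables through `unicodedata` and the `str` methods, not libc's locale data, so it shows what an up-to-date character database says rather than what `iswprint()` or `wcwidth()` return on a given system. The function name `char_properties` is made up for illustration.

```python
import unicodedata


def char_properties(ch: str) -> dict:
    """Collect a few ctype-like properties for a single character.

    Uses Python's own Unicode tables, NOT the C library's locale data.
    """
    return {
        "category": unicodedata.category(ch),
        "alnum": ch.isalnum(),
        "alpha": ch.isalpha(),
        "digit": ch.isdigit(),
        "lower": ch.islower(),
        "upper": ch.isupper(),
        "space": ch.isspace(),
        "print": ch.isprintable(),
        # Wide ("W") and fullwidth ("F") characters usually take two columns.
        "east_asian_width": unicodedata.east_asian_width(ch),
    }
```

Calling `char_properties("\u1e9f")` gives the same yes/no answers as the table for the properties it covers.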

--
Vincent Lefèvre <vincent@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)


--
Archive: http://lists.debian.org/20110211143511.GJ15920@prunille.vinc17.org
 
Old 02-11-2011, 02:39 PM
Joey Hess
 
Default Make Unicode bugs release critical?

Lars Wirzenius wrote:
> However, I'm curious: is there a lot of software that is broken with
> Unicode, particularly with the UTF-8 encoding? I can't remember anything
> much in recent times.

We chose an 80% quickfix to get where we are, and so now we have the
other 80% to go. It's been whittled away at for the past 10 years or so,
but there is still a lot left.

And that's UTF-8 support only. It's probably a pipe dream to expect
other Unicode encodings to work half as well, and surely other encodings
fare even worse overall. If anything, UTF-8 probably makes the overall
situation worse for other encodings, since we expect it to "just work",
and give up on handling the other complexity.

> The first Unicode standard was published in 1991. That's twenty years
> ago. Any software that processes text at all and is incapable of dealing
> with UTF-8 should be considered with extreme suspicion.

Most languages still make it easy to get wrong, in my experience.

It can be as simple as software written trusting language documentation
that says "strings are processed in unicode" and doesn't point out all
the exceptions that can let non-unicode data in. For example, this
simple Haskell program processes a file's content utf-8 cleanly, but
prints its name like "foö".

import System.Environment

-- Print the name of the file given as first argument, then its contents.
main = do
    args <- getArgs
    let file = head args
    putStrLn $ "file is: " ++ file
    putStr =<< readFile file

This program has an entirely different failure mode; type in
"foö" (touch it first), and it will complain that "fo�" doesn't exist.

main = getLine >>= readFile >>= putStr

Neither of these failure modes is obvious from any documentation I've seen.
Both of these programs are something a typical developer would expect to
work. (Both also have unexpected failure modes when LANG=C.)
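
To make the two failure modes a bit more concrete, here is a small Python model of what I assume is going on: in the first case raw argument bytes are handed to the program as if each byte were a character, and in the second case properly decoded characters are cut down to one byte each when the file name is built. This is a sketch of the symptoms only, not of the actual runtime code, and both function names are made up for illustration.

```python
def bytes_as_chars(name: str) -> str:
    """Treat every UTF-8 byte of name as a separate character.

    Models a runtime that passes raw argument bytes through as code
    points, which is equivalent to decoding them as Latin-1.
    """
    return name.encode("utf-8").decode("latin-1")


def chars_as_bytes(name: str) -> bytes:
    """Keep only the low 8 bits of every code point.

    Models a runtime that decodes input correctly but truncates each
    character to a single byte when it builds the file name.
    """
    return bytes(ord(c) & 0xFF for c in name)
```

With the name from the example, `bytes_as_chars` yields four characters instead of three, and `chars_as_bytes` yields a byte string that is not valid UTF-8, so a UTF-8 terminal shows the last byte as a replacement character.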

Probably every thousand lines of Perl has a Unicode encoding bug of some
sort, based on data from my own code. Any Perl code that uses an XS module
probably has an encoding bug.

I assume that Python had some problems with its Unicode support too,
since they saw fit to radically change it in Python 3. And it sounds
like the Python 3 changes will break Unicode in many programs ported
over to it, unless file opens etc. are audited and fixed. Stack Overflow
has 1600 matches for Python Unicode questions.

The best case is probably a language that has a restricted enough
interface that most of these problems are avoided.
(But Stack Overflow still has 500 JavaScript Unicode questions.)

> Making all such
> bugs be release critical (which includes the notion that release
> managers may ignore the bug in particular cases) sounds like a good way
> to get things under control.

It would probably be a large load on the RMs. It's easy to pick some
random program that works great with unicode and find an edge case. The RMs
would probably prefer to not have git getting RC bugs filed just because
it sometimes exposes filenames written like "fo\303\266".
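
For reference, that kind of output is what you get when the non-ASCII bytes of a UTF-8 file name are shown as octal escapes. A simplified Python sketch of the idea follows. It is not git's real quoting routine: it ignores the surrounding double quotes and the escaping of backslashes and control characters, and the function name `octal_quote` is made up for illustration.

```python
def octal_quote(name: str) -> str:
    """Show non-ASCII bytes of a UTF-8 name as three-digit octal escapes."""
    out = []
    for b in name.encode("utf-8"):
        if b < 0x80:
            # Plain ASCII bytes are passed through unchanged.
            out.append(chr(b))
        else:
            # Every byte of a multi-byte sequence gets its own escape.
            out.append("\\%03o" % b)
    return "".join(out)
```

A two-byte character such as o-umlaut therefore turns into two escapes, one per byte.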

--
see shy jo, who deals with at least 1 unicode bug a week on average. 4 this week
 
Old 02-11-2011, 03:15 PM
Marco Túlio Gontijo e Silva
 
Default Make Unicode bugs release critical?

Excerpts from Joey Hess's message of Fri Feb 11 13:39:08 -0200 2011:
(...)
> It can be as simple as software written trusting language documentation
> that says "strings are processed in unicode" and doesn't point out all
> the exceptions that can let non-unicode data in. For example, this
> simple Haskell program processes a file's content utf-8 cleanly, but
> prints its name like "foö".
>
> import System.Environment
> main = do
>     args <- getArgs
>     let file = head args
>     putStrLn $ "file is: " ++ file
>     putStr =<< readFile file
>
> This program has an entirely different failure mode; type in
> "foö" (touch it first), and it will complain that "fo�" doesn't exist.
>
> main = getLine >>= readFile >>= putStr
>
> Neither of these failure modes is obvious from any documentation I've seen.
> Both of these programs are something a typical developer would expect to
> work. (Both also have unexpected failure modes when LANG=C.)

http://hackage.haskell.org/trac/ghc/ticket/3307

Greetings.
(...)
 
Old 02-11-2011, 07:32 PM
Kurt Roeckx
 
Default Make Unicode bugs release critical?

On Fri, Feb 11, 2011 at 09:37:54AM +0000, Lars Wirzenius wrote:
>
> However, I'm curious: is there a lot of software that is broken with
> Unicode, particularly with the UTF-8 encoding? I can't remember anything
> much in recent times.

ispell, aspell. I think hunspell got fixed recently.


Kurt


--
Archive: http://lists.debian.org/20110211203240.GA30246@roeckx.be
 
Old 02-11-2011, 09:16 PM
Henrique de Moraes Holschuh
 
Default Make Unicode bugs release critical?

On Fri, 11 Feb 2011, Lars Wirzenius wrote:
> However, I'm curious: is there a lot of software that is broken with
> Unicode, particularly with the UTF-8 encoding? I can't remember anything
> much in recent times.

1. Stuff that cannot do one of UTF-8, UTF-16 or UCS-4.

2. Anything that cannot deal with Supplementary planes.

This includes the use of UCS-2 instead of UTF-16, as it cannot represent
the Supplementary planes. Python 3 when not compiled to use UCS-4 memory
hog mode is an example, I am told.

We likely want to restrict ourselves to declaring (1) to be release
critical for Wheezy.
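
On point 2, the reason UCS-2 cannot represent the Supplementary planes is that it has exactly one 16-bit unit per character, whereas UTF-16 spends two units (a surrogate pair) on anything above U+FFFF. A short Python sketch to illustrate, with both helper names made up for the example:

```python
def utf16_code_units(ch: str) -> list:
    """Return the 16-bit code units UTF-16 needs for one character."""
    raw = ch.encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


def fits_in_ucs2(ch: str) -> bool:
    """True if the character lies in the Basic Multilingual Plane."""
    return ord(ch) <= 0xFFFF
```

A BMP character such as U+1E9F comes back as a single unit, while U+1F4A9 comes back as the surrogate pair 0xD83D, 0xDCA9, which a strict UCS-2 implementation has no way to store as one character.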

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh


--
Archive: http://lists.debian.org/20110211221653.GB18551@khazad-dum.debian.net
 
