02-15-2012, 10:58 AM
"Francesco R.(vivo)"

LANG=en_GB.UTF-8 by default

As the subject says, could Gentoo change its policy and set a UTF-8
environment by default?

http://www.gentoo.org/doc/en/utf-8.xml explains how to do it very well, but
having it already set would have the following two advantages:

1) UTF-8 is everywhere; even the Linux weekly newsletter has it in 2012
2) the user would only need to change, not create, an /etc/env.d/XX-lc file,
giving every Gentoo install a standard place for these settings.

Any arguments against?

P.S. It would be nice to have a wd_WD.UTF-8, with WD standing for world;
being tied to just one country is so 1900.
 
02-15-2012, 11:22 AM
"Mr. Aaron W. Swenson"

On Wed, Feb 15, 2012 at 12:58:52PM +0100, Francesco R.(vivo) wrote:
> as subject says could gentoo change the policy and set an UTF-8 environment by
> default?
>
> http://www.gentoo.org/doc/en/utf-8.xml how to do it very well but having it
> already set could have the following two advantages:
>
> 1) well utf-8 is everywhere, even the linux weekly newsletter has it in 2012
> 2) the user need to change, not to create a /etc/env.d/XX-lc, creating a
> standard place where every gentoo install has this settings.
>
> contra?
>
> P.S. would be nice to have a wd_WD.UTF-8 with WD standing for world, just a
> country is so 1900
>

wd_WD.UTF-8 is certainly a no go. WD doesn't match any ISO country
code. To support it, we'd have to create the necessary supporting
files and that would lead to a lot of work and headaches trying to
determine what should be where in what order, et cetera.

All of the files we create (ebuilds, initscripts) are UTF-8 in
accordance with GLEP 31. So, the issue would be with upstream projects
not using UTF-8 for their files.

However, the stage 3, last time I used it, didn't default to a UTF-8
environment, and it didn't default to using and/or including a capable
UTF-8 font. It is something I think we should look at changing.

--
Mr. Aaron W. Swenson
Gentoo Linux
Developer, Proxy Committer
Email : titanofold@gentoo.org
GnuPG FP : 2C00 7719 4F85 FB07 A49C 0E31 5713 AA03 D1BB FDA0
GnuPG ID : D1BBFDA0
 
02-18-2012, 01:31 AM
Kerin Millar

On 15/02/2012 12:22, Mr. Aaron W. Swenson wrote:
> On Wed, Feb 15, 2012 at 12:58:52PM +0100, Francesco R.(vivo) wrote:
>> as subject says could gentoo change the policy and set an UTF-8 environment by
>> default?


Perhaps it should define LANG="en_US.UTF-8" as a reasonable default,
which would be in line with other notable distros. Arch also used to
define LC_COLLATE="C" by default, probably to mitigate unpredictable
behaviour in some applications, but have since dropped this additional
variable so they must have deemed it no longer necessary.


I think that having a default configuration file would also raise
awareness of the importance of locale configuration and make it less
likely that users configure their systems inappropriately (defining
LC_ALL, for instance).



>> P.S. would be nice to have a wd_WD.UTF-8 with WD standing for world, just a
>> country is so 1900


Different countries/regions have different standards and conventions for
character classification, case conversion, date/numerical/currency
formatting etc. There's no basis on which to formally standardise a
world-wide definition.






> However, the stage 3, last time I used it, didn't default to a UTF-8
> environment, and it didn't default to using and/or including a capable
> UTF-8 font. It is something I think we should look at changing.



Yet "unicode" is a default flag in the standard profiles. Most console
fonts have poor coverage. The best one I've found thus far is
"LatCyrGr-16" from fonty-rg, which provides good Latin and Cyrillic
coverage along with some Greek and esoteric punctuation characters.
Using this font, I've yet to find any developer's name that doesn't
render as expected while perusing the contents of the portage tree.


Being a 512 character font, one loses bold support unless using a
framebuffer console. Given that the default console fonts aren't
especially useful, it seems a small price to pay.


--Kerin
 
02-19-2012, 12:00 AM
James Cloos

>>>>> "KM" == Kerin Millar <kerframil@gmail.com> writes:

KM> Arch also used to define LC_COLLATE="C" by default, probably to
KM> mitigate unpredictable behaviour in some applications, but have
KM> since dropped this additional variable so they must have deemed it
KM> no longer necessary.

Without LC_COLLATE="C", things like [a-z]* get a false-positive match
on files like Makefile.

I recently noticed a bug on b.g.o where the ebuild has something like
doc/[A-Z]*, expecting that it will not match doc/some_lowercase_subdir.

The bug, of course, is that glibc fraudulently defaults the Latin, Greek
and Cyrillic locales to case-insensitive collation.

The real fix is to have root use C.UTF-8, which differs from C only in
that the charset is UTF-8.
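
As a minimal sketch of how a glob can sidestep the problem described above: named character classes such as [[:upper:]] are defined by LC_CTYPE rather than by collation order, so they do not depend on LC_COLLATE. The function name starts_with_upper and the README entry are made up for illustration; Makefile and some_lowercase_subdir are the names used in this post.

```shell
#!/bin/sh
# starts_with_upper NAME: succeed if NAME begins with an uppercase letter.
# [[:upper:]] is a character class, not a range, so it cannot pick up
# lowercase letters the way [A-Z] can under some collation orders.
starts_with_upper() {
    case $1 in
        [[:upper:]]*) return 0 ;;
        *)            return 1 ;;
    esac
}

# Hypothetical directory entries, for illustration only.
for name in Makefile README some_lowercase_subdir; do
    if starts_with_upper "$name"; then
        printf '%s: matches\n' "$name"
    else
        printf '%s: skipped\n' "$name"
    fi
done
```

The same pattern syntax works in pathname expansion, so doc/[[:upper:]]* would be the class-based spelling of the glob mentioned above.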

-JimC
--
James Cloos <cloos@jhcloos.com> OpenPGP: 1024D/ED7DAEA6
 
02-19-2012, 01:04 AM
Ben

On 19 February 2012 09:00, James Cloos <cloos@jhcloos.com> wrote:
> Without LC_COLLATE="C" things like [a-z]* gets a false=positive match
> on files like Makefile. [...]
>
> The real fix is to have root be C.UTF-8. Which differs from C only in
> that the charset is utf-8.

In my opinion we should set a default environment with the following values:

LANG=en_US.UTF-8
LC_ALL=
LC_COLLATE=C

This offers the best default options to the majority of users, and is
easy to customize for those who wish to use another locale.

And yes, LC_ALL needs to be empty because, if set, it would override
LANG and all the other LC_* values.
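
The precedence behind this can be sketched as a small shell function. This is only an illustration of the POSIX lookup order (LC_ALL, then the category's own LC_* variable, then LANG), not how any libc implements it. The name resolve_locale, the de_DE.UTF-8 value and the final fallback to "C" are assumptions for the example; POSIX leaves the last-resort default to the implementation.

```shell
#!/bin/sh
# resolve_locale LC_ALL_VALUE CATEGORY_VALUE LANG_VALUE
# Print the locale name that would apply to one category, given the
# values of LC_ALL, the category's own LC_* variable, and LANG.
# Empty strings count as "not set".
resolve_locale() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"      # LC_ALL overrides everything
    elif [ -n "$2" ]; then
        printf '%s\n' "$2"      # then the specific LC_* variable
    elif [ -n "$3" ]; then
        printf '%s\n' "$3"      # then LANG
    else
        printf 'C\n'            # assumed last-resort default
    fi
}

# Proposed defaults: LANG=en_US.UTF-8, LC_ALL empty, LC_COLLATE=C
resolve_locale "" "C" "en_US.UTF-8"             # LC_COLLATE: prints C
resolve_locale "" ""  "en_US.UTF-8"             # LC_NUMERIC: prints en_US.UTF-8
resolve_locale "de_DE.UTF-8" "C" "en_US.UTF-8"  # LC_ALL set: prints de_DE.UTF-8
```

The third call shows why a non-empty LC_ALL defeats LC_COLLATE=C.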

This should be combined with some good Unicode fonts, such as
LatCyrGr-16 for the console and DejaVu for X.

Cheers,
Ben
 
02-19-2012, 10:39 AM
Amadeusz Żołnowski

Excerpts from Ben's message of 2012-02-19 03:04:19 +0100:
> On 19 February 2012 09:00, James Cloos <cloos@jhcloos.com> wrote:
> > Without LC_COLLATE="C" things like [a-z]* gets a false=positive
> > match on files like Makefile. [...]
> >
> > The real fix is to have root be C.UTF-8. Which differs from C only
> > in that the charset is utf-8.
>
> In my opinion we should set a default environment with the following
> values:
>
> LANG=en_US.UTF-8
> LC_ALL=
> LC_COLLATE=C

Is it just on my setups, or is it "xy_XY.utf8" rather than
"xy_XY.UTF-8"?


--
Amadeusz Żołnowski
 
02-19-2012, 02:14 PM
Ulrich Mueller

>>>>> On Sun, 19 Feb 2012, Ben wrote:

> In my opinion we should set a default environment with the following
> values:

> LANG=en_US.UTF-8
> LC_ALL=
> LC_COLLATE=C

> This offers the best default options to the majority of users, and
> is easy to customize for those who wish to use another locale.

At least, LC_NUMERIC=C should be added to this, otherwise numbers will
be formatted with commas as thousands separators.

Also en_US.UTF-8 for LC_MEASUREMENT and LC_PAPER means imperial units
and letter paper, which isn't optimal for users outside of the U.S.

Ulrich
 
02-19-2012, 02:56 PM
Ben

On 19 February 2012 23:14, Ulrich Mueller <ulm@gentoo.org> wrote:
>>>>>> On Sun, 19 Feb 2012, Ben wrote:
>
>> In my opinion we should set a default environment with the following
>> values:
>
>> LANG=en_US.UTF-8
>> LC_ALL=
>> LC_COLLATE=C
>
>> This offers the best default options to the majority of users, and
>> is easy to customize for those who wish to use another locale.
>
> At least, LC_NUMERIC=C should be added to this, otherwise numbers will
> be formatted with commas as thousands separators.
>
> Also en_US.UTF-8 for LC_MEASUREMENT and LC_PAPER means imperial units
> and letter paper, which isn't optimal for users outside of the U.S.
>
> Ulrich
>

I think those users (and that includes myself) should then set LANG to
something more appropriate to their use case.

Ben
 
02-19-2012, 05:44 PM
Kerin Millar

On 19/02/2012 15:56, Ben wrote:
> On 19 February 2012 23:14, Ulrich Mueller <ulm@gentoo.org> wrote:
>> On Sun, 19 Feb 2012, Ben wrote:
>>> In my opinion we should set a default environment with the following
>>> values:
>>>
>>> LANG=en_US.UTF-8
>>> LC_ALL=


LC_ALL isn't needed here because, unlike other LC_* settings, it does
not inherit from LANG and, thus, will be undefined anyway. Although the
above would not directly cause any harm, I am entirely certain that its
mere presence would encourage users to explicitly define it where they
most definitely should not. The misinformation that LC_ALL should be
defined was propagated by the localization doc for rather a long time
and it was rather challenging to impress upon its maintainers that
change was required. Let's not repeat old mistakes.



>>> LC_COLLATE=C
>>>
>>> This offers the best default options to the majority of users, and
>>> is easy to customize for those who wish to use another locale.
>>
>> At least, LC_NUMERIC=C should be added to this, otherwise numbers will
>> be formatted with commas as thousands separators.
>>
>> Also en_US.UTF-8 for LC_MEASUREMENT and LC_PAPER means imperial units
>> and letter paper, which isn't optimal for users outside of the U.S.
>>
>> Ulrich
>
> I think those users (and that includes myself) should then set LANG to
> something more appropriate to their use case.



I agree; the defaults should not be over-engineered. For proper
localisation, set LANG appropriately and you're done. The real issue is
that locale configuration isn't mentioned in the handbook. It does,
however, mention locale.gen, so we're half-way there.


--Kerin
 
02-19-2012, 06:14 PM
Kerin Millar

On 19/02/2012 01:00, James Cloos wrote:
> >>>>> "KM" == Kerin Millar <kerframil@gmail.com> writes:
>
> KM> Arch also used to define LC_COLLATE="C" by default, probably to
> KM> mitigate unpredictable behaviour in some applications, but have
> KM> since dropped this additional variable so they must have deemed it
> KM> no longer necessary.
>
> Without LC_COLLATE="C" things like [a-z]* gets a false=positive match
> on files like Makefile.


Indeed, character classes are a potential minefield. Incidentally, I
just tested Ubuntu and Arch with only LANG set to a UTF-8 locale:-


$ echo Makefile | sed -re 's/[a-z]//g' # collation rules ignored
M

$ echo Makefile | grep -Eo '[a-z]*' # collation rules ignored
akefile

In neither case are the collation rules being obeyed. In Gentoo, however:-

$ echo Makefile | sed -re 's/[a-z]//g' # collation rules obeyed

$ echo Makefile | grep -Eo '[a-z]*' # collation rules ignored
akefile

Obeying the collation rules is ostensibly the correct thing to do but,
until everyone starts using named character classes (which will never
happen), it's not safe. The thing that worries me here is the
inconsistency in Gentoo. LC_COLLATE="C" is sufficient to work around the
issue but the above makes me wonder why we still need it.
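
For comparison, here is a minimal sketch of the two defensive spellings mentioned above: a named character class, and a plain range with the C locale forced for just that one command. The function names strip_lower and strip_lower_c are made up for illustration, and the basic-regex form of sed is used so that no GNU-specific option is needed.

```shell
#!/bin/sh
# strip_lower TEXT: delete every lowercase letter using a named class,
# which is defined by LC_CTYPE and so is unaffected by collation order.
strip_lower() {
    printf '%s\n' "$1" | sed 's/[[:lower:]]//g'
}

# strip_lower_c TEXT: same idea with the [a-z] range, but with the C
# locale forced for this one sed call so the range means ASCII a-z only.
strip_lower_c() {
    printf '%s\n' "$1" | LC_ALL=C sed 's/[a-z]//g'
}

strip_lower Makefile      # prints "M"
strip_lower_c Makefile    # prints "M"
```

Setting LC_ALL on the single command keeps the override local to that sed call, so the rest of the session's locale is left alone.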


--Kerin
 
