07-19-2012, 09:39 PM
Sascha Cunz

UTF-8 locale by default

I recently discovered that, for some reason, I had overlooked the warning about
setting the locale to UTF-8 in the Gentoo handbook for several years; thus I was
still running all my systems in a POSIX locale, since I never cared much about
it.

However, since I noticed, I have talked to several people about it; all of them
gave the same first response: "Not shipping with a utf-8 locale turned on by
default nowadays probably is a bug in your distro".

While thinking about this and recognizing that recent distributions indeed
ship with some UTF-8 locale by default, I tend to agree with that statement.

Although Google brings up a lot of good documentation about how to change the
locale, I couldn't find anything that explains why stage3 is still delivered
with the POSIX locale set.

Is there a reason for not using at least en_US.UTF-8 as a "sane" default
value?

BR,
SaCu
 
07-19-2012, 10:28 PM
Ulrich Mueller

>>>>> On Thu, 19 Jul 2012, Sascha Cunz wrote:

> Is there a reason for not using at least en_US.UTF-8 as a "sane"
> default value?

Because there's no one-size-fits-all locale, but it is specific to
every system so the user must configure it?

The matter was recently discussed in this mailing list [1] and also in
the March 2012 council meeting [2], and as a result the docs team has
amended the respective section [3] of the handbook.

Ulrich

[1] <http://archives.gentoo.org/gentoo-dev/msg_2ffb7ea72e6209439600c371f6fc071d.xml>
[2] <http://www.gentoo.org/proj/en/council/meeting-logs/20120313.txt>
[3] <http://www.gentoo.org/doc/en/handbook/handbook-x86.xml?part=1&chap=8>
 
07-27-2012, 06:42 AM
Ben de Groot

On 20 July 2012 06:28, Ulrich Mueller <ulm@gentoo.org> wrote:
>>>>>> On Thu, 19 Jul 2012, Sascha Cunz wrote:
>
>> Is there a reason for not using at least en_US.UTF-8 as a "sane"
>> default value?
>
> Because there's no one-size-fits-all locale, but it is specific to
> every system so the user must configure it?

While this is understandable, the fact remains that not having a
UTF-8 locale by default in our stage3 environment is sub-optimal.

I understand why the council rejected Debian's C.UTF-8 option,
but is there really no better default that we can use?

Without any default locale set, in practically all cases that means
that the user is presented with English, and mostly the American
variant. So, in practice, we are defaulting to en_US, just not in a
unicode environment. Correct me if I'm wrong.

Also, in most other places (such as our website, GLEPs, ebuilds)
we default to en_US.UTF-8.

So let's upgrade to en_US.UTF-8, which is for most users more
desirable than the current situation. Of course we will still advise
them to set their desired locales in /etc/locale.gen. But at least
they will start with a unicode environment, as expected anno 2012.


> The matter was recently discussed in this mailing list [1] and also in
> the March 2012 council meeting [2], and as a result the docs team has
> amended the respective section [3] of the handbook.
>
> Ulrich
>
> [1] <http://archives.gentoo.org/gentoo-dev/msg_2ffb7ea72e6209439600c371f6fc071d.xml>
> [2] <http://www.gentoo.org/proj/en/council/meeting-logs/20120313.txt>
> [3] <http://www.gentoo.org/doc/en/handbook/handbook-x86.xml?part=1&chap=8>
>

--
Cheers,

Ben | yngwin
Gentoo developer
Gentoo Qt project lead, Gentoo Wiki admin
 
07-27-2012, 07:08 AM
Ulrich Mueller

>>>>> On Fri, 27 Jul 2012, Ben de Groot wrote:

> I understand why the council rejected Debian's C.UTF-8 option,
> but is there really no better default that we can use?

> Without any default locale set, in practically all cases that means
> that the user is presented with English, and mostly the American
> variant. So, in practice, we are defaulting to en_US, just not in a
> unicode environment. Correct me if I'm wrong.

See below. We're not defaulting to en_US for things like the number
format.

> Also, in most other places (such as our website, GLEPs, ebuilds)
> we default to en_US.UTF-8.

> So let's upgrade to en_US.UTF-8, which is for most users more
> desirable than the current situation. Of course we will still advise
> them to set their desired locales in /etc/locale.gen. But at least
> they will start with a unicode environment, as expected anno 2012.

As I had pointed out before [1], changing from POSIX to an en_US
locale will have undesirable side effects, like commas as thousands
separators in numbers (because of LC_NUMERIC). Also the defaults of
en_US for LC_MEASUREMENT and LC_PAPER are only useful in the U.S.

So if we change the default (but I still don't see the need), we
should go for a less intrusive setting like:

LANG="POSIX"
LC_CTYPE="en_US.utf8"

Ulrich

[1] <http://archives.gentoo.org/gentoo-dev/msg_56a438adde8efebd467ada5f858048ba.xml>
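The two-variable setting proposed above relies on the POSIX lookup order for locale variables: LC_ALL wins if set, then the category's own LC_* variable, then LANG. A minimal Python sketch of that lookup follows; the function name `effective_locale` and the fallback to "POSIX" when nothing is set are illustrative assumptions, not part of any library.

```python
def effective_locale(env, category):
    """Resolve the locale name for one category from environment variables.

    Mirrors the POSIX lookup order: LC_ALL, then the category's own
    variable, then LANG. Empty values are treated as unset.
    """
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:
            return value
    # Assumption: fall back to the POSIX locale when nothing is set.
    return "POSIX"


# The setting proposed in the post above.
proposed = {"LANG": "POSIX", "LC_CTYPE": "en_US.utf8"}

for category in ("LC_CTYPE", "LC_NUMERIC", "LC_MEASUREMENT", "LC_PAPER"):
    print(category, "->", effective_locale(proposed, category))
```

Under this reading, only LC_CTYPE resolves to en_US.utf8; LC_NUMERIC, LC_MEASUREMENT and LC_PAPER all fall through to LANG and stay POSIX, which is why the setting is described as less intrusive.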
 
07-27-2012, 07:19 AM
Rick "Zero_Chaos" Farina

On 07/27/2012 03:08 AM, Ulrich Mueller wrote:
>
> As I had pointed out before [1], changing from POSIX to an en_US
> locale will have undesirable side effects, like commas as thousands
> separators in numbers (because of LC_NUMERIC). Also the defaults of
> en_US for LC_MEASUREMENT and LC_PAPER are only useful in the U.S.
>
> So if we change the default (but I still don't see the need), we
> should go for a less intrusive setting like:
>
> LANG="POSIX"
> LC_CTYPE="en_US.utf8"

I would love to see a UTF-8 default; if the above is agreeable, then I say +1.

-Zero
 
07-27-2012, 08:06 AM
Dan Douglas

On Friday, July 27, 2012 09:08:36 AM Ulrich Mueller wrote:
> >>>>> On Fri, 27 Jul 2012, Ben de Groot wrote:
>
> > I understand why the council rejected Debian's C.UTF-8 option,
> > but is there really no better default that we can use?
>
> > Without any default locale set, in practically all cases that means
> > that the user is presented with English, and mostly the American
> > variant. So, in practice, we are defaulting to en_US, just not in a
> > unicode environment. Correct me if I'm wrong.
>
> See below. We're not defaulting to en_US for things like the number
> format.
>
> > Also, in most other places (such as our website, GLEPs, ebuilds)
> > we default to en_US.UTF-8.
>
> > So let's upgrade to en_US.UTF-8, which is for most users more
> > desirable than the current situation. Of course we will still advise
> > them to set their desired locales in /etc/locale.gen. But at least
> > they will start with a unicode environment, as expected anno 2012.
>
> As I had pointed out before [1], changing from POSIX to an en_US
> locale will have undesirable side effects, like commas as thousands
> separators in numbers (because of LC_NUMERIC). Also the defaults of
> en_US for LC_MEASUREMENT and LC_PAPER are only useful in the U.S.
>
> So if we change the default (but I still don't see the need), we
> should go for a less intrusive setting like:
>
> LANG="POSIX"
> LC_CTYPE="en_US.utf8"
>
> Ulrich
>

You're concerned about the commas breaking things? Given that you usually need
to ask for them explicitly (i.e., with the printf ' flag), and that this kind of
output is usually meant for human consumption only, that seems unlikely. If
anything does rely upon the format, can't tolerate different locales, and fails
to specify LC_NUMERIC, then it's broken anyway.

LC_MONETARY / LC_MEASUREMENT as en_US are probably slightly more annoying
defaults for some people. What do users of other distros think? Is this really
a serious problem for anyone?

LC_CTYPE=en_US.utf8 would be a bare minimum. The important bit is getting utf8
by default. I can live with LANG=POSIX.
--
Dan Douglas
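
The point that thousands separators only appear when asked for can be illustrated with Python's `locale` module, where `grouping=True` plays a role loosely analogous to the printf ' flag (the analogy and the helper name `render` are assumptions made for this sketch). Whether en_US.UTF-8 is installed depends on the system, so that case is guarded.

```python
import locale


def render(number, locale_name):
    """Return (plain, grouped) renderings of number under the given LC_NUMERIC."""
    previous = locale.setlocale(locale.LC_NUMERIC)  # remember the current setting
    try:
        locale.setlocale(locale.LC_NUMERIC, locale_name)
        plain = locale.format_string("%d", number)
        grouped = locale.format_string("%d", number, grouping=True)
        return plain, grouped
    finally:
        locale.setlocale(locale.LC_NUMERIC, previous)  # always restore


for name in ("C", "en_US.UTF-8"):
    try:
        print(name, render(1234567, name))
    except locale.Error:
        print(name, "is not installed on this system")
```

In the C locale both renderings are identical, because that locale defines no grouping at all. In a locale that does define grouping, only the explicitly grouped rendering changes; the plain one is left alone.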
 
07-27-2012, 08:34 AM
Ben de Groot

On 27 July 2012 16:06, Dan Douglas <ormaaj@gmail.com> wrote:
> On Friday, July 27, 2012 09:08:36 AM Ulrich Mueller wrote:
>> >>>>> On Fri, 27 Jul 2012, Ben de Groot wrote:
>>
>> > I understand why the council rejected Debian's C.UTF-8 option,
>> > but is there really no better default that we can use?
>>
>> > Without any default locale set, in practically all cases that means
>> > that the user is presented with English, and mostly the American
>> > variant. So, in practice, we are defaulting to en_US, just not in a
>> > unicode environment. Correct me if I'm wrong.
>>
>> See below. We're not defaulting to en_US for things like the number
>> format.
>>
>> > Also, in most other places (such as our website, GLEPs, ebuilds)
>> > we default to en_US.UTF-8.
>>
>> > So let's upgrade to en_US.UTF-8, which is for most users more
>> > desirable than the current situation. Of course we will still advise
>> > them to set their desired locales in /etc/locale.gen. But at least
>> > they will start with a unicode environment, as expected anno 2012.
>>
>> As I had pointed out before [1], changing from POSIX to an en_US
>> locale will have undesirable side effects, like commas as thousands
>> separators in numbers (because of LC_NUMERIC). Also the defaults of
>> en_US for LC_MEASUREMENT and LC_PAPER are only useful in the U.S.
>>
>> So if we change the default (but I still don't see the need), we
>> should go for a less intrusive setting like:
>>
>> LANG="POSIX"
>> LC_CTYPE="en_US.utf8"
>>
>> Ulrich
>>
>
> You're concerned about the commas breaking things? Given that you usually need
> to specifically ask for them (i.e., printf ' flag), and that kind of output is
> usually going to be for human consumption only that seems unlikely. If
> anything does rely upon the format, can't tolerate different locales, and fails
> to specify LC_NUMERIC then it's broken anyway.
>
> LC_MONETARY / LC_MEASUREMENT as en_US are probably slightly more annoying
> defaults for some people. What do users of other distros think? Is this really
> a serious problem for anyone?
>
> LC_CTYPE=en_US.utf8 would be a bare minimum. The important bit is getting utf8
> by default. I can live with LANG=POSIX.
> --
> Dan Douglas

How about the below?

LANG=en_GB.utf8
LC_COLLATE=C
LC_CTYPE=en_GB.utf8

That will give us A4 paper size and the metric system. If LC_NUMERIC is
really a problem, we can set it to something more desirable.
--
Cheers,

Ben | yngwin
Gentoo developer
Gentoo Qt project lead, Gentoo Wiki admin
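
The LC_COLLATE=C line in the proposal above is not explained in the thread; one plausible reading is that it keeps sorting in plain code point order regardless of LANG. A minimal Python sketch of what that ordering looks like for ASCII strings follows (the helper name `c_collation_sort` is hypothetical).

```python
import locale


def c_collation_sort(words):
    """Sort words using the collation rules of the C locale."""
    previous = locale.setlocale(locale.LC_COLLATE)  # remember the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, "C")
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)  # always restore


words = ["b", "A", "a", "B"]
print(c_collation_sort(words))  # ['A', 'B', 'a', 'b']
print(sorted(words))            # same order: plain code point comparison
```

With C collation, all uppercase ASCII letters sort before all lowercase ones. A language-specific collation would typically interleave them instead, which can change the output of tools that sort file names.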
 
07-27-2012, 08:38 AM
Cyprien Nicolas

Ulrich Mueller wrote:
>> On Fri, 27 Jul 2012, Ben de Groot wrote:
>>
>> So let's upgrade to en_US.UTF-8, which is for most users more
>> desirable than the current situation. Of course we will still advise
>> them to set their desired locales in /etc/locale.gen. But at least
>> they will start with a unicode environment, as expected anno 2012.
>
> As I had pointed out before [1], changing from POSIX to an en_US
> locale will have undesirable side effects, like commas as thousands
> separators in numbers (because of LC_NUMERIC). Also the defaults of
> en_US for LC_MEASUREMENT and LC_PAPER are only useful in the U.S.

For this very reason my system locale is en_IE.UTF-8. Still English, but
using euro currency, metric units, A4 paper, etc.

It might suit the needs of most European installs, but not everyone's.

--
Cyprien / Fulax
Gentoo Lisp Project contributor
 
07-27-2012, 08:47 AM
Michał Górny

On Fri, 27 Jul 2012 10:38:30 +0200
Cyprien Nicolas <c.nicolas@gmail.com> wrote:

> Ulrich Mueller wrote:
> >> On Fri, 27 Jul 2012, Ben de Groot wrote:
> >>
> >> So let's upgrade to en_US.UTF-8, which is for most users more
> >> desirable than the current situation. Of course we will still
> >> advise them to set their desired locales in /etc/locale.gen. But
> >> at least they will start with a unicode environment, as expected
> >> anno 2012.
> >
> > As I had pointed out before [1], changing from POSIX to an en_US
> > locale will have undesirable side effects, like commas as thousands
> > separators in numbers (because of LC_NUMERIC). Also the defaults of
> > en_US for LC_MEASUREMENT and LC_PAPER are only useful in the U.S.
>
> For this very reason my system locale is en_IE.UTF-8. Still English
> but using Euro Monetary, Metric units, A4 paper, etc.
>
> It might suit needs for most European installs, but not for everyone.

Still uses ',' for thousands sep.

--
Best regards,
Michał Górny
 
07-27-2012, 08:49 AM
Michał Górny

On Fri, 27 Jul 2012 16:34:01 +0800
Ben de Groot <yngwin@gentoo.org> wrote:

> On 27 July 2012 16:06, Dan Douglas <ormaaj@gmail.com> wrote:
> > On Friday, July 27, 2012 09:08:36 AM Ulrich Mueller wrote:
> >> >>>>> On Fri, 27 Jul 2012, Ben de Groot wrote:
> >>
> >> > I understand why the council rejected Debian's C.UTF-8 option,
> >> > but is there really no better default that we can use?
> >>
> >> > Without any default locale set, in practically all cases that
> >> > means that the user is presented with English, and mostly the
> >> > American variant. So, in practice, we are defaulting to en_US,
> >> > just not in a unicode environment. Correct me if I'm wrong.
> >>
> >> See below. We're not defaulting to en_US for things like the number
> >> format.
> >>
> >> > Also, in most other places (such as our website, GLEPs, ebuilds)
> >> > we default to en_US.UTF-8.
> >>
> >> > So let's upgrade to en_US.UTF-8, which is for most users more
> >> > desirable than the current situation. Of course we will still
> >> > advise them to set their desired locales in /etc/locale.gen. But
> >> > at least they will start with a unicode environment, as expected
> >> > anno 2012.
> >>
> >> As I had pointed out before [1], changing from POSIX to an en_US
> >> locale will have undesirable side effects, like commas as thousands
> >> separators in numbers (because of LC_NUMERIC). Also the defaults of
> >> en_US for LC_MEASUREMENT and LC_PAPER are only useful in the U.S.
> >>
> >> So if we change the default (but I still don't see the need), we
> >> should go for a less intrusive setting like:
> >>
> >> LANG="POSIX"
> >> LC_CTYPE="en_US.utf8"
> >>
> >> Ulrich
> >>
> >
> > You're concerned about the commas breaking things? Given that you
> > usually need to specifically ask for them (i.e., printf ' flag),
> > and that kind of output is usually going to be for human
> > consumption only that seems unlikely. If anything does rely upon
> > the format, can't tolerate different locales, and fails to specify
> > LC_NUMERIC then it's broken anyway.
> >
> > LC_MONETARY / LC_MEASUREMENT as en_US are probably slightly more
> > annoying defaults for some people. What do users of other distros
> > think? Is this really a serious problem for anyone?
> >
> > LC_CTYPE=en_US.utf8 would be a bare minimum. The important bit is
> > getting utf8 by default. I can live with LANG=POSIX.
> > --
> > Dan Douglas
>
> How about the below?
>
> LANG=en_GB.utf8
> LC_COLLATE=C
> LC_CTYPE=en_GB.utf8
>
> That will give us A4 paper size and the metric system. If LC_NUMERIC
> is really a problem, we can set it to something more desirable.

LC_NUMERIC=pl_PL.utf8

--
Best regards,
Michał Górny
 

All times are GMT.