LANG=en_GB.UTF-8 by default
As the subject says, could Gentoo change the policy and set a UTF-8 environment by default?
http://www.gentoo.org/doc/en/utf-8.xml explains how to do it very well, but having it already set would have the following two advantages:
1) UTF-8 is everywhere; even the Linux weekly newsletter has it in 2012.
2) The user would only need to change the setting, not create an /etc/env.d/XX-lc file, and every Gentoo install would have these settings in a standard place.
Any objections?
P.S. It would be nice to have a wd_WD.UTF-8, with WD standing for world; just a country is so 1900.
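For illustration, a file of the kind described above might look like this. The name 02locale follows the convention used by Gentoo's localization guide and, like the choice of en_GB.UTF-8, is only an example. Changes under /etc/env.d take effect after running env-update and sourcing /etc/profile again.

```shell
# Example /etc/env.d/02locale (file name and locale are illustrative choices).
# Only LANG is set: every LC_* category falls back to it, and LC_ALL stays unset.
LANG="en_GB.UTF-8"
```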
On Wed, Feb 15, 2012 at 12:58:52PM +0100, Francesco R.(vivo) wrote:
> as subject says could gentoo change the policy and set an UTF-8 environment by
> default?
>
> http://www.gentoo.org/doc/en/utf-8.xml how to do it very well but having it
> already set could have the following two advantages:
>
> 1) well utf-8 is everywhere, even the linux weekly newsletter has it in 2012
> 2) the user need to change, not to create a /etc/env.d/XX-lc, creating a
> standard place where every gentoo install has this settings.
>
> contra?
>
> P.S. would be nice to have a wd_WD.UTF-8 with WD standing for world, just a
> country is so 1900
wd_WD.UTF-8 is certainly a no go. WD doesn't match any ISO country code. To support it, we'd have to create the necessary supporting files and that would lead to a lot of work and headaches trying to determine what should be where in what order, et cetera.
All of the files we create (ebuilds, initscripts) are UTF-8 in accordance with GLEP 31. So, the issue would be with upstream projects not using UTF-8 for their files.
However, the stage 3, last time I used it, didn't default to a UTF-8 environment, and it didn't default to using and/or including a capable UTF-8 font. It is something I think we should look at changing.
--
Mr. Aaron W. Swenson
Gentoo Linux Developer, Proxy Committer
Email : titanofold@gentoo.org
GnuPG FP : 2C00 7719 4F85 FB07 A49C 0E31 5713 AA03 D1BB FDA0
GnuPG ID : D1BBFDA0
On 15/02/2012 12:22, Mr. Aaron W. Swenson wrote:
> On Wed, Feb 15, 2012 at 12:58:52PM +0100, Francesco R.(vivo) wrote:
>> as subject says could gentoo change the policy and set an UTF-8 environment by default?
Perhaps it should define LANG="en_US.UTF-8" as a reasonable default, which would be in line with other notable distros. Arch also used to define LC_COLLATE="C" by default, probably to mitigate unpredictable behaviour in some applications, but have since dropped this additional variable, so they must have deemed it no longer necessary.
I think that having a default configuration file would also raise awareness of the importance of locale configuration and make it less likely that users configure their systems inappropriately (defining LC_ALL, for instance).
>> P.S. would be nice to have a wd_WD.UTF-8 with WD standing for world, just a country is so 1900
Different countries/regions have different standards and conventions for character classification, case conversion, date/numerical/currency formatting etc. There's no basis on which to formally standardise a world-wide definition.
> However, the stage 3, last time I used it, didn't default to a UTF-8 environment, and it didn't default to using and/or including a capable UTF-8 font. It is something I think we should look at changing.
Yet "unicode" is a default flag in the standard profiles.
Most console fonts have poor coverage. The best one I've found thus far is "LatCyrGr-16" from fonty-rg, which provides good Latin and Cyrillic coverage along with some Greek and esoteric punctuation characters. Using this font, I've yet to find any developer's name that doesn't render as expected while perusing the contents of the portage tree. Because it is a 512-character font, one loses bold support unless using a framebuffer console. Given that the default console fonts aren't especially useful, it seems a small price to pay.
--Kerin
>>>>> "KM" == Kerin Millar <kerframil@gmail.com> writes:
KM> Arch also used to define LC_COLLATE="C" by default, probably to
KM> mitigate unpredictable behaviour in some applications, but have
KM> since dropped this additional variable so they must have deemed it
KM> no longer necessary.
Without LC_COLLATE="C", things like [a-z]* get a false-positive match on files like Makefile.
I recently noticed a bug on b.g.o where the ebuild has something like doc/[A-Z]*, expecting that it will not match doc/some_lowercase_subdir.
The bug, of course, is that glibc fraudulently defaults the Latin, Greek and Cyrillic locales to case-insensitive collation.
The real fix is to have root use C.UTF-8, which differs from C only in that the charset is UTF-8.
-JimC
--
James Cloos <cloos@jhcloos.com>         OpenPGP: 1024D/ED7DAEA6
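In some locales a range such as [A-Z] is interpreted according to collation order, which can interleave upper- and lowercase letters, whereas a named class such as [[:upper:]] always means "uppercase letter". A minimal POSIX sh sketch of the doc/[A-Z]* case using the named class follows; list_uppercase_entries is a hypothetical helper, not an eclass function.

```shell
#!/bin/sh
# Hypothetical helper: print the names of entries in directory $1 whose
# names start with an uppercase letter, which is what doc/[A-Z]* intended.
# [[:upper:]] is a named class, so it does not depend on collation order.
list_uppercase_entries() {
    for _entry in "$1"/[[:upper:]]*; do
        # With no match the pattern stays unexpanded; skip that literal.
        [ -e "$_entry" ] || continue
        printf '%s\n' "${_entry##*/}"
    done
}
```

In an ebuild, the same change amounts to writing doc/[[:upper:]]* instead of doc/[A-Z]*.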
On 19 February 2012 09:00, James Cloos <cloos@jhcloos.com> wrote:
> Without LC_COLLATE="C" things like [a-z]* gets a false=positive match
> on files like Makefile.
> [...]
>
> The real fix is to have root be C.UTF-8. Which differs from C only in
> that the charset is utf-8.
In my opinion we should set a default environment with the following values:
LANG=en_US.UTF-8
LC_ALL=
LC_COLLATE=C
This offers the best default options to the majority of users, and is easy to customize for those who wish to use another locale. And yes, LC_ALL needs to be empty, because it would override the other LC_* values.
This should be combined with some good Unicode fonts, such as LatCyrGr-16 for the console and DejaVu for X.
Cheers,
Ben
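An empty LC_ALL is harmless while a non-empty one overrides everything because of the precedence POSIX specifies for locale variables: a non-empty LC_ALL wins, then the category's own LC_* variable, then LANG, and finally an implementation default (the C locale in glibc). The POSIX sh sketch below reproduces that lookup for the proposed values; resolve_locale is a hypothetical helper, and the real lookup happens inside the C library's setlocale().

```shell
#!/bin/sh
# Hypothetical helper: print the locale one category would use, following
# the POSIX precedence LC_ALL > LC_<category> > LANG > default.
# Empty variables are treated exactly like unset ones.
resolve_locale() {
    case $1 in
        LC_[[:upper:]]*) ;;         # only accept names like LC_COLLATE
        *) return 1 ;;
    esac
    if [ -n "${LC_ALL:-}" ]; then   # 1. non-empty LC_ALL overrides everything
        printf '%s\n' "$LC_ALL"
        return 0
    fi
    eval "_rl_value=\${$1:-}"       # 2. the category's own variable
    if [ -n "$_rl_value" ]; then
        printf '%s\n' "$_rl_value"
        return 0
    fi
    if [ -n "${LANG:-}" ]; then     # 3. LANG covers every category left unset
        printf '%s\n' "$LANG"
        return 0
    fi
    printf '%s\n' C                 # 4. glibc's default is the C locale
}

# Proposed defaults; LC_CTYPE and LC_NUMERIC are cleared so they fall back to LANG.
unset LC_CTYPE LC_NUMERIC
LANG=en_US.UTF-8
LC_ALL=
LC_COLLATE=C

for category in LC_CTYPE LC_COLLATE LC_NUMERIC; do
    printf '%s -> %s\n' "$category" "$(resolve_locale "$category")"
done
# LC_CTYPE -> en_US.UTF-8
# LC_COLLATE -> C
# LC_NUMERIC -> en_US.UTF-8
```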
Excerpts from Ben's message of 2012-02-19 03:04:19 +0100:
> On 19 February 2012 09:00, James Cloos <cloos@jhcloos.com> wrote:
> > Without LC_COLLATE="C" things like [a-z]* gets a false=positive
> > match on files like Makefile. [...]
> >
> > The real fix is to have root be C.UTF-8. Which differs from C only
> > in that the charset is utf-8.
>
> In my opinion we should set a default environment with the following
> values:
>
> LANG=en_US.UTF-8
> LC_ALL=
> LC_COLLATE=C
Is it only on my setups, or is it "xy_XY.utf8" instead of "xy_XY.UTF-8"?
--
Amadeusz Żołnowski
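The difference comes from glibc normalizing the codeset part of a locale name: locale -a lists names such as en_GB.utf8, yet LANG=en_GB.UTF-8 selects the same locale. The POSIX sh sketch below mimics the rule of glibc's internal _nl_normalize_codeset (letters lowercased, anything that is not a letter or digit dropped, "iso" prepended to an all-digit result); normalize_codeset is a hypothetical name, and this is an approximation, not glibc's code.

```shell
#!/bin/sh
# Hypothetical helper modelled on glibc's codeset normalization:
# keep only letters and digits, lowercase them, and prefix "iso" when
# nothing but digits is left (e.g. 8859-1 -> iso88591).
normalize_codeset() {
    _nc=$(printf '%s' "$1" | tr -cd '[:alnum:]' | tr '[:upper:]' '[:lower:]')
    case $_nc in
        *[![:digit:]]*) ;;       # contains a letter: use as is
        *) _nc="iso$_nc" ;;      # digits only
    esac
    printf '%s\n' "$_nc"
}

normalize_codeset UTF-8        # utf8
normalize_codeset utf8         # utf8
normalize_codeset ISO-8859-15  # iso885915
```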
>>>>> On Sun, 19 Feb 2012, Ben wrote:
> In my opinion we should set a default environment with the following
> values:
> LANG=en_US.UTF-8
> LC_ALL=
> LC_COLLATE=C
> This offers the best default options to the majority of users, and
> is easy to customize for those who wish to use another locale.
At least, LC_NUMERIC=C should be added to this; otherwise numbers will be formatted with commas as thousands separators.
Also, en_US.UTF-8 for LC_MEASUREMENT and LC_PAPER means imperial units and letter paper, which isn't optimal for users outside of the U.S.
Ulrich
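For users who want to keep en_US.UTF-8 as LANG, these suggestions could be expressed as per-category overrides in an env.d file. This is only a sketch: the file name is an example, and taking LC_MEASUREMENT and LC_PAPER from en_GB.UTF-8 (metric units, A4 paper) is an assumed choice; any installed locale with those conventions would do.

```shell
# Example /etc/env.d/02locale: LANG covers every category not listed here.
LANG="en_US.UTF-8"
LC_COLLATE="C"
LC_NUMERIC="C"
LC_MEASUREMENT="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
# LC_ALL is deliberately not set, since it would override all of the above.
```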
On 19 February 2012 23:14, Ulrich Mueller <ulm@gentoo.org> wrote:
>>>>>> On Sun, 19 Feb 2012, Ben wrote:
>
>> In my opinion we should set a default environment with the following
>> values:
>
>> LANG=en_US.UTF-8
>> LC_ALL=
>> LC_COLLATE=C
>
>> This offers the best default options to the majority of users, and
>> is easy to customize for those who wish to use another locale.
>
> At least, LC_NUMERIC=C should be added to this, otherwise numbers will
> be formatted with commas as thousands separators.
>
> Also en_US.UTF-8 for LC_MEASUREMENT and LC_PAPER means imperial units
> and letter paper, which isn't optimal for users outside of the U.S.
>
> Ulrich
I think those users (and that includes myself) should then set LANG to something more appropriate to their use case.
Ben
On 19/02/2012 15:56, Ben wrote:
> On 19 February 2012 23:14, Ulrich Mueller <ulm@gentoo.org> wrote:
>> On Sun, 19 Feb 2012, Ben wrote:
>>> In my opinion we should set a default environment with the following values:
>>> LANG=en_US.UTF-8
>>> LC_ALL=
LC_ALL isn't needed here because, unlike the other LC_* settings, it does not inherit from LANG and will therefore be undefined anyway. Although the above would not directly cause any harm, I am entirely certain that its mere presence would encourage users to explicitly define it where they most definitely should not. The misinformation that LC_ALL should be defined was propagated by the localization doc for rather a long time, and it was rather challenging to impress upon its maintainers that change was required. Let's not repeat old mistakes.
>>> LC_COLLATE=C
>>> This offers the best default options to the majority of users, and is easy to customize for those who wish to use another locale.
>> At least, LC_NUMERIC=C should be added to this, otherwise numbers will be formatted with commas as thousands separators.
>> Also en_US.UTF-8 for LC_MEASUREMENT and LC_PAPER means imperial units and letter paper, which isn't optimal for users outside of the U.S.
>> Ulrich
> I think those users (and that includes myself) should then set LANG to something more appropriate to their use case.
I agree; the defaults should not be over-engineered. For proper localisation, set LANG appropriately and you're done.
The real issue is that locale configuration isn't mentioned in the handbook. It does, however, mention locale.gen, so we're half-way there.
--Kerin
On 19/02/2012 01:00, James Cloos wrote:
"KM" == Kerin Millar<kerframil@gmail.com> writes: KM> Arch also used to define LC_COLLATE="C" by default, probably to KM> mitigate unpredictable behaviour in some applications, but have KM> since dropped this additional variable so they must have deemed it KM> no longer necessary. Without LC_COLLATE="C" things like [a-z]* gets a false=positive match on files like Makefile. Indeed, character classes are a potential minefield. Incidentally, I just tested Ubuntu and Arch with only LANG set to a UTF-8 locale:- $ echo Makefile | sed -re 's/[a-z]//g' # collation rules ignored M $ echo Makefile | grep -Eo '[a-z]*' # collation rules ignored akefile In neither case are the collation rules being obeyed. In Gentoo, however:- $ echo Makefile | sed -re 's/[a-z]//g' # collation rules obeyed $ echo Makefile | grep -Eo '[a-z]*' # collation rules ignored akefile Obeying the collation rules is ostensibly the correct thing to do but, until everyone starts using named character classes (which will never happen), it's not safe. The thing that worries me here is the inconsistency in Gentoo. LC_COLLATE="C" is sufficient to work around the issue but the above makes me wonder why we still need it. --Kerin |