Old 07-29-2010, 11:16 PM
Arfrever Frehtes Taifersar Arahesis
 
Locale check in python_pkg_setup()

We have received too many invalid bug reports caused by unsupported locales. python_pkg_setup() needs to check the
locale and print an error (using eerror(), without die()) when an unsupported locale is detected.

--
Arfrever Frehtes Taifersar Arahesis
 
Old 07-29-2010, 11:20 PM
"Paweł Hajdan, Jr."
 

On 7/29/10 4:16 PM, Arfrever Frehtes Taifersar Arahesis wrote:
>
> --- python.eclass
> +++ python.eclass
> @@ -355,6 +355,8 @@
> # Check if phase is pkg_setup().
> [[ "${EBUILD_PHASE}" != "setup" ]] && die "${FUNCNAME}() can be used only in pkg_setup() phase"
>
> + local locale
> +
> if [[ "$#" -ne 0 ]]; then
> die "${FUNCNAME}() does not accept arguments"
> fi
> @@ -407,6 +409,16 @@
> unset -f python_pkg_setup_check_USE_flags
> fi

nit: Why not declare "local locale" here, close to its usage?

> + locale="$(python -c 'import os; print(os.environ.get("LC_ALL", os.environ.get("LC_CTYPE", os.environ.get("LANG", "POSIX"))))')"
> + if [[ "${locale}" != *.UTF-8 ]]; then
> + eerror
> + eerror "Currently used locale '${locale}' is unsupported and can cause build-time or run-time"
> + eerror "problems (usually UnicodeDecodeErrors or UnicodeEncodeErrors). Bugs caused by this locale"
> + eerror "will be closed as invalid. It is recommended to use a UTF-8 locale to avoid problems."
> + eerror "See http://www.gentoo.org/doc/en/utf-8.xml for information on how to fix locale."
> + eerror
> + fi
> +
> PYTHON_PKG_SETUP_EXECUTED="1"
> }
>
 
Old 07-30-2010, 12:13 AM
Jonathan Callen
 

On 07/29/2010 07:16 PM, Arfrever Frehtes Taifersar Arahesis wrote:
> + locale="$(python -c 'import os; print(os.environ.get("LC_ALL", os.environ.get("LC_CTYPE", os.environ.get("LANG", "POSIX"))))')"
> + if [[ "${locale}" != *.UTF-8 ]]; then

Shouldn't you be checking the output of `locale charmap` instead of the
actual contents of the LC_ALL/LC_CTYPE/LANG variables? You currently
are reporting an error if someone is using the "en_US.utf8" locale
(which *is* a legal UTF-8 locale, and should not be an error).
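To illustrate why a literal `*.UTF-8` suffix match is too strict, here is a minimal Python sketch that normalizes the codeset part of a locale name before comparing it. The function name `names_utf8_codeset` is hypothetical and not part of python.eclass, and the sketch only inspects the name; asking the C library via `locale charmap`, as suggested above, remains the more reliable check because a locale name without a codeset can still resolve to UTF-8.

```python
import codecs


def names_utf8_codeset(locale_name):
    """Return True if the locale name spells out a UTF-8 codeset.

    Hypothetical helper: it parses language[_territory][.codeset][@modifier]
    and lets Python's codec registry normalize spellings such as
    "utf8", "UTF-8" and "utf-8" to one canonical name.
    """
    name = locale_name.split("@", 1)[0]      # drop an optional @modifier
    if "." not in name:
        return False                         # "C", "POSIX", "en_US": no codeset given
    codeset = name.split(".", 1)[1]
    try:
        return codecs.lookup(codeset).name == "utf-8"
    except LookupError:
        return False                         # codeset unknown to Python
```

With this normalization both `en_US.UTF-8` and `en_US.utf8` are accepted, while `C` and `de_DE.ISO-8859-1` are not.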

--
Jonathan Callen
 
Old 07-30-2010, 02:29 AM
Arfrever Frehtes Taifersar Arahesis
 

2010-07-30 01:20:19 Paweł Hajdan, Jr. wrote:
> On 7/29/10 4:16 PM, Arfrever Frehtes Taifersar Arahesis wrote:
> >
> > --- python.eclass
> > +++ python.eclass
> > @@ -355,6 +355,8 @@
> > # Check if phase is pkg_setup().
> > [[ "${EBUILD_PHASE}" != "setup" ]] && die "${FUNCNAME}() can be used only in pkg_setup() phase"
> >
> > + local locale
> > +
> > if [[ "$#" -ne 0 ]]; then
> > die "${FUNCNAME}() does not accept arguments"
> > fi
> > @@ -407,6 +409,16 @@
> > unset -f python_pkg_setup_check_USE_flags
> > fi
>
> nit: Why not declare "local locale" here, close to its usage?

It's consistent with the style used in python.eclass.

> > + locale="$(python -c 'import os; print(os.environ.get("LC_ALL", os.environ.get("LC_CTYPE", os.environ.get("LANG", "POSIX"))))')"
> > + if [[ "${locale}" != *.UTF-8 ]]; then
> > + eerror
> > + eerror "Currently used locale '${locale}' is unsupported and can cause build-time or run-time"
> > + eerror "problems (usually UnicodeDecodeErrors or UnicodeEncodeErrors). Bugs caused by this locale"
> > + eerror "will be closed as invalid. It is recommended to use a UTF-8 locale to avoid problems."
> > + eerror "See http://www.gentoo.org/doc/en/utf-8.xml for information on how to fix locale."
> > + eerror
> > + fi
> > +
> > PYTHON_PKG_SETUP_EXECUTED="1"
> > }
> >

--
Arfrever Frehtes Taifersar Arahesis
 
Old 07-30-2010, 02:32 AM
Arfrever Frehtes Taifersar Arahesis
 

2010-07-30 02:13:20 Jonathan Callen wrote:
> On 07/29/2010 07:16 PM, Arfrever Frehtes Taifersar Arahesis wrote:
> > + locale="$(python -c 'import os; print(os.environ.get("LC_ALL", os.environ.get("LC_CTYPE", os.environ.get("LANG", "POSIX"))))')"
> > + if [[ "${locale}" != *.UTF-8 ]]; then
>
> Shouldn't you be checking the output of `locale charmap` instead of the
> actual contents of the LC_ALL/LC_CTYPE/LANG variables? You currently
> are reporting an error if someone is using the "en_US.utf8" locale
> (which *is* a legal UTF-8 locale, and should not be an error).

OK. I will check the output of `locale charmap`, but the actual locale name is more useful in the error message.

--
Arfrever Frehtes Taifersar Arahesis
 
Old 07-30-2010, 02:36 AM
Brian Harring
 

On Fri, Jul 30, 2010 at 01:16:42AM +0200, Arfrever Frehtes Taifersar Arahesis wrote:
> --- python.eclass
> +++ python.eclass
> @@ -355,6 +355,8 @@
> # Check if phase is pkg_setup().
> [[ "${EBUILD_PHASE}" != "setup" ]] && die "${FUNCNAME}() can be used only in pkg_setup() phase"
>
> + local locale
> +
> if [[ "$#" -ne 0 ]]; then
> die "${FUNCNAME}() does not accept arguments"
> fi
> @@ -407,6 +409,16 @@
> unset -f python_pkg_setup_check_USE_flags
> fi
>
> + locale="$(python -c 'import os; print(os.environ.get("LC_ALL", os.environ.get("LC_CTYPE", os.environ.get("LANG", "POSIX"))))')"

You're using python to get the exported env. Don't. Use bash (you're
invoking python from freaking bash after all)...

> + if [[ "${locale}" != *.UTF-8 ]]; then
> + eerror
> + eerror "Currently used locale '${locale}' is unsupported and can cause build-time or run-time"
> + eerror "problems (usually UnicodeDecodeErrors or UnicodeEncodeErrors). Bugs caused by this locale"
> + eerror "will be closed as invalid. It is recommended to use a UTF-8 locale to avoid problems."
> + eerror "See http://www.gentoo.org/doc/en/utf-8.xml for information on how to fix locale."
> + eerror

For cases such as this, ewarn, not eerror. It's not an actual error,
it's a potential source of problems people may see.

The more I look into this issue, the more I'm convinced it's not user
settings that are the problem: the problem is in the code, not in the user's
env. You've stated in a couple of places that "C/Posix locales are
not supported", which frankly is very whacked. That's not really a
proclamation you can make on your own for python, and you're actually
ignoring that this problem would just as easily rear its head with a
latin-1 encoded file.


Take a look at 302425; the traceback in that is a classic example of
where they *should* be using bytes mode (they don't need to interpret
the data, just write the script across, thus bytes).

bug 328047 is induced by a patch we add (it's not in upstream python).
The code in question also is invoking fricking ldd a few steps prior
which is questionable in multiple ways: either way, relevant chunk is
+ os.system("ldd %s > %s" % (do_readline, tmpfile))
+ fp = open(tmpfile)
+ for ln in fp:

So... roughly, it invokes os.system, which will pass the environment
straight through to it, meaning the locale gets passed down.

Then it opens the file. Note it specifies *NO ENCODING*, nor is there
actually an enforced locale as best I can tell, thus ascii being the
default. The screwup here is in our patches: said patches should be
forcing the posix locale for the ldd call (resulting in ascii). If you
think through this bug, we've seen this multiple times in grep/sed
calls; this is literally no different.
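A minimal Python sketch of forcing the posix locale for a child process and decoding its output explicitly. This is not the patch from bug 328047: it swaps the `os.system` plus temp file approach for `subprocess.run`, and the helper name `run_in_c_locale` is hypothetical. It assumes the command's output is pure ASCII under `LC_ALL=C`, which would not hold if, say, a library path contained non-ASCII bytes.

```python
import os
import subprocess
import sys


def run_in_c_locale(argv):
    """Run argv with LC_ALL=C and return its stdout as a list of ASCII lines."""
    # Copy the caller's environment but override the locale for the child only.
    env = dict(os.environ, LC_ALL="C")
    result = subprocess.run(argv, env=env, stdout=subprocess.PIPE, check=True)
    # Decode explicitly instead of relying on the locale-dependent default.
    return result.stdout.decode("ascii").splitlines()


# Demonstration: the child process sees LC_ALL=C regardless of the parent's locale.
child_locale = run_in_c_locale(
    [sys.executable, "-c", "import os; print(os.environ['LC_ALL'])"]
)
```

For the case discussed above, the call would look like `run_in_c_locale(["ldd", path_to_module])`, where `path_to_module` stands in for whatever file the patch inspects.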

bug 287439 is a screwup in the program's source... should've been
using bytes (non arguable). Matter of fact, while generally I think
Tarek knows what the hell he's doing, the skip they added to the
tests ignored an actual valid bug in setuptools/distribute: shebangs
from the standpoint of the kernel need to be consistent. Thus reading
the shebang line itself should be done in bytes, then converted to
ascii and interpreted. They tried opening the file (in whole) in
bytes, meaning they tried enforcing ascii across the whole buffer,
not just the first line. Program bug.
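A minimal Python sketch of the approach described above: read only the shebang line in bytes mode and decode just that line as ASCII, leaving the rest of the file untouched. The helper `read_shebang` and the sample script contents are hypothetical; this is not the setuptools/distribute code.

```python
import os
import tempfile


def read_shebang(path):
    """Return the interpreter part of a script's shebang, or None if absent."""
    with open(path, "rb") as fp:             # bytes mode: no locale-dependent decoding
        first_line = fp.readline()
    if not first_line.startswith(b"#!"):
        return None
    # Only the first line is decoded; the body may be in any encoding.
    return first_line[2:].decode("ascii").strip()


# Demonstration with a script whose body contains a non-ASCII (Latin-1) byte.
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as fp:
        fp.write(b"#!/usr/bin/python\n# caf\xe9\n")
    shebang = read_shebang(path)
finally:
    os.remove(path)
```

The non-ASCII byte on the second line never reaches a decoder, so the result does not depend on the user's locale.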

These bugs I got via searching for 'ALL python locale', and
identifying the ones that were actually locale related. I've at this
point looked into the source of 3 bugs- meaning literally, 3 bugs
checked into, 3 instances where the code was wrong.

I'll leave it as an exercise for others to keep digging, but the point
here is that the programs themselves screw up their locale handling.
Trying to force all systems to use a utf-8 locale for the env is just
a hack instead of fixing the actual issue. A pretty bad hack, I might add,
considering I've spent all of 30 minutes digging into this and rooting
out the actual flaws in the src.

For shits and giggles, let's add one more bug in, one that has the
potential of rearing its head in random consuming pkgs: bug 322425
(docutils's build_html being flawed). Their encoding handling is
intrinsically flawed. The encoding of a file they're
installing/parsing should be determined by the file itself, not by
attempting to arbitrarily force it to whatever locale the user happens
to be running (which is exactly the first thing buildhtml.py attempts,
literally `locale.setlocale(locale.LC_ALL, '')` at line 20). The
issue is not people using ascii locales, the issue is that these tools
do not handle encoding correctly.

Recall, one of the purposes of py3k going bytes vs text (aka unicode)
was to make clear that textual data's encoding needs to be known. None of
this code actually forces or handles the encoding of the data
it deals in, meaning these are literal bugs, exposed purely due to
py3k actually enforcing encoding in normal file opens.
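A minimal Python 3 sketch of the difference between stating a file's encoding and leaving it to the environment. The sample data is hypothetical; the point is only that an explicit `encoding=` argument makes the result independent of LANG/LC_ALL, while the same bytes cannot be decoded as ASCII, which is what a text-mode `open()` without an encoding would typically attempt under a C/POSIX locale at the time of this thread.

```python
import os
import tempfile

# Hypothetical sample data: "café" encoded as Latin-1 (byte 0xE9 is not ASCII).
raw = "caf\u00e9".encode("latin-1")

fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as fp:
        fp.write(raw)

    # Explicit encoding: the file's known encoding is used, not the locale's.
    with open(path, encoding="latin-1") as fp:
        text = fp.read()
finally:
    os.remove(path)

# The same bytes are not valid ASCII, so an ASCII decode fails.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
```

Code that only copies data, as in the script-installation case above, can skip decoding entirely by opening both files in bytes mode.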

So... this is a big -1 on adding such a warning (especially
considering it doesn't actually resolve the raw issues, it just
sidesteps a couple of cases).

Fix the actual problem instead...

Finally, cc'ing QA since this is a class of bugs they should be aware
of with py3k. This is a bit of a sign that a lot of source isn't
really py3k ready yet either imo, but so it goes...

~harring
 
Old 07-30-2010, 03:05 AM
"Paweł Hajdan, Jr."
 

On 7/29/10 7:29 PM, Arfrever Frehtes Taifersar Arahesis wrote:
> 2010-07-30 01:20:19 Paweł Hajdan, Jr. wrote:
>> nit: Why not declare "local locale" here, close to its usage?
> It's consistent with the style used in python.eclass.

Fine for me then. Thanks for explaining.

Paweł
 
Old 07-30-2010, 03:15 AM
Krzysztof Pawlik
 

On 07/30/10 01:16, Arfrever Frehtes Taifersar Arahesis wrote:
> We have received too many invalid bug reports caused by unsupported locales. python_pkg_setup() needs to check the
> locale and print an error (using eerror(), without die()) when an unsupported locale is detected.

ewarn then instead of eerror - both are nicely visible, and you're actually
*warning* against potential issues.

> + eerror "See http://www.gentoo.org/doc/en/utf-8.xml for
> information on how to fix locale."

I'm with Brian on this one - my locale (C/POSIX) is not broken, it's the code
that has bugs. Can you please change the wording here to read something along the
lines of "... for information on switching to a Unicode locale." instead of
suggesting that the user's locale is broken.

--
Krzysztof Pawlik <nelchael at gentoo.org> key id: 0xF6A80E46
desktop-misc, java, apache, ppc, vim, kernel, python...
 
Old 07-30-2010, 03:48 AM
Brian Harring
 

On Fri, Jul 30, 2010 at 05:15:19AM +0200, Krzysztof Pawlik wrote:
> On 07/30/10 01:16, Arfrever Frehtes Taifersar Arahesis wrote:
> > + eerror "See http://www.gentoo.org/doc/en/utf-8.xml for
> > information on how to fix locale."
>
> I'm with Brian on this one - my locale (C/POSIX) is not broken, it's the code
> that has bugs. Can you please change the wording here to read something along the
> lines of "... for information on switching to a Unicode locale." instead of
> suggesting that the user's locale is broken.

From where I'm sitting, the only ebuild that has any business telling
me to change (or suggesting how) locale is glibc. Especially when
we're talking about a warning that will be in 7.6% of the versions
in the tree.

That's pretty freaking spammy... end result will be people switching
(for better or worse) to stop seeing the complaints.

It's basically annoying people into changing to partially
sidestep a couple of bugs, instead of fixing the issue- and that's the
wrong course of action.

~brian
 
Old 07-30-2010, 04:05 PM
Harald van Dijk
 

On Fri, Jul 30, 2010 at 01:16:18AM +0200, Arfrever Frehtes Taifersar Arahesis wrote:
> We have received too many invalid bug reports caused by unsupported locales. python_pkg_setup() needs to check the
> locale and print an error (using eerror(), without die()) when an unsupported locale is detected.

I'm strongly with Brian on this. You receive too many valid bug reports
caused by a broken package. python_pkg_setup needs to do nothing. You
need to fix the bugs, or if fixing them is too much of an issue, work
around them in the ebuild. Keep in mind that having no locale explicitly
selected is the default for a Gentoo installation, and that the docs do
not (and should not) say anywhere that non-UTF-8 locales are unsupported.
In fact, quoting from
<http://www.gentoo.org/doc/en/guide-localization.xml>:

"It's also possible, and pretty common especially in a more traditional
UNIX environment, to leave the global settings unchanged, i.e. in the
"C" locale. Users can still specify their preferred locale in their own
shell RC file:"
 
