Could anyone give me a hint? I know that this is LC_COLLATE related
(LC_ALL as shorter version), but don't know whether it is my fault or
upstream bug.
I'd appreciate any comments.
Regards,
Robert
--
To UNSUBSCRIBE, email to debian-user-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/AANLkTimSt_3JNPkWaHYV81C4=A07UdQo5unnDm47hywc@mail.gmail.com
11-04-2010, 07:16 PM
Camaleón
Locales/sort bug
On Thu, 04 Nov 2010 20:29:02 +0100, Rob Gom wrote:
> do you think it's a bug in either libc or coreutils (sort)?
>
> $ cat test.csv
> aph3,"APP",""
> aph3_devel,"TXT",""
> aph3,"MiB",""
>
> $ LC_ALL=C sort test.csv # expected
> aph3,"APP",""
> aph3,"MiB",""
> aph3_devel,"TXT",""
>
> $ LC_ALL=pl_PL sort test.csv # why is that?
> aph3,"APP",""
> aph3_devel,"TXT",""
> aph3,"MiB",""
>
> $ LC_ALL=pl_PL.UTF-8 sort test.csv # another unexpected output
> aph3,"APP",""
> aph3_devel,"TXT",""
> aph3,"MiB",""
>
> Could anyone give me a hint? I know that this is LC_COLLATE related
> (LC_ALL as shorter version), but don't know whether it is my fault or
> upstream bug.
I'm also getting that behaviour (locale set to "es_ES.UTF-8"), so I
understand that my locale setting dictates that the underscore ("_")
sorts before the comma (",").
As per "man sort" page:
*** WARNING *** The locale specified by the environment affects sort
order. Set LC_ALL=C to get the traditional sort order that uses native
byte values.
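To make "native byte values" concrete, here is a minimal sketch, assuming a POSIX shell with od and sort available. It prints the hex values of the comma, the underscore and the letter "a", then sorts the three lines from the original post byte-wise. The variable names are only for illustration.

```shell
# Hex byte values of comma, underscore and the letter "a".
bytes=$(printf ',_a' | od -An -tx1 | tr -d ' \n')
echo "$bytes"

# Byte-wise sort of the three lines from the original post.
c_sorted=$(printf '%s\n' 'aph3,"APP",""' 'aph3_devel,"TXT",""' 'aph3,"MiB",""' | LC_ALL=C sort)
printf '%s\n' "$c_sorted"
```

In the C locale the comma (0x2c) is lower than the underscore (0x5f), which is why both "aph3," lines come before "aph3_devel" there.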
Do you think that is a bug? :-?
Greetings,
--
Camaleón
Archive: http://lists.debian.org/pan.2010.11.04.20.16.48@gmail.com
11-04-2010, 07:19 PM
Ron Johnson
Locales/sort bug
On 11/04/2010 02:29 PM, Rob Gom wrote:
> Hi all,
> do you think it's a bug in either libc or coreutils (sort)?
> Could anyone give me a hint? I know that this is LC_COLLATE related
> (LC_ALL as shorter version), but don't know whether it is my fault or
> upstream bug.
> I'd appreciate any comments.
While it *might* be an upstream bug, it's unlikely. (The first
thing I learned in my first CompSci class is that it's not the
compiler's fault that my program doesn't work...)
You just don't know what the Polish "ASCII" collating sequence is.
--
Seek truth from facts.
[cut]
>
> I'm also getting that behaviour (locale set to "es_ES.UTF-8") so I
> understand that my locale setting dictates "underscore" ("_") comes first
> than "comma" (",") symbol.
>
> As per "man sort" page:
>
> *** WARNING *** The locale specified by the environment affects sort
> order. Set LC_ALL=C to get the traditional sort order that uses native
> byte values.
>
> Do you think that is a bug? :-?
>
> Greetings,
>
> --
> Camaleón
If so, why do I get the order comma, underscore, comma? Even better,
comma+quote+A, underscore+d, comma+quote+M. I don't get it...
Regards,
Robert
Archive: http://lists.debian.org/AANLkTi=VM8um=jxkziGySYTyzd97F39YsynofBHHC8d5@mail.gmail.com
11-04-2010, 07:25 PM
Sven Joachim
Locales/sort bug
On 2010-11-04 20:29 +0100, Rob Gom wrote:
> Hi all,
> do you think it's a bug in either libc or coreutils (sort)?
>
> $ cat test.csv
> aph3,"APP",""
> aph3_devel,"TXT",""
> aph3,"MiB",""
>
> $ LC_ALL=C sort test.csv # expected
> aph3,"APP",""
> aph3,"MiB",""
> aph3_devel,"TXT",""
>
> $ LC_ALL=pl_PL sort test.csv # why is that?
> aph3,"APP",""
> aph3_devel,"TXT",""
> aph3,"MiB",""
>
> $ LC_ALL=pl_PL.UTF-8 sort test.csv # another unexpected output
> aph3,"APP",""
> aph3_devel,"TXT",""
> aph3,"MiB",""
>
> Could anyone give me a hint? I know that this is LC_COLLATE related
> (LC_ALL as shorter version), but don't know whether it is my fault or
> upstream bug.
>
> I'd appreciate any comments.
This is covered by the coreutils FAQ:
http://www.gnu.org/software/coreutils/faq/coreutils-faq.html#Sort-does-not-sort-in-normal-order_0021
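As I read the FAQ, the gist is that many non-C locales collate in a dictionary-like way that ignores punctuation and folds case on the first pass. A rough sketch of that idea using only the C locale follows. It is an approximation for illustration, not the actual glibc algorithm, and the variable names are made up for this example.

```shell
# Lines from the original post's test.csv.
lines=$(printf '%s\n' 'aph3,"APP",""' 'aph3_devel,"TXT",""' 'aph3,"MiB",""')
tab=$(printf '\t')

# Prefix every line with a key that has punctuation removed, sort on that key
# with case folded (-f) in the C locale, then cut the key off again.
emulated=$(
  printf '%s\n' "$lines" |
  while IFS= read -r line; do
    key=$(printf '%s' "$line" | LC_ALL=C tr -d '[:punct:]')
    printf '%s\t%s\n' "$key" "$line"
  done |
  LC_ALL=C sort -t "$tab" -k1,1f |
  cut -f2-
)
printf '%s\n' "$emulated"
```

With the punctuation stripped, the keys are aph3APP, aph3develTXT and aph3MiB, so the "aph3_devel" line lands between the other two, the same order reported for pl_PL.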
Sven
Archive: http://lists.debian.org/87lj58n8em.fsf@turtle.gmx.de
11-04-2010, 07:43 PM
Rob Gom
Locales/sort bug
[cut]
>
> This is covered by the coreutils FAQ:
> http://www.gnu.org/software/coreutils/faq/coreutils-faq.html#Sort-does-not-sort-in-normal-order_0021
>
> Sven
>
Thanks for all the answers.
How can I tell whether the collation is defined correctly? I understand
how LC_COLLATE influences sorting, but I am not sure whether this result
is OK.
The simplest example that causes the weird behaviour is:
$ cat test2.csv
,"A
_d
,"M
$ LC_ALL=pl_PL sort test2.csv # and many other LC_COLLATE variants other than C/POSIX
,"A
_d
,"M
To get this behaviour, ',"' would have to be defined as a single
entity in the collation definition, equal in ordering to '_'. I have no
other explanation for it. Unfortunately, I don't know enough to
understand or verify the collation definitions in /usr/share/i18n.
Regards,
Robert
Archive: http://lists.debian.org/AANLkTik3FUbMR0OcLOChJcOsyQwHCAHgSjvOhTFrjH4e@mail.gmail.com
11-04-2010, 07:55 PM
Rob Gom
Locales/sort bug
One more thing.
If I set LC_COLLATE to C/POSIX, special characters sort as I expect,
but I lose the Polish character ordering.
If I set LC_COLLATE to pl_PL.UTF-8, the Polish character ordering is
fine, but the special characters sort in an unexpected order.
Is it possible to retain both features?
carramba@laptop-rg:/tmp$ cat test2.csv
,"A
_d
,"M
a
ą
b
ż
ć
z
carramba@laptop-rg:/tmp$ LC_ALL=POSIX sort test2.csv
,"A
,"M
_d
a
b
z
ą
ć
ż
# above - correct special characters, Polish in wrong order
carramba@laptop-rg:/tmp$ LC_ALL=pl_PL.UTF-8 sort test2.csv
a
,"A
ą
b
ć
_d
,"M
z
ż
# above - correct Polish characters order, incorrect special characters
Feel free to replace 'correct' with 'expected' in my posts, I'm just
trying to understand what's under the hood.
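On the variables themselves: LC_ALL, when set, overrides every individual LC_* category, so LC_COLLATE is only consulted when LC_ALL is unset or empty. A small sketch of that precedence, assuming a POSIX shell. The value pl_PL.UTF-8 is just a placeholder and does not need to be installed, because LC_ALL=C wins in that command.

```shell
# Three sample lines from test2.csv, deliberately out of order.
input=$(printf '%s\n' '_d' ',"M' ',"A')

# LC_ALL is cleared inside a subshell so that LC_COLLATE is actually consulted;
# only the collation category is forced to byte order here.
by_collate=$( (unset LC_ALL; printf '%s\n' "$input" | LC_COLLATE=C sort) )

# LC_ALL, when set, overrides LC_COLLATE, so this is a byte-order sort as well.
by_all=$(printf '%s\n' "$input" | LC_COLLATE=pl_PL.UTF-8 LC_ALL=C sort)

printf '%s\n' "$by_collate"
printf '%s\n' "$by_all"
```

Both commands should print the same byte-ordered result, which shows that the LC_COLLATE value is ignored once LC_ALL is set.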
Regards,
Robert
Archive: http://lists.debian.org/AANLkTi=pu2NP+fiLSqx6Vcjj32sqNYAJQpOPQw33vpC2@mail.gmail.com
11-04-2010, 08:06 PM
Rob Gom
Locales/sort bug
I have some form of workaround.
When I know the sort field separator (which was the case in my original
example), I can use that to overcome the limitations with:
My conclusion for now would be:
- if you don't know the field separator
-- if there are only ASCII characters: use the POSIX collation
-- if there are other (i18n) characters: I don't have a solution
- if you know the field separator
-- specify it in the sort command
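One plausible form of the "known separator" case is sketched below. It is an assumption, not necessarily the exact command used: the key is restricted to the first comma-separated field with -t and -k, so the separator itself is never compared against "_". SORT_LOCALE is a hypothetical variable for this sketch. If the named locale is not installed, sort typically falls back to the C locale, and for this data the result should be the same.

```shell
# Sample data from the first post.
csv=$(printf '%s\n' 'aph3,"APP",""' 'aph3_devel,"TXT",""' 'aph3,"MiB",""')

# SORT_LOCALE is a made-up knob for this sketch; override it as needed.
SORT_LOCALE=${SORT_LOCALE:-pl_PL.UTF-8}

# -t, sets the comma as field separator; -k1,1 limits the sort key to field 1.
# Lines whose first field is equal are ordered by a whole-line comparison.
sorted=$(printf '%s\n' "$csv" | LC_ALL=$SORT_LOCALE sort -t, -k1,1 2>/dev/null)
printf '%s\n' "$sorted"
```

For this file, both "aph3" lines come before "aph3_devel", the same order as the LC_ALL=C run, while the locale still governs how the letters within each field are compared.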
Regards,
Robert
Archive: http://lists.debian.org/AANLkTinLoxDzJ9hdAfvbCQ8++J5V0Jecf5WKHkSy46R3@mail.gmail.com