Executive summary: sort thinks that " " < "_" but that "_1" < " 2".
Is this a bug?
Longer version:
I've noticed that the sort command behaves in a way that is surprising
to me. If you feed it the following input (the first and third lines
begin with a space):
 1
_1
 2
_2
consisting of four lines each with two characters, it returns those
lines in the same order. I had assumed that it sorted lines by
comparing the first byte, and only looking to the next byte if
the first bytes agree, but it seems like the second byte can affect
the sort order. Put another way, no matter whether sort considers
" " < "_" or vice versa, I would expect the lines starting with a
space to be grouped together, and those starting with an underscore
to also be grouped together.
If I feed in two lines, with one containing just a space and
one just an underscore, it sorts the space first. But if the
two lines are (the second one beginning with a space)
_1
 2
then it puts the line with the underscore first.
Is this a bug in sort? It's not explained in the man page or the info
page, and I think most people would expect that adding text to the end
of unequal lines shouldn't change their sort order.
I'm using sort 8.5 in coreutils 8.5-1ubuntu3 under maverick,
without any locale environment variables set.
Dan
--
ubuntu-users mailing list
ubuntu-users@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-users
01-20-2012, 10:56 PM
PleegWat
behaviour of sort command
On 01/20/2012 10:57 PM, Dan Christensen wrote:
> Is this a bug in sort? It's not explained in the man page or the
> info page, and I think most people would expect that adding text to
> the end of unequal lines shouldn't change their sort order.
How sort sorts depends on your localization settings (specifically the
value of LC_COLLATE). Example:
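The example from this message did not survive in the archive. As a rough sketch of the kind of thing that shows the effect (not necessarily what was originally posted), forcing the C locale for a single command makes sort compare lines byte by byte. The helper name sort_bytewise is made up for illustration; LC_ALL overrides LC_COLLATE, so setting it is the bluntest way to rule out locale effects.

```shell
# Hypothetical helper: run sort with plain byte-order comparison,
# regardless of what LANG / LC_COLLATE are set to in the environment.
sort_bytewise() {
    LC_ALL=C sort "$@"
}

# The four two-character lines from the original question.
printf ' 1\n_1\n 2\n_2\n' | sort_bytewise
# Output (space is byte 0x20, underscore is 0x5F):
#  1
#  2
# _1
# _2
```

Running the same printf through plain sort and comparing the two outputs is a quick way to check whether a locale is in play on a given system. The output of the locale command shows which settings are actually in effect.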
01-23-2012, 08:40 PM
Dan Christensen
behaviour of sort command
PleegWat <pleegwat@telfort.nl> writes:
> On 01/20/2012 10:57 PM, Dan Christensen wrote:
>> Is this a bug in sort? It's not explained in the man page or the
>> info page, and I think most people would expect that adding text to
>> the end of unequal lines shouldn't change their sort order.
>
> How sort sorts depends on your localization settings (specifically the
> value of LC_COLLATE). Example:
I knew that could change how bytes were compared, but I didn't realize
that it could make sort use later characters on the line instead of
earlier characters! I've searched many man pages and haven't found
any documentation of this. Does anyone know what the algorithm is
that produces this output: