01-20-2012, 08:57 PM
Dan Christensen
 
Default behaviour of sort command

Executive summary: sort thinks that " " < "_" but that "_1" < " 2".
Is this a bug?

Longer version:

I've noticed that the sort command behaves in a way that is surprising
to me. If you feed it the following input:

 1
_1
 2
_2

consisting of four lines each with two characters, it returns those
lines in the same order. I had assumed that it sorted lines by
comparing the first byte, and only looking to the next byte if
the first bytes agree, but it seems like the second byte can affect
the sort order. Put another way, no matter whether sort considers
" " < "_" or vice versa, I would expect the lines starting with a
space to be grouped together, and those starting with an underscore
to also be grouped together.

If I feed in two lines, with one containing just a space and
one just an underscore, it sorts the space first. But if the
two lines are

_1
 2

then it puts the line with the underscore first.

Is this a bug in sort? It's not explained in the man page or the info
page, and I think most people would expect that adding text to the end
of unequal lines shouldn't change their sort order.

I'm using sort 8.5 in coreutils 8.5-1ubuntu3 under maverick,
without any locale environment variables set.

Dan


--
ubuntu-users mailing list
ubuntu-users@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-users
 
01-20-2012, 10:56 PM
PleegWat

On 01/20/2012 10:57 PM, Dan Christensen wrote:
> Is this a bug in sort? It's not explained in the man page or the
> info page, and I think most people would expect that adding text to
> the end of unequal lines shouldn't change their sort order.

How sort sorts depends on your localization settings (specifically the
value of LC_COLLATE). Example:

$ LC_COLLATE=en_US.UTF-8
$ echo -e " 1\n_1\n 2\n_2" | sort
 1
_1
 2
_2
$ LC_COLLATE=C
$ echo -e " 1\n_1\n 2\n_2" | sort
 1
 2
_1
_2

LC_COLLATE=C typically corresponds to sorting by the actual byte values.

See `man locale`

PleegWat
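
Editor's note: for readers who want to check the byte-order claim without changing their shell's locale, here is a minimal Python sketch. The helper names `byte_order_sort` and `run_sort` are made up for this note. `run_sort` assumes a `sort` binary is on the PATH, and it sets LC_ALL for that one child process only, because LC_ALL takes precedence over LC_COLLATE when both are set.

```python
import os
import subprocess

# The four two-character lines from the original post.
LINES = [" 1", "_1", " 2", "_2"]


def byte_order_sort(lines):
    """Sort lines by their raw byte values, as the C locale is expected to."""
    return sorted(lines, key=lambda s: s.encode("utf-8"))


def run_sort(lines, locale_name):
    """Run the system `sort` on `lines` with LC_ALL set for this call only.

    Assumes a `sort` executable is available; the named locale must be
    installed on the machine for a non-C value to have any effect.
    """
    env = dict(os.environ, LC_ALL=locale_name)
    result = subprocess.run(
        ["sort"],
        input="\n".join(lines) + "\n",
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


# Space is byte 0x20 and underscore is byte 0x5F, so both space lines come first.
# byte_order_sort(LINES) → [" 1", " 2", "_1", "_2"]
```

Under these assumptions, `run_sort(LINES, "C")` should agree with `byte_order_sort(LINES)`. What `run_sort(LINES, "en_US.UTF-8")` returns depends on whether that locale is installed and on the collation tables shipped with the C library.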
 
01-23-2012, 08:40 PM
Dan Christensen

PleegWat <pleegwat@telfort.nl> writes:

> On 01/20/2012 10:57 PM, Dan Christensen wrote:
>> Is this a bug in sort? It's not explained in the man page or the
>> info page, and I think most people would expect that adding text to
>> the end of unequal lines shouldn't change their sort order.
>
> How sort sorts depends on your localization settings (specifically the
> value of LC_COLLATE). Example:

I knew that could change how bytes were compared, but I didn't realize
that it could make sort use later characters on the line instead of
earlier characters! I've searched many man pages and haven't found
any documentation of this. Does anyone know what the algorithm is
that produces this output:

> $ LC_COLLATE=en_US.UTF-8
> $ echo -e " 1\n_1\n 2\n_2" | sort
>  1
> _1
>  2
> _2

Thanks,

Dan
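
Editor's note: one simplified way to model the output in question, not the actual glibc implementation. Locale-aware collation compares strings in several passes rather than byte by byte. As far as I know, the en_US collation tables of that era ignored punctuation such as space and underscore in the first pass and consulted it only to break ties. The sketch below assumes exactly that two-pass rule; the function names are hypothetical, and real collation has more levels (accents, case) than this shows.

```python
def simplified_collation_key(line):
    """Build a two-part sort key under the assumed two-pass rule.

    Pass 1 (assumption): only letters and digits are compared.
    Pass 2 (assumption): the full line breaks ties, using code point order
    as a stand-in for the locale's later comparison levels.
    """
    primary = "".join(ch for ch in line if ch.isalnum())
    return (primary, line)


def locale_like_sort(lines):
    """Sort lines with the simplified two-pass key."""
    return sorted(lines, key=simplified_collation_key)


# "_1" and " 2" differ in pass 1 ("1" < "2"), so the underscore line wins.
# locale_like_sort([" 2", "_1"]) → ["_1", " 2"]

# " " and "_" are both empty in pass 1, so pass 2 decides: space sorts first.
# locale_like_sort(["_", " "]) → [" ", "_"]
```

Under this model the four lines come out as " 1", "_1", " 2", "_2", matching the en_US.UTF-8 output quoted above: the digits decide the order first, and the leading character only separates lines whose digits are equal.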


 

All times are GMT.