04-12-2008, 04:57 PM
Bhasker C V

Uniq is not unique?

Hi,

For a fairly large file (100K+ lines), the uniq command does not
filter out the repeated lines.

Am I doing anything wrong in the usage?

For example, I ran this command in my home directory:

find . -name '*' -type f -exec basename {} \; | uniq

or sent the output to a file and then ran uniq on the file.

In both cases, the output shows repeated lines.


--
Bhasker C V
Registered Linux user: #306349 (counter.li.org)
The box said "Requires Windows 95, NT, or better", so I installed Linux.


--
To UNSUBSCRIBE, email to debian-user-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
 
04-12-2008, 05:14 PM
Eduardo M KALINOWSKI

Bhasker C V wrote:
> Hi,
>
>
>
> For fairly large file 100K+ lines
> uniq command does not filter the repetitive lines.
>
> Am I doing anything wrong on the usage ?
>
> For eg:-
>
> I had run this script in my home dir
>
> find . -name '*' -type f -exec basename {} \; | uniq
> or send the output to a file and then run uniq on the file
>
> Both cases, the o/p shows repeated lines

From man uniq(1):

DESCRIPTION
Discard all but one of successive identical lines from INPUT (or
standard input), writing to OUTPUT (or standard output).

And, later:

Note: 'uniq' does not detect repeated lines unless they are adjacent.
You may want to sort the input first, or use 'sort -u' without 'uniq'.

Since find will output the names in no particular order, you'll have to
sort first.
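
As a rough sketch of that advice, the original pipeline could be wrapped like this. The function name unique_basenames is made up for illustration, and the sketch assumes file names contain no newlines:

```shell
# Hypothetical helper: list each distinct file basename under a directory once.
unique_basenames() {
    # The ';' that ends -exec must be escaped so the shell hands it to find.
    # sort -u sorts the names and drops duplicates in a single step.
    find "${1:-.}" -type f -exec basename {} \; | sort -u
}
```

Calling unique_basenames "$HOME" would then list every file name once, however many directories it appears in.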


--
America works less, when you say "Union Yes!"

Eduardo M KALINOWSKI
ekalin@gmail.com
http://move.to/hpkb


 
04-12-2008, 05:16 PM
"Chris Henry"

Hi,

On Sun, Apr 13, 2008 at 12:57 AM, Bhasker C V <bhasker@unixindia.com> wrote:
> For fairly large file 100K+ lines
> uniq command does not filter the repetitive lines.
>
> Am I doing anything wrong on the usage ?
>
> For eg:-
>
> I had run this script in my home dir
>
> find . -name '*' -type f -exec basename {} \; | uniq
> or send the output to a file and then run uniq on the file
>
> Both cases, the o/p shows repeated lines

I happen to know the source code for uniq, and it does filter
repeated lines. By repeated lines, do you mean consecutive repeated
lines, or repeats separated by other lines? Uniq only filters
consecutive repeated lines, e.g.

A
A
B
A

will become

A
B
A

If you need only one copy of each distinct line to remain, you
will need to sort first and then pipe to uniq (not a good solution for
really large files).
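
The A/A/B/A example can be reproduced at a prompt. This is only a small demonstration; the sample function is a made-up stand-in for a real input file:

```shell
# Hypothetical stand-in for a real file: emits the four lines A, A, B, A.
sample() { printf 'A\nA\nB\nA\n'; }

# uniq alone only collapses the adjacent pair: prints A, B, A.
sample | uniq

# Sorting first makes all the A lines adjacent: prints A, B.
sample | sort | uniq
```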

Regards,
Chris



--
contact: +65 97553292
e-mail: chrishenry.ni@gmail.com / ch_music@yahoo.com / chrishenry@nus.edu.sg
facebook: http://nus.facebook.com/profile.php?id=502687583


 
04-12-2008, 05:55 PM
Allan Wind

On 2008-04-12T22:27:46+0530, Bhasker C V wrote:
> For fairly large file 100K+ lines
> uniq command does not filter the repetitive lines.

If you need to sort it anyway, then `sort -u` might be of interest.


/Allan


 
08-14-2008, 08:55 PM
Urs Thuermann

"Chris Henry" <chrishenry.ni@gmail.com> writes:

> Uniq only filters consecutive repeated lines, e.g.
>
> A
> A
> B
> A
>
> will become
>
> A
> B
> A
>
> If you need it to filter such that only 1 unique line remains, you
> will need to sort first then pipe to uniq (not a good solution for
> really large files).

I sometimes need to filter repeated lines that are not consecutive,
and I use the following simple perl script for this purpose. It runs
reasonably fast even for large files (a couple of tens of MB):

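#!/usr/bin/perl
use strict;
use warnings;

my %h;    # lines that have already been printed

while (<>) {
    if (!$h{$_}) {
        $h{$_} = 1;    # remember this line
        print;         # first occurrence only
    }
}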
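
For anyone who would rather stay in the shell, roughly the same idea can be sketched with awk. The name dedupe_keep_order is invented for this example. Like the perl hash, the awk array keeps every distinct line in memory:

```shell
# Hypothetical helper: print each distinct line once, keeping the first
# occurrence and the original order (no sorting needed).
dedupe_keep_order() {
    # seen[$0]++ evaluates to 0 the first time a line appears, so the
    # negated pattern is true only then and awk prints the line.
    awk '!seen[$0]++' "$@"
}

# Prints A, B for the A/A/B/A input.
printf 'A\nA\nB\nA\n' | dedupe_keep_order
```

Unlike sort -u, this leaves the surviving lines in their original order.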

HTH,
urs

