Linux-archive is a website aiming to archive linux email lists and to make them easily accessible for linux users/developers.


 
Old 05-11-2010, 08:09 AM
hadi motamedi
 
Default Text file manipulation in CentOS?

Dear all,
From my previous posts, I learned from you how to use 'sort', 'grep', and 'grep -v' to manipulate text files. I have now generated a large text file from my autoexpect script. To be more specific, I need to find how many distinct records there are in, say, column 1. How can I filter out the distinct records whose number of occurrences is below a predetermined threshold? Can you please show me the power of CentOS in manipulating text files?

Thank you


_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
 
Old 05-11-2010, 08:14 AM
 
Default Text file manipulation in CentOS?

Can you provide sample input and the expected result?


 
Old 05-11-2010, 08:25 AM
 
Default Text file manipulation in CentOS?

>>To be more specific, I need to find how many distinct records are there in say column#1?

awk '{print $1}' filename | sort -u | wc -l

This shows how many unique entries are present in column one (use awk -F to change the delimiter, e.g. awk -F ":" for a colon delimiter).

>> How can I filter out the distinct records with number of occurrences less than a pre-determined threshold?

I don't quite understand this part.

awk '{print $1}' filename | sort | uniq -c | sort -rn

This gives you the number of occurrences of each unique value in column one, sorted numerically in descending order.

I think you now want to put that through a loop and show only those that are below the threshold?

Thanks
Sheraz
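
As a small illustration of the two pipelines above, here is a sketch that runs them on a colon-delimited file. The sample records and the variable names (sample, distinct, counts) are made up for this example and are not from the original poster's log.

```shell
# Hypothetical sample data: colon-delimited records, first field is the key.
sample=$(mktemp)
printf '%s\n' 'alice:10' 'bob:20' 'alice:30' 'carol:40' 'alice:50' 'bob:60' > "$sample"

# Number of distinct values in column 1 (tr strips padding some wc versions add).
distinct=$(awk -F ':' '{print $1}' "$sample" | sort -u | wc -l | tr -d ' ')

# Occurrences per distinct value, most frequent first.
counts=$(awk -F ':' '{print $1}' "$sample" | sort | uniq -c | sort -rn)

echo "distinct values: $distinct"
echo "$counts"
rm -f "$sample"
```

Note that uniq only collapses adjacent duplicates, which is why the input is sorted before uniq -c.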


 
Old 05-11-2010, 08:51 AM
hadi motamedi
 
Default Text file manipulation in CentOS?

> I don't quite understand this part.

Thank you very much for your reply. Please find below a segment of the file:
CallId  9  State TK         Bts 7  Bt 2  Tr (13 0x09)  E1 (4 1 5)  Tru (0 3 0)
CallId  9  State TK         Bts 7  Bt 2  Tr (13 0x09)  E1 (4 1 5)  Tru (0 3 0)
CallId  9  State TK         Bts 7  Bt 2  Tr (13 0x09)  E1 (4 1 5)  Tru (0 3 0)
CallId  9  State TK         Bts 7  Bt 2  Tr (13 0x09)  E1 (4 1 5)  Tru (0 3 0)
CallId  9  State TK         Bts 7  Bt 2  Tr (13 0x09)  E1 (4 1 5)  Tru (0 3 0)
CallId  9  State TK         Bts 7  Bt 2  Tr (13 0x09)  E1 (4 1 5)  Tru (0 3 0)
CallId  9  State TK         Bts 7  Bt 2  Tr (13 0x09)  E1 (4 1 5)  Tru (0 3 0)
CallId 94  State TK         Bts 7  Bt 1  Tr (8 0x0c)  E1 (7 0 15)  Tru (0 0 2)
CallId 94  State TK         Bts 7  Bt 1  Tr (8 0x0c)  E1 (7 0 15)  Tru (0 0 2)
CallId 94  State TK         Bts 7  Bt 1  Tr (8 0x0c)  E1 (7 0 15)  Tru (0 0 2)
CallId 94  State TK         Bts 7  Bt 1  Tr (8 0x0c)  E1 (7 0 15)  Tru (0 0 2)
CallId 94  State TK         Bts 7  Bt 1  Tr (8 0x0c)  E1 (7 0 15)  Tru (0 0 2)
CallId 94  State TK         Bts 7  Bt 1  Tr (6 0x0f)  E1 (7 0 15)  Tru (0 0 2)
CallId 94  State TK         Bts 7  Bt 1  Tr (6 0x0f)  E1 (7 0 15)  Tru (0 0 2)
CallId 92  State TK         Bts 7  Bt 1  Tr (7 0x08)  E1 (3 1 22)  Tru (0 0 0)
CallId 92  State TK         Bts 7  Bt 1  Tr (7 0x08)  E1 (3 1 22)  Tru (0 0 0)
CallId 92  State TK         Bts 7  Bt 1  Tr (7 0x08)  E1 (3 1 22)  Tru (0 0 0)
CallId 92  State TK         Bts 7  Bt 1  Tr (7 0x08)  E1 (3 1 22)  Tru (0 0 0)
CallId 92  State IH         Bts 7  Bt 1  Tr (6 0x0a)  E1 (3 1 22)  Tru (0 0 0)
CallId 92  State IH         Bts 7  Bt 1  Tr (7 0x08)  E1 (3 1 22)  Tru (0 0 0)
CallId 92  State CL         Bts 7  Bt 1  Tr (6 0x0a)  E1 (3 1 22)  Tru (0 0 0)
CallId 91  State TK         Bts 5  Bt 1  Tr (4 0x0f)  E1 (4 0 18)  Tru (0 1 1)
CallId 91  State TK         Bts 5  Bt 1  Tr (4 0x0f)  E1 (4 0 18)  Tru (0 1 1)
CallId 91  State TK         Bts 5  Bt 1  Tr (4 0x0f)  E1 (4 0 18)  Tru (0 1 1)
CallId 91  State TK         Bts 5  Bt 1  Tr (4 0x0f)  E1 (4 0 18)  Tru (0 1 1)
Your first comment on using 'awk' enabled me to find how many distinct 'CallId' values exist in my log. For the second part, please imagine that I need to filter out every 'CallId' that has occurred, say, fewer than three times. Please help me accomplish the second part.

Thank you


 
Old 05-11-2010, 10:55 AM
Eduardo Grosclaude
 
Default Text file manipulation in CentOS?

On Tue, May 11, 2010 at 5:51 AM, hadi motamedi <motamedi24@gmail.com> wrote:
>
>
>> I don't quite understand this part.
>>
> Thank you very much for your reply. Please find below a segment of the file:

If you run the following command:

sort YOUR_FILE | uniq -c | sort -n | perl -ne 'print unless /(\d+)/ and $1 < 3'

where YOUR_FILE's contents are exactly the lines you pasted earlier, you will get:

3 CallId 91 State TK Bts 5 Bt 1 Tr (4 0x0f) E1 (4 0 18) Tru (0 1 1)
4 CallId 92 State TK Bts 7 Bt 1 Tr (7 0x08) E1 (3 1 22) Tru (0 0 0)
5 CallId 94 State TK Bts 7 Bt 1 Tr (8 0x0c) E1 (7 0 15) Tru (0 0 2)
7 CallId 9 State TK Bts 7 Bt 2 Tr (13 0x09) E1 (4 1 5) Tru (0 3 0)

The first number is the number of occurrences of each CallId.
Does this help?

--
Eduardo Grosclaude
Universidad Nacional del Comahue
Neuquen, Argentina
 
Old 05-11-2010, 11:12 AM
hadi motamedi
 
Default Text file manipulation in CentOS?

> Does this help?
> The first number is the number of occurrences of each CallId

Thank you for your help. It is very important for me to know the number of occurrences of each CallId. But can you please tell me why the number produced by your command does not match a manual count for, say, one of the CallIds? Can you please correct me?



 
Old 05-11-2010, 11:46 AM
Eduardo Grosclaude
 
Default Text file manipulation in CentOS?

On Tue, May 11, 2010 at 8:12 AM, hadi motamedi <motamedi24@gmail.com> wrote:
>
>
>> Does this help?
>> The first number is the number of occurrences of each CallId
>
> Thank you for your help. It is very important for me to know the number of
> occurrences of each CallId. But can you please tell me why the number
> produced by your command does not match a manual count for, say, one of
> the CallIds? Can you please correct me?

Oh, that's because uniq treats two lines as different if the State
value (TK, CL, ...) or anything else in the rest of the line differs.
If you want to count lines only by the number following CallId, you
should tell uniq to compare only the first characters of each line:

$ cat hadi | sort | uniq -c -w 9 | sort -n | perl -ne 'print unless /(\d+)/ and $1 < 3'
4 CallId 91 State TK Bts 5 Bt 1 Tr (4 0x0f) E1 (4 0 18) Tru (0 1 1)
7 CallId 92 State CL Bts 7 Bt 1 Tr (6 0x0a) E1 (3 1 22) Tru (0 0 0)
7 CallId 94 State TK Bts 7 Bt 1 Tr (6 0x0f) E1 (7 0 15) Tru (0 0 2)
7 CallId 9 State TK Bts 7 Bt 2 Tr (13 0x09) E1 (4 1 5) Tru (0 3 0)

(Note the -w 9: uniq then compares only the first 9 characters of each line.)

--
Eduardo Grosclaude
Universidad Nacional del Comahue
Neuquen, Argentina
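
An alternative to the fixed-width comparison, offered only as a sketch: extract the CallId field first and count that. This assumes the CallId number is always the second whitespace-separated field, which holds for the lines pasted in this thread but may not hold for the full log. The shortened sample below and the variable names (log, per_id) are illustrative.

```shell
# Shortened, hypothetical sample in the format pasted earlier in the thread.
log=$(mktemp)
cat > "$log" <<'EOF'
CallId  9  State TK         Bts 7  Bt 2  Tr (13 0x09)  E1 (4 1 5)  Tru (0 3 0)
CallId  9  State TK         Bts 7  Bt 2  Tr (13 0x09)  E1 (4 1 5)  Tru (0 3 0)
CallId 92  State TK         Bts 7  Bt 1  Tr (7 0x08)  E1 (3 1 22)  Tru (0 0 0)
CallId 92  State IH         Bts 7  Bt 1  Tr (6 0x0a)  E1 (3 1 22)  Tru (0 0 0)
CallId 92  State CL         Bts 7  Bt 1  Tr (6 0x0a)  E1 (3 1 22)  Tru (0 0 0)
CallId 91  State TK         Bts 5  Bt 1  Tr (4 0x0f)  E1 (4 0 18)  Tru (0 1 1)
EOF

# Count lines per CallId, ignoring everything after the second field.
per_id=$(awk '{print $2}' "$log" | sort -n | uniq -c)
echo "$per_id"
rm -f "$log"
```

Because awk splits on runs of whitespace by default, this does not depend on how many digits the CallId has or how the columns are padded.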
 
Old 05-11-2010, 05:29 PM
Dominik Zyla
 
Default Text file manipulation in CentOS?

On Tue, May 11, 2010 at 08:25:43AM +0000, sheraznaz@yahoo.com wrote:
> >>To be more specific, I need to find how many distinct records are there in say column#1?
>
> awk '{print $1}' filename | sort -u | wc -l
>
> This will show how many unique entries are present in column one (use awk -F to change delimiter e.g. awk -F ":" for : delimiter)
>
> >> How can I filter out the distinct records with number of occurrences less than a pre-determined threshold?
>
> I don't quite understand this part.
>
> awk '{print $1}' filename | sort | uniq -c | sort -rn
>
> Will give you a number of occurrences (reverse numerically sorted) of unique data from column one.
>
> Now I think you want to put that through a loop and only show those that are less than threshold?

If I understand correctly, you can pipe your output to:

awk '{a=$1} {if (a > 3) print a}'

Here `a' is an awk variable and `$1' is the first column of awk's
input, so you will probably need to change it.

--
Dominik Zyla
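
A minimal sketch of the same idea with the threshold passed in as an awk variable, so it can be changed without editing the awk program. The input below imitates `uniq -c` output (count first, then the value). The sample values and the names counted, min and kept are assumptions made for this example.

```shell
# Hypothetical `uniq -c` style input: count in column 1, value in column 2.
counted='      4 91
      7 92
      2 17
      7 9'

min=3  # threshold; chosen here only for illustration

# Keep only rows whose count (column 1) is at least $min; the whole row is printed.
kept=$(printf '%s\n' "$counted" | awk -v min="$min" '$1 >= min')
echo "$kept"
```

Using >= keeps values that occur exactly three times, which matches "filter out those with fewer than three occurrences". Use > instead if the threshold should be exclusive.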

 
Old 05-12-2010, 04:12 AM
hadi motamedi
 
Default Text file manipulation in CentOS?

> $ cat hadi | sort | uniq -c -w 9 | sort -n | perl -ne 'print unless /(\d+)/ and $1 < 3'
>       4 CallId 91  State TK         Bts 5  Bt 1  Tr (4 0x0f)  E1 (4 0 18)  Tru (0 1 1)
>       7 CallId 92  State CL         Bts 7  Bt 1  Tr (6 0x0a)  E1 (3 1

Thank you for your reply. To have just one 'State' per CallId, I created a new log file as follows:

#more logfile1 | grep "State TK" >> logfile2

Then, in logfile2, I tried to count the number of occurrences of each distinct CallId with your proposed command. But the counts in the output differ from the ones I get by counting manually. Can you please correct me?
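
One way to cross-check the counts, sketched here under assumptions rather than as a diagnosis of the mismatch: do the State filter, the per-CallId count and the threshold in a single awk pass, with no intermediate file and no fixed-width comparison. It assumes whitespace-separated fields with the CallId number in field 2 and the State value in field 4. The sample lines and the names log, min and result are illustrative.

```shell
# Shortened, hypothetical sample in the format pasted earlier in the thread.
log=$(mktemp)
cat > "$log" <<'EOF'
CallId  9  State TK         Bts 7  Bt 2  Tr (13 0x09)  E1 (4 1 5)  Tru (0 3 0)
CallId  9  State TK         Bts 7  Bt 2  Tr (13 0x09)  E1 (4 1 5)  Tru (0 3 0)
CallId  9  State TK         Bts 7  Bt 2  Tr (13 0x09)  E1 (4 1 5)  Tru (0 3 0)
CallId 92  State TK         Bts 7  Bt 1  Tr (7 0x08)  E1 (3 1 22)  Tru (0 0 0)
CallId 92  State TK         Bts 7  Bt 1  Tr (7 0x08)  E1 (3 1 22)  Tru (0 0 0)
CallId 92  State IH         Bts 7  Bt 1  Tr (6 0x0a)  E1 (3 1 22)  Tru (0 0 0)
CallId 92  State CL         Bts 7  Bt 1  Tr (6 0x0a)  E1 (3 1 22)  Tru (0 0 0)
CallId 91  State TK         Bts 5  Bt 1  Tr (4 0x0f)  E1 (4 0 18)  Tru (0 1 1)
CallId 91  State TK         Bts 5  Bt 1  Tr (4 0x0f)  E1 (4 0 18)  Tru (0 1 1)
CallId 91  State TK         Bts 5  Bt 1  Tr (4 0x0f)  E1 (4 0 18)  Tru (0 1 1)
CallId 91  State TK         Bts 5  Bt 1  Tr (4 0x0f)  E1 (4 0 18)  Tru (0 1 1)
EOF

# Count "State TK" lines per CallId, then print "CallId count" for every
# CallId seen at least min times. sort -n orders the result by CallId.
result=$(awk -v min=3 '
    $3 == "State" && $4 == "TK" { n[$2]++ }
    END { for (id in n) if (n[id] >= min) print id, n[id] }
' "$log" | sort -n)
echo "$result"
rm -f "$log"
```

In this sample CallId 92 has only two TK lines, so it is dropped, while 9 and 91 are kept.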



 
Old 05-12-2010, 04:20 AM
hadi motamedi
 
Default Text file manipulation in CentOS?

> If I understand correctly, you can pipe your output to: awk '{a=$1} {if
> (a > 3) print a}'. `a' is awk variable. `$1' is first column of awk
> input so you probably need to change it.

Thank you for your message. Yes, you are right. I really need to filter out the CallIds with fewer than, say, three occurrences. But your command does not work on my CentOS system. Please correct me.





 

Copyright 2007 - 2008, www.linux-archive.org