Old 05-16-2011, 11:10 PM
Felix Miata
 
Default is a nice "place" :-D

After attempting to install for the first time last week, I started 3
different threads here looking for help. I'm pleased with the nature of the
responses, and being able to succeed eventually using a mix of those
responses and my own efforts digging into Google, gentoo.org and cranial
cobwebs. So, thanks to all who replied, and even to those who showed interest
without replying.


For http://fm.no-ip.com/Tmp/Linux/G/, newly created to use with those three
threads, 'cat /var/log/apache2/access_log | grep "GET /Tmp/Linux/G" | grep -v
<myip> | sort > outfile' generated 117 lines. That's a lot more hits than I
can ever remember getting before when asking for help from a mailing list
(even if it did take 5 days to accumulate so many).


I'm curious if anyone here would like to offer a better variant of my local
query that would limit the hit count so that no more than one hit per IP is
represented in the output? My skill with such things is very limited. I can't
think of the name of a command to cut the IP off the front of each line,
much less how to compare if it's a non-first instance to be discarded. Or,
maybe there's an Apache utility for doing this that I just don't know about?

--
"The wise are known for their understanding, and pleasant
words are persuasive." Proverbs 16:21 (New Living Translation)

Team OS/2 ** Reg. Linux User #211409 ** a11y rocks!

Felix Miata *** http://fm.no-ip.com/
 
Old 05-16-2011, 11:33 PM
Alan McKinnon
 
Default is a nice "place" :-D

Apparently, though unproven, at 01:10 on Tuesday 17 May 2011, Felix Miata did
opine thusly:

> After attempting to install for the first time last week, I started 3
> different threads here looking for help. I'm pleased with the nature of the
> responses, and being able to succeed eventually using a mix of those
> responses and my own efforts digging into Google, gentoo.org and cranial
> cobwebs. So, thanks to all who replied, and even to those who showed
> interest without replying.
>
> For http://fm.no-ip.com/Tmp/Linux/G/, newly created to use with those three
> threads, 'cat /var/log/apache2/access_log | grep "GET /Tmp/Linux/G" | grep
> -v <myip> | sort > outfile' generated 117 lines. That's a lot more hits
> than I can ever remember getting before when asking for help from a
> mailing list (even if it did take 5 days to accumulate so many).
>
> I'm curious if anyone here would like to offer a better variant of my local
> query that would limit the hit count so that no more than one hit per IP is
> represented in the output? My skill with such things is very limited. I
> can't think of the name of a command to cut the IP off the front of
> each line, much less how to compare if it's a non-first instance to be
> discarded. Or, maybe there's an Apache utility for doing this that I just
> don't know about?

There's always a million ways to skin a cat like this. At a high volume site
you would of course not try and deal with this directly from the apache logs.
You would send them to syslog which would parse them and write them to a
database from where you could run sophisticated SQL.

There are also Apache log analyser apps out there; Google will find them.

But I think all that is overkill for what you want. Your command works fine
except for needing to discard duplicate IPs. You don't seem to need to know
the details of the GET, so just grab using awk the first field and sort | uniq
the result. It will run a tad quicker (and reveal less n00bness to your
audience) if you grep the file directly instead of cat | grep:

grep "GET /Tmp/Linux/G" | /var/log/apache2/access_log | grep-v <myip> |
awk '{print $1}' | sort | uniq | wc

In true grand Unix tradition you cannot get quicker, dirtier or more effective
than that


--
alan dot mckinnon at gmail dot com
 
Old 05-17-2011, 12:36 AM
Willie Wong
 
Default is a nice "place" :-D

On Tue, May 17, 2011 at 01:33:39AM +0200, Alan McKinnon wrote:
> grep "GET /Tmp/Linux/G" | /var/log/apache2/access_log | grep-v <myip> |
> awk '{print $1}' | sort | uniq | wc
>
> In true grand Unix tradition you cannot get quicker, dirtier or more effective
> than that
>

You can replace "sort | uniq" by "sort -u"

And the "Grand Unix Tradition" probably would 'cut' instead of awk

While you are at it, an incantation that pipes grep to awk? Seriously?
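
A minimal sketch of that variant, run against a small made-up log rather
than the real access_log. The sample lines, their IP addresses and the
192.0.2.1 stand-in for <myip> are assumptions for illustration only, and
cut -d' ' -f1 relies on the client IP being the first space-delimited field,
as it is in Apache's common and combined log formats.

```shell
# Hypothetical sample log; the real one lives at /var/log/apache2/access_log.
log=$(mktemp)
cat > "$log" <<'EOF'
198.51.100.7 - - [16/May/2011:10:00:00 +0000] "GET /Tmp/Linux/G/ HTTP/1.1" 200 512
203.0.113.9 - - [16/May/2011:10:01:00 +0000] "GET /Tmp/Linux/G/a.txt HTTP/1.1" 200 128
198.51.100.7 - - [16/May/2011:10:02:00 +0000] "GET /Tmp/Linux/G/b.txt HTTP/1.1" 200 256
192.0.2.1 - - [16/May/2011:10:03:00 +0000] "GET /Tmp/Linux/G/ HTTP/1.1" 200 512
203.0.113.9 - - [16/May/2011:10:04:00 +0000] "GET /index.html HTTP/1.1" 200 64
EOF

myip=192.0.2.1   # stand-in for <myip>

# -F treats both patterns as fixed strings, so the dots in the IP are literal.
# Note that -vF still does a substring match anywhere on the line.
unique_ips=$(grep -F "GET /Tmp/Linux/G" "$log" |
    grep -vF "$myip" |
    cut -d' ' -f1 |
    sort -u)

echo "$unique_ips"
rm -f "$log"
```

Append "| wc -l" after sort -u if only the count is wanted.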

W
--
Willie W. Wong wwong@math.princeton.edu
Data aequatione quotcunque fluentes quantitae involvente fluxiones invenire
et vice versa ~~~ I. Newton
 
Old 05-17-2011, 12:38 AM
Felix Miata
 
Default is a nice "place" :-D

On 2011/05/17 01:33 (GMT+0200) Alan McKinnon composed:


> grep "GET /Tmp/Linux/G" | /var/log/apache2/access_log | grep-v <myip> |
> awk '{print $1}' | sort | uniq | wc
>
> In true grand Unix tradition you cannot get quicker, dirtier or more effective
> than that


It almost worked too. :-)

grep "GET /Tmp/Linux/G" /var/log/apache2/access_log | grep -v <myip> |
awk '{print $1}' | sort | uniq | wc -l

got me almost what I wanted, 20 unique IPs, but that's a lot of stuff to
remember, which for me will never happen. So I tried converting to an alias.


grep "GET $1" /var/log/apache2/access_log | grep -v <myip> |
awk '{print $1}' | sort | uniq | wc -l

sort of works, except I won't always be looking for GET as part of what to
grep for, or might require more than one whitespace instance, and am tripping
over how to deal with the whitespace if I leave GET out of the alias and only
put it on the cmdline if I actually want it as part of what to grep for.


grep "GET $1 $2" /var/log/apache2/access_log | grep -v <myip> |
awk '{print $1}' | sort | uniq | wc -l

seems to work, but I'm not sure there aren't booby traps besides 2nd or more
whitespace instances I'm not considering, even though it gets the same answer
for this particular case.
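
One likely booby trap: an alias is plain text substitution, so anything typed
after the alias name is appended to the end of the expanded command, and $1
inside the alias refers to the shell's own positional parameters rather than
to what was typed. A shell function is the usual way around that. The sketch
below is only an illustration: the name uniqhits, the LOG and MYIP variables
and the 192.0.2.1 default are all made up for this example.

```shell
# Hypothetical helper: count the unique client IPs whose log line contains
# the given fixed string. LOG and MYIP are assumed names with assumed
# defaults; set them to suit the local setup.
uniqhits() {
    # ${1:?...} aborts with a usage message when no pattern is given.
    # "--" stops option parsing, so a pattern starting with "-" is safe.
    grep -F -- "${1:?usage: uniqhits PATTERN}" "${LOG:-/var/log/apache2/access_log}" |
        grep -vF -- "${MYIP:-192.0.2.1}" |
        awk '{print $1}' |
        sort -u |
        wc -l
}
```

Quoting the pattern on the command line keeps any embedded whitespace intact,
so both uniqhits "GET /Tmp/Linux/G" and uniqhits /Tmp/Linux/G work, and GET
only has to be typed when it is actually wanted.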

--
"The wise are known for their understanding, and pleasant
words are persuasive." Proverbs 16:21 (New Living Translation)

Team OS/2 ** Reg. Linux User #211409 ** a11y rocks!

Felix Miata *** http://fm.no-ip.com/
 
Old 05-17-2011, 07:25 AM
Neil Bothwick
 
Default is a nice "place" :-D

On Tue, 17 May 2011 01:33:39 +0200, Alan McKinnon wrote:

> grep "GET /Tmp/Linux/G" | /var/log/apache2/access_log | grep-v <myip> |
> awk '{print $1}' | sort | uniq | wc
>
> In true grand Unix tradition you cannot get quicker, dirtier or more
> effective than that
>

awk does pattern matching, so you can ditch the grep stage and use

awk '! /myip/ {print $1}'

You could use awk to search for the GET patterns too, not only saving yet
another process, but making sure that no one else, including you next
month, can work out what the command is supposed to do.

sort -u would save having a separate process for uniq, but I've no idea
if it's faster. It's only worth using sort -u if you would use uniq with
no arguments.
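
For what it's worth, a sketch of awk doing both the match and the exclusion,
tried against a made-up log. The sample lines and the 192.0.2.1 stand-in for
myip are assumptions. Passing the path in with -v and testing it with index()
is just one way to avoid escaping the slashes inside a /regex/; comparing $1
exactly also avoids excluding other addresses that merely contain myip.

```shell
# Hypothetical sample log standing in for /var/log/apache2/access_log.
log=$(mktemp)
cat > "$log" <<'EOF'
198.51.100.7 - - [16/May/2011:10:00:00 +0000] "GET /Tmp/Linux/G/ HTTP/1.1" 200 512
203.0.113.9 - - [16/May/2011:10:01:00 +0000] "GET /Tmp/Linux/G/a.txt HTTP/1.1" 200 128
198.51.100.7 - - [16/May/2011:10:02:00 +0000] "GET /Tmp/Linux/G/b.txt HTTP/1.1" 200 256
192.0.2.1 - - [16/May/2011:10:03:00 +0000] "GET /Tmp/Linux/G/ HTTP/1.1" 200 512
192.0.2.10 - - [16/May/2011:10:04:00 +0000] "GET /index.html HTTP/1.1" 200 64
EOF

# index() returns the position of the substring (0 when absent), so it acts
# as a fixed-string match; $1 != myip is an exact comparison on the IP field.
unique_ips=$(awk -v path="GET /Tmp/Linux/G" -v myip="192.0.2.1" \
    'index($0, path) && $1 != myip {print $1}' "$log" | sort -u)

echo "$unique_ips"
rm -f "$log"
```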


--
Neil Bothwick

- We are but packets in the internet of Life-
 
Old 05-17-2011, 10:43 AM
Pandu Poluan
 
Default is a nice "place" :-D

On 2011-05-17, Neil Bothwick <neil@digimed.co.uk> wrote:
> On Tue, 17 May 2011 01:33:39 +0200, Alan McKinnon wrote:
>
>> grep "GET /Tmp/Linux/G" | /var/log/apache2/access_log | grep-v <myip> |
>> awk '{print $1}' | sort | uniq | wc
>>
>> In true grand Unix tradition you cannot get quicker, dirtier or more
>> effective than that
>>
>
> awk does pattern matching, so you can ditch the grep stage and use
>
> awk '! /myip/ {print $1}'
>
> You could use awk to search for the GET patterns too, not only saving yet
> another process, but making sure that no one else, including you next
> month, can work out what the command is supposed to do.
>

Meh, me forgetting what an awk snippet do? Never!

sed ... now that's a wholly different story :-P

> sort -u would save having a separate process for uniq, but I've no idea
> if it's faster. It's only worth using sort -u if you would use uniq with
> no arguments.
>

And you can actually do the 'uniq' or '-u' function within awk. Quite
easily, in fact.

Here's a sample of awk doing uniq:

awk '!x[$1]++ { print $1 }'

Benefit? It doesn't care if the non-unique lines are one-after-another
or spread all over the text. The above snippet prints only the first
occurrence. Combine that with a test for match:

awk '$0 ~ /awesome_regex_pattern/ && !x[$1]++ {print $1}'

then with a test for negated match

awk '$0 ~ /awesome_regex_pattern/ && $0 !~ /more_awesome_regex/ && !x[$1]++ {print $1}'
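
The order of the tests matters when combining them like this. A small sketch
with made-up data (the IPs and the /wanted and /other paths are invented)
suggests the pattern tests should come before the !x[$1]++ test: && stops
evaluating at the first false operand, so with the dedup test first the
counter is bumped even on lines that do not match.

```shell
# Hypothetical four-line log: the first hit from 198.51.100.7 does NOT match.
sample=$(printf '%s\n' \
    '198.51.100.7 GET /other' \
    '198.51.100.7 GET /wanted' \
    '203.0.113.9 GET /wanted' \
    '203.0.113.9 GET /wanted')

# Dedup test first: x[$1] is incremented on the non-matching line, so the
# later matching line from the same IP is silently dropped.
dedup_first=$(printf '%s\n' "$sample" | awk '!x[$1]++ && /wanted/ {print $1}')

# Match test first: && short-circuits, so only matching lines touch x[].
match_first=$(printf '%s\n' "$sample" | awk '/wanted/ && !x[$1]++ {print $1}')

echo "dedup first: $dedup_first"
echo "match first: $match_first"
```

With this data the first form reports only 203.0.113.9, while the second
reports both addresses.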

Rgds,
--
Pandu E Poluan - IT Optimizer
My website: http://pandu.poluan.info/
 
Old 05-17-2011, 01:10 PM
Juan Diego Tascón
 
Default is a nice "place" :-D

On Tue, May 17, 2011 at 5:43 AM, Pandu Poluan <pandu@poluan.info> wrote:
> On 2011-05-17, Neil Bothwick <neil@digimed.co.uk> wrote:
>> On Tue, 17 May 2011 01:33:39 +0200, Alan McKinnon wrote:
>>
>>> grep "GET /Tmp/Linux/G" | /var/log/apache2/access_log | grep-v <myip> |
>>> awk '{print $1}' | sort | uniq | wc
>>>
>>> In true grand Unix tradition you cannot get quicker, dirtier or more
>>> effective than that
>>>
>>
>> awk does pattern matching, so you can ditch the grep stage and use
>>
>> awk '! /myip/ {print $1}'
>>
>> You could use awk to search for the GET patterns too, not only saving yet
>> another process, but making sure that no one else, including you next
>> month, can work out what the command is supposed to do.
>>
>
> Meh, me forgetting what an awk snippet do? Never!
>
> sed ... now that's a wholly different story :-P
>
>> sort -u would save having a separate process for uniq, but I've no idea
>> if it's faster. It's only worth using sort -u if you would use uniq with
>> no arguments.
>>
>
> And you can actually do the 'uniq' or '-u' function within awk. Quite
> easily, in fact.
>
> Here's a sample of awk doing uniq:
>
> awk '!x[$1]++ { print $1 }'
>
> Benefit? It doesn't care if the non-unique lines are one-after-another
> or spread all over the text. The above snippet prints only the first
> occurrence. Combine that with a test for match:
>
> awk '$0 ~ /awesome_regex_pattern/ && !x[$1]++ {print $1}'
>
> then with a test for negated match
>
> awk '$0 ~ /awesome_regex_pattern/ && $0 !~ /more_awesome_regex/ && !x[$1]++ {print $1}'
>
> Rgds,
> --
> Pandu E Poluan - IT Optimizer
> My website: http://pandu.poluan.info/
>
>

I have always wondered if there is a way to do awk '{ print $1}' using
only builtin bash functions when you only have a one-line string.
 
Old 05-17-2011, 01:36 PM
Alex Schuster
 
Default is a nice "place" :-D

Juan Diego Tascón writes:

> I have always wondered if there is a way to do awk '{ print $1}' using
> only builtin bash functions when you only have a one line string

str="one two five"

# remove all from the first blank on, but will not work with
# other whitespace
echo ${str%% *}

or

# set $1, $2, $3, ... to words of $str
set $str
echo $1

or

# create array holding one word per element
strarr=( $str )
echo $strarr    # or: echo ${strarr[0]}
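
One more possibility, offered as a sketch rather than the definitive answer:
the read builtin splits on any IFS whitespace (space, tab, newline by
default), so it also copes with tabs, and it leaves the positional parameters
alone. Unlike the unquoted $str in "set $str" or "( $str )", nothing here is
subject to pathname expansion. The variable names first and rest are made up
for the example.

```shell
# A tab between the first two words, to show that any IFS whitespace works.
str=$(printf 'one\ttwo five')

# read puts the first word into "first" and everything else into "rest".
# Feeding it from a here-document keeps this portable beyond bash.
read -r first rest <<EOF
$str
EOF

echo "$first"
```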

Wonko
 
Old 05-17-2011, 01:51 PM
Juan Diego Tascón
 
Default is a nice "place" :-D

On Tue, May 17, 2011 at 8:36 AM, Alex Schuster <wonko@wonkology.org> wrote:
> Juan Diego Tascón writes:
>
>> I have always wondered if there is a way to do awk '{ print $1}' using
>> only builtin bash functions when you only have a one line string
>
> str="one two five"
>
> # remove all from the first blank on, but will not work with
> # other whitespace
> echo ${str%% *}
>
> or
>
> # set $1, $2, $3, ... to words of $str
> set $str
> echo $1
>
> or
>
> # create array holding one word per element
> strarr=( $str )
> echo $strarr (or echo ${strarr[0]})
>
> Wonko
>
>

thanks for the info
 
Old 05-17-2011, 02:30 PM
David Haller
 
Default is a nice "place" :-D

Hello,

On Tue, 17 May 2011, Alan McKinnon wrote:
>grep "GET /Tmp/Linux/G" | /var/log/apache2/access_log | grep-v <myip> |
>awk '{print $1}' | sort | uniq | wc

useless use of ...

awk '/GET \/Tmp\/Linux\/G/{ips[$1]++;}END{print length(ips);}' \
/var/log/apache2/access_log

I add each access to ips[<IP>] in case you'd want to print that too,
e.g. by using

END {
    for( i in ips ) {
        print i ":" ips[i] " accesses";
    }
    print length(ips) " unique IPs total";
}

as the "END" block.
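
Put together, the whole thing might look like the sketch below, run against a
made-up log (the sample lines and IPs are assumptions). As far as I know,
length() on an array started out as an extension that some older awks reject,
so this version keeps its own counter n instead; that swap is mine, not part
of the command above. The order of "for (i in ips)" is unspecified, so the
per-IP lines may come out in any order.

```shell
# Hypothetical sample log standing in for /var/log/apache2/access_log.
log=$(mktemp)
cat > "$log" <<'EOF'
198.51.100.7 - - [16/May/2011:10:00:00 +0000] "GET /Tmp/Linux/G/ HTTP/1.1" 200 512
203.0.113.9 - - [16/May/2011:10:01:00 +0000] "GET /Tmp/Linux/G/a.txt HTTP/1.1" 200 128
198.51.100.7 - - [16/May/2011:10:02:00 +0000] "GET /Tmp/Linux/G/b.txt HTTP/1.1" 200 256
203.0.113.9 - - [16/May/2011:10:04:00 +0000] "GET /index.html HTTP/1.1" 200 64
EOF

report=$(awk '
    # Count every matching access per IP; bump n the first time an IP is seen.
    /GET \/Tmp\/Linux\/G/ { if (!ips[$1]++) n++ }
    END {
        for (i in ips) print i ":" ips[i] " accesses"
        print n " unique IPs total"
    }' "$log")

echo "$report"
rm -f "$log"
```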

HTH,
-dnh

--
Any research done on how to efficiently use computers has been long lost
in the mad rush to upgrade systems to do things that aren't needed by
people who don't understand what they are really supposed to do with
them. -- Graham Reed, in asr
 
