Linux Archive > Debian > Debian User
11-09-2010, 12:10 PM
~Stack~
 
Off topic question about grep

Hello everyone!

I ran into a strange issue with grep and I was hoping someone could
explain what I feel is an oddity.

I was trying to match a word that starts with either a _ or a letter
followed by any number of _, letters, or numbers (e.g. good: Asdf1,
_aSD1; bad: 9_asD). My test text file contains just those three examples,
each on its own line.

I first tested with this:
[_a-zA-Z][_a-zA-Z0-9]

But that would match against 9_asD which begins with a number (not what
I wanted). So I tried:
[_a-zA-Z][_a-zA-Z0-9]*
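To see why the first pattern matched 9_asD, here is a minimal sketch, assuming a grep that supports -o (GNU grep does), that quotes each pattern and prints every substring it matches on that line. Neither pattern is anchored, so grep is free to start a match after the leading 9. The helper name show_matches is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print each substring of "9_asD" that the given pattern matches.
show_matches() {
    printf '%s\n' 9_asD | grep -o "$1"
}

echo 'two-character pattern:'
show_matches '[_a-zA-Z][_a-zA-Z0-9]'    # prints _a and sD, both inside 9_asD
echo 'starred pattern:'
show_matches '[_a-zA-Z][_a-zA-Z0-9]*'   # prints _asD, also inside 9_asD
```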

I realize that the expression won't do what I mistakenly thought I
wanted it to do. What is puzzling to me is that my hard disk usage
peaked, my cpu jumped, and grep took almost two minutes to return an
exit code of 1 (no match). :-/

At first I thought it might be an issue with Debian Squeeze (current box),
so I tried it on Debian Lenny with similar results. The same happened on
Ubuntu Lucid and Fedora 10, so I am pretty sure it is something about grep
itself and not just one version of grep.

I was hoping someone might know why grep behaves so oddly with that
expression. If it were a huge file or something I could understand the
spike in system utilization, but it is just three lines in a text file.

Just so you know, I have a working solution. In my case, every instance
is on a new line so I have a working expression using:
^[_a-zA-Z][_a-zA-Z0-9]*$
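A minimal sketch of that working solution, assuming a POSIX shell and a scratch file called words.txt (hypothetical name); match_identifiers is likewise a hypothetical helper. The pattern sits in single quotes so the shell hands it to grep unchanged instead of treating it as a glob.

```shell
#!/bin/sh
# Hypothetical helper: print the lines of a file that are complete identifiers.
match_identifiers() {
    # Single quotes pass the pattern to grep verbatim; the shell never globs it.
    grep '^[_a-zA-Z][_a-zA-Z0-9]*$' "$1"
}

tmp=$(mktemp -d)
printf '%s\n' Asdf1 _aSD1 9_asD > "$tmp/words.txt"   # the three test lines
match_identifiers "$tmp/words.txt"                   # prints Asdf1 and _aSD1
rm -r "$tmp"
```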

I am just curious about the odd behavior.

Thanks!
 
11-09-2010, 01:00 PM
Jochen Schulz
 

~Stack~:
>
> But that would match against 9_asD which begins with a number (not what
> I wanted). So I tried:
> [_a-zA-Z][_a-zA-Z0-9]*
>
> I realize that the expression won't do what I mistakenly thought I
> wanted it to do. What is puzzling to me is that my hard disk usage
> peaked, my cpu jumped, and grep took almost two minutes to return an
> exit code of 1 (no match). :-/

What was your exact command line? Did you quote the regular expression?
My guess is that the shell interpreted the '*' character for you and you
ended up with a command line like this:

$ grep [_a-zA-Z][_a-zA-Z0-9]file1 file2 file3

where file1 etc. are the files in your current directory. That's why
grep took so long to finish and it didn't find anything because file1 is
part of your regexp.
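One way to check what the shell actually hands to grep is to print the argument list instead of running grep. This sketch assumes a POSIX shell and uses a throwaway directory with hypothetical files file1, file2 and file3; show_args is a hypothetical helper. In this sketch the whole unquoted word is replaced by the matching names, while the quoted word survives as one literal argument.

```shell
#!/bin/sh
# Hypothetical helper: print the arguments a command would receive, one per line.
show_args() {
    printf '%s\n' "$@"
}

tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch file1 file2 file3      # hypothetical stand-ins for the files in a real directory

echo 'unquoted:'
show_args [_a-zA-Z][_a-zA-Z0-9]*      # the shell substitutes file1 file2 file3
echo 'quoted:'
show_args '[_a-zA-Z][_a-zA-Z0-9]*'    # grep would receive the pattern itself

cd / && rm -r "$tmp"
```

If no file name matches, a POSIX shell leaves the unquoted pattern as-is, which is why the same mistake can go unnoticed in an empty or sparse directory.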

J.
--
Scientists know what they are talking about.
[Agree] [Disagree]
<http://www.slowlydownward.com/NODATA/data_enter2.html>
 
11-09-2010, 03:24 PM
Paul E Condon
 

On 20101109_071001, ~Stack~ wrote:
...
> Just so you know, I have a working solution. In my case, every instance
> is on a new line so I have a working expression using:
> ^[_a-zA-Z][_a-zA-Z0-9]*$

This last expression anchors the match to the beginning (and, via '$', the
end) of a line. To anchor an expression to the beginning of a word instead,
you need:

\<[_a-zA-Z][_a-zA-Z0-9]*

but this will only work if you agree with the implementers of grep as to what
it is that defines the beginning of a word. What is your definition?

Look in 'man grep' for clues as to where you can find the grep
implementers' official definition. I found '\<' in 'man grep' under
'The Backslash Character and Special Expressions'.
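A short sketch of the word-anchored form, assuming GNU grep, where '\<' is a documented extension and word constituents are letters, digits and the underscore; -o is used so each match prints on its own line, and word_starts is a hypothetical helper name. With all three candidates on one line, only those that begin at a word start are printed.

```shell
#!/bin/sh
# Hypothetical helper: print identifiers that begin at a word boundary (GNU grep '\<').
word_starts() {
    grep -o '\<[_a-zA-Z][_a-zA-Z0-9]*'
}

# 9_asD is one word starting with a digit, so no match can begin inside it.
printf '%s\n' 'Asdf1 9_asD _aSD1' | word_starts   # prints Asdf1 and _aSD1
```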

HTH
--
Paul E Condon
pecondon@mesanetworks.net


--
To UNSUBSCRIBE, email to debian-user-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
http://lists.debian.org/20101109162431.GA3186@big.lan.gnu">
Archive: http://lists.debian.org/20101109162431.GA3186@big.lan.gnu
 
11-09-2010, 05:26 PM
Bob McGowan
 

On 11/09/2010 06:00 AM, Jochen Schulz wrote:
> ~Stack~:
>>
>> But that would match against 9_asD which begins with a number (not what
>> I wanted). So I tried:
>> [_a-zA-Z][_a-zA-Z0-9]*
>>
>> I realize that the expression won't do what I mistakenly thought I
>> wanted it to do. What is puzzling to me is that my hard disk usage
>> peaked, my cpu jumped, and grep took almost two minutes to return an
>> exit code of 1 (no match). :-/
>
> What was your exact command line? Did you quote the regular expression?
> My guess is that the shell interpreted the '*' character for you and you
> ended up with a command line like this:
>
> $ grep [_a-zA-Z][_a-zA-Z0-9]file1 file2 file3
>
> where file1 etc. are the files in your current directory. That's why
> grep took so long to finish and it didn't find anything because file1 is
> part of your regexp.
>
> J.

To be pedantically correct: given the unquoted command

grep [_a-zA-Z][_a-zA-Z0-9]*

the shell will expand it into space-separated words, based on matches
to the glob pattern. The first match becomes the pattern used by grep,
which searches for it in the remaining files. Try this:

echo grep [_a-zA-Z][_a-zA-Z0-9]*

to see what the shell does in any particular case. For example, I got:

grep 00firefox-files_before 01cache.list ... xxyy

The ... above stands for 57 file names. Those 58 files (counting xxyy)
were searched for the "pattern" 00firefox-files_before, which is actually
a file name and so not likely to be found in any of the files searched.
If you want to prove this, try:

ls > files
grep *
files:00firefox-files_before
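A self-contained version of the same experiment, run in a throwaway directory so nothing real gets searched; the file names a_pattern, b.txt and c.txt are hypothetical. Pathname expansion is sorted, so a_pattern comes first and becomes grep's pattern, and the only file that contains that string is the listing itself.

```shell
#!/bin/sh
# Reproduce "grep *" in a scratch directory (all file names are hypothetical).
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch a_pattern b.txt c.txt

ls > files    # the redirect creates "files" first, so the listing includes it
grep *        # expands to: grep a_pattern b.txt c.txt files
              # prints: files:a_pattern

cd / && rm -r "$tmp"
```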

--
Bob McGowan


--
To UNSUBSCRIBE, email to debian-user-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
http://lists.debian.org/4CD9926B.8080508@symantec.com">
Archive: http://lists.debian.org/4CD9926B.8080508@symantec.com
 
11-09-2010, 10:04 PM
~Stack~
 

On 11/09/2010 12:26 PM, Bob McGowan wrote:
> On 11/09/2010 06:00 AM, Jochen Schulz wrote:
...
>> What was your exact command line? Did you quote the regular expression?
>> My guess is that the shell interpreted the '*' character for you and you
>> ended up with a command line like this:
>>
>> $ grep [_a-zA-Z][_a-zA-Z0-9]file1 file2 file3
...

> The shell will expand the above into space separated values, based on
> matches to the glob pattern. The first match will become the pattern
> used by grep, searched for in the remaining file names. Try this:
>
> echo grep [_a-zA-Z][_a-zA-Z0-9]*
>
> to see what the shell does in any particular case.

Yeah. I feel really silly now.

I was so focused on getting the regular expression right that I
completely forgot to consider the shell interpreting things on my
behalf. Couldn't see the forest because of the tree in my way I guess.

Thanks! I do appreciate it.
 
All times are GMT.