Old 08-02-2010, 06:56 AM
Zhang Weiwu
 
Default match across line using grep

I'm grepping a bunch of files, each of which has a code segment that
executes an SQL query.

My problem is that the query spans several lines and I can't seem to
make grep honor (?s) for that. Here's an example:



grep --E 'select.*from.*;' .



so that matches the following fine:



select * from mytable where id=1;




however, it does not match the following:



select * from mytable where id=1
and name='foo'";


I tried the -z parameter for grep, which the manual says makes grep not
treat \n as the line terminator. But that doesn't work either. A simple
test shows I might have misunderstood the use of -z:

$ printf 'a\nb' | grep -zo a.*b

(The above should output something /if/ -z made egrep not consider \n
the string terminator. But it produced no output.)
 
Old 08-02-2010, 11:27 AM
Camalen
 
Default match across line using grep

On Mon, 02 Aug 2010 14:56:45 +0800, Zhang Weiwu wrote:

> I'm grepping a bunch of files each have a segment code that executes a
> SQL. My problem is that the query spans across several lines and I can't
> seem to make grep honor (?s) for that.

(...)

Google says there is a package named "pcregrep" that may help with
this :-?

Curiously, "man grep" mentions a "-P" switch, but it seems to be disabled
in the Debian package build:

***
grep: Support for the -P option is not compiled into this --disable-perl-regexp binary
***
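
Where grep does have -P compiled in, the (?s) flag from the original question becomes usable. The following is only a sketch under that assumption (GNU grep built with PCRE support); `has_pcre` and `multiline_grep` are hypothetical helper names, not something from the thread. It first checks whether -P is accepted, then combines -z (the whole file becomes one record), -o (print only the match), and (?s) (dot matches newline).

```shell
#!/bin/sh
# has_pcre: report whether the local grep accepts the -P option.
has_pcre() {
    if printf 'x\n' | grep -P 'x' >/dev/null 2>&1; then
        echo yes
    else
        echo no
    fi
}

# multiline_grep PATTERN [FILE...]
# Hypothetical helper: print every match of PATTERN, letting "." cross
# line boundaries. Requires a grep that supports -P.
multiline_grep() {
    pattern=$1
    shift
    # -z: records end at NUL, so an ordinary text file is a single record.
    # -o: print only the matched text, each match followed by a NUL.
    # (?s): PCRE flag that makes "." match newline as well.
    # tr turns the NUL separators back into newlines for display.
    grep -Pzo -- "(?s)$pattern" "$@" | tr '\0' '\n'
}
```

With this sketch the pattern should use lazy quantifiers, for example `multiline_grep 'select.*?from.*?;' file`, because a greedy `.*` would run on to the last semicolon in the file once newlines no longer stop it.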

Greetings,

--
Camalen


--
To UNSUBSCRIBE, email to debian-user-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/pan.2010.08.02.11.27.12@gmail.com
 
Old 08-02-2010, 09:40 PM
 
Default match across line using grep

>> On Mon, 02 Aug 2010 14:56:45 +0800,
>> Zhang Weiwu <zhangweiwu@realss.com> said:

Z> I'm grepping a bunch of files each have a segment code that executes a
Z> SQL. My problem is that the query spans across several lines and I
Z> can't seem to make grep honor (?s) for that.

Perl Is Our Friend. Here's some text to search:

me% cat -n sample
1 Message-ID: <4C566C2D.7000206@realss.com>
2 Date: Mon, 02 Aug 2010 14:56:45 +0800
3 From: Zhang Weiwu <zhangweiwu@realss.com>
4 Organization: Real Softservice
5 Status: RO
6 Content-Length: 2410
7 Lines: 80
8
9 I'm grepping a bunch of files each have a segment code
10 that executes a SQL. Both selects should match:
11
12 select * from table1 where id=1;
13
14 select * from table2 where id=1
15 and name='foo'";
16
17 I tried to use -z parameter for grep, which the manual says
18 would make grep not treating \n as line terminator. But
19 it doesn't work neither. A simple test shows I might have
20 misunderstood the use of -z:

The script below my signature gives these results:

me% ./pgrep 'select.*from.*;' sample
[sample:12] select * from table1 where id=1;
matched >>select * from table1 where id=1;<<

[sample:15] select * from table2 where id=1
and name='foo'";
matched >>select * from table2 where id=1
and name='foo'";<<

"[sample:15]" means the match happened at or before line 15 in file
"sample". You can pass multiple files on the command line. The
"matched" stuff is there in case there's some distracting text on the
line besides the select statement. If you want matching to be
case-sensitive, change "si:" to "s:" on the line where $pattern is set.

--
Karl Vogel I don't speak for the USAF or my company

---------------------------------------------------------------------------
#!/usr/bin/perl -w
# Taken from perl-grep3.pl in "Mastering Perl"

use strict;

# Get the desired pattern, make newlines match '.' and ignore case.
my $pattern = shift @ARGV || die "I need a pattern\n";
$pattern = '(?si:' . $pattern . ')';

# Make sure pattern works.
my $regex = eval { qr/$pattern/ };
die "Check your pattern! $@" if $@;

# Use paragraph mode to handle newlines.
$/ = "";
my $line = 0;

while (<>) {
    # Count the newlines in this paragraph to keep track of the line number.
    $line += tr/\n/\n/;
    chomp;
    print "[$ARGV:", $line-1, "] $_\nmatched >>$&<<\n\n" if m/$regex/;
    $line = 0 if eof(ARGV);    # reset counter for a new file.
}

exit(0);
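
The paragraph-mode idea in the script above can also be sketched with awk, where an empty RS splits the input on blank lines much as $/ = "" does in Perl. This is a swapped-in technique, not something proposed in the thread, and `para_grep` is a hypothetical name. It assumes the statements of interest are separated from surrounding text by blank lines and that the pattern is written in lowercase.

```shell
#!/bin/sh
# para_grep PATTERN [FILE...]
# Hypothetical helper: print each blank-line-separated paragraph that
# matches PATTERN, ignoring the case of the text.
para_grep() {
    pattern=$1
    shift
    # RS = "" selects awk's paragraph mode: records are separated by
    # one or more blank lines. In an awk regex "." also matches a
    # newline, so the pattern can span the lines of a paragraph.
    # The text is lowercased before matching; the original paragraph
    # is what gets printed, followed by a blank line.
    awk -v pat="$pattern" '
        BEGIN { RS = "" }
        tolower($0) ~ pat { print $0; print "" }
    ' "$@"
}
```

Unlike the Perl script, this sketch does not report file names or line numbers; it only prints the matching paragraphs.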


 
Old 08-03-2010, 09:53 AM
Andre Majorel
 
Default match across line using grep

On 2010-08-02 14:56 +0800, Zhang Weiwu wrote:

> I'm grepping a bunch of files each have a segment code that
> executes a SQL. My problem is that the query spans across
> several lines and I can't seem to make grep honor (?s) for
> that. Here's an example:
>
> grep --E 'select.*from.*;' .

"--E" ? Did you mean "-E" ?

> so that matches the following fine:
>
> select * from mytable where id=1;
>
>
> however, it does not match the following:
>
> select * from mytable where id=1
> and name='foo'";

So your search unit is one SQL statement. You need something
that knows SQL syntax and can extract SQL statements from your
file and present them to grep, each on its own line.

If all the semicolons in your SQL code terminate a statement
(E.G. no semicolons in string constants), you might be able to
get away with

tr '\n;' ' \n'
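
The tr approach can be sketched as a small pipeline: newlines become spaces, semicolons become newlines, so every statement ends up on a line of its own before grep sees it. `sql_grep` is a hypothetical name, and the sketch inherits the assumption stated above that every semicolon terminates a statement.

```shell
#!/bin/sh
# sql_grep PATTERN [FILE...]
# Hypothetical helper: put each semicolon-terminated statement on one
# line, then grep those lines case-insensitively.
sql_grep() {
    pattern=$1
    shift
    # tr maps "\n" to a space and ";" to "\n", character for character.
    # cat reads the named files, or standard input if none are given.
    cat -- "$@" | tr '\n;' ' \n' | grep -E -i -- "$pattern"
}
```

Because tr consumes the semicolons, the pattern has to drop its trailing `;`, as in `sql_grep 'select.*from' file`. Any text that precedes a statement without a semicolon in between ends up on the same output line.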

> I tried to use -z parameter for grep, which the manual says
> would make grep not treating \n as line terminator. But it
> doesn't work neither. A simple test shows I might have
> misunderstood the use of -z:
>
> $ printf 'a\nb' | grep -zo a.*b
>
> (The above should output something /if/ -z would make egrep
> not consider \n as string terminator. But it has produced no
> output)

But grep -z does. This would seem to be an undocumented
limitation of -o.

--
André Majorel <http://www.teaser.fr/~amajorel/>
"Of course the Debian project would never publish my email address !
Do you think they're stupid ? Spammers would harvest it."


 
Old 08-03-2010, 11:37 AM
Zhang Weiwu
 
Default match across line using grep

On 2010-08-03 17:53, Andre Majorel wrote:


>> $ printf 'a\nb' | grep -zo a.*b
>>
>> (The above should output something /if/ -z would make egrep
>> not consider \n as string terminator. But it has produced no
>> output)
>
> But grep -z does. This would seem to be an undocumented
> limitation of -o.

No it doesn't.



$ printf 'a\nb' | grep -z 'a.*b'
$
 
Old 08-03-2010, 12:39 PM
Andre Majorel
 
Default match across line using grep

On 2010-08-03 19:37 +0800, Zhang Weiwu wrote:
> On 2010-08-03 17:53, Andre Majorel wrote:
> >> > $ printf 'a\nb' | grep -zo a.*b
> >> >
> >> > (The above should output something /if/ -z would make egrep
> >> > not consider \n as string terminator. But it has produced no
> >> > output)
> >>
> > But grep -z does. This would seem to be an undocumented
> > limitation of -o.
> >
>
> No it doesn't.
>
> $ printf 'a\nb' | grep -z 'a.*b'
> $

You're welcome. What version of grep ?

--
André Majorel <http://www.teaser.fr/~amajorel/>
If the Debian project published their users' email addresses,
we'd be getting spam. So I'm glad they don't.


 
Old 08-03-2010, 04:57 PM
Bob McGowan
 
Default match across line using grep

On 08/03/2010 05:39 AM, Andre Majorel wrote:
> On 2010-08-03 19:37 +0800, Zhang Weiwu wrote:
>> On 2010-08-03 17:53, Andre Majorel wrote:
>>>>> $ printf 'a\nb' | grep -zo a.*b
>>>>>
>>>>> (The above should output something /if/ -z would make egrep
>>>>> not consider \n as string terminator. But it has produced no
>>>>> output)
>>>>
>>> But grep -z does. This would seem to be an undocumented
>>> limitation of -o.
>>>
>>
>> No it doesn't.
>>
>> $ printf 'a\nb' | grep -z 'a.*b'
>> $
>
> You're welcome. What version of grep ?
>

The -z "sort of" does/doesn't work for me. If I do this:

$ perl -e 'print "a\nb"'| grep -z 'a.*b'
$

There's no output. But change it like this:

$ perl -e 'print "a\nb"'| grep -z 'a'
a
b$

It found, and printed, the string containing the newline. I suspect the
regex engine is still honoring the '. (dot) does not match newline'
convention but is OK with literal newlines, if present.

If, instead of using the '.*' pattern, I embed a literal newline, it
also works:

$ perl -e 'print "a\nb"'| grep -z 'a
> b'
a
b$

And just to prove the point, it does work with multiple null-terminated
lines:

$ perl -e 'print "a\nb\0not here"'| grep -z 'a
> b'
a
b$

I'm using GNU grep 2.5.3

--
Bob McGowan


 
Old 08-03-2010, 06:28 PM
Andre Majorel
 
Default match across line using grep

On 2010-08-03 09:57 -0700, Bob McGowan wrote:
> On 08/03/2010 05:39 AM, Andre Majorel wrote:
> > On 2010-08-03 19:37 +0800, Zhang Weiwu wrote:
> >> On 2010-08-03 17:53, Andre Majorel wrote:
> >>>>> $ printf 'a\nb' | grep -zo a.*b
> >>>>>
> >>>>> (The above should output something /if/ -z would make egrep
> >>>>> not consider \n as string terminator. But it has produced no
> >>>>> output)
> >>>>
> >>> But grep -z does. This would seem to be an undocumented
> >>> limitation of -o.
> >>
> >> No it doesn't.
> >>
> >> $ printf 'a\nb' | grep -z 'a.*b'
> >> $
> >
> > You're welcome. What version of grep ?
>
> The -z "sort of" does/doesn't work for me. If I do this:
>
> $ perl -e 'print "a\nb"'| grep -z 'a.*b'
> $

$ printf 'a\nb'| grep -z 'a.*b'
a
b$ grep --version
GNU grep 2.5.3

Fun, eh ? Maybe the answer is in there :

$ locale
LANG=
LC_CTYPE=en_US
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE=C
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

> There's no output. But change it like this:
>
> $ perl -e 'print "a\nb"'| grep -z 'a'
> a
> b$
>
> It found, and printed, the newline containing string. I would suspect
> the regex engine is still honoring '. (dot) does not match newline'
> convention but is OK with literals, if present.

My grep -z acts as if it used a regexp engine where "." matches
newline. Only when -o is in effect and the match contains a newline
is there no output. But the exit status is still good :

$ printf 'a\nb'| (grep -z 'a.*b' && printf 'st=%d chars=' $? >&2) | wc -c
st=0 chars=4
$ printf 'a\nb'| (grep -oz 'a.*b' && printf 'st=%d chars=' $? >&2) | wc -c
st=0 chars=0

--
André Majorel <http://www.teaser.fr/~amajorel/>
No one ever sends you any email ? Report a bug in Debian !


 
Old 08-03-2010, 08:55 PM
Bob McGowan
 
Default match across line using grep

On 08/03/2010 11:28 AM, Andre Majorel wrote:
> On 2010-08-03 09:57 -0700, Bob McGowan wrote:
>> On 08/03/2010 05:39 AM, Andre Majorel wrote:
>>> On 2010-08-03 19:37 +0800, Zhang Weiwu wrote:
>>>> On 2010-08-03 17:53, Andre Majorel wrote:
>>>>>>> $ printf 'a\nb' | grep -zo a.*b
>>>>>>>

<--deleted-->

> Fun, eh ? Maybe the answer is in there :
>
> $ locale
> LANG=
> LC_CTYPE=en_US
> LC_NUMERIC="POSIX"
> LC_TIME="POSIX"
> LC_COLLATE=C
> LC_MONETARY="POSIX"
> LC_MESSAGES="POSIX"
> LC_PAPER="POSIX"
> LC_NAME="POSIX"
> LC_ADDRESS="POSIX"
> LC_TELEPHONE="POSIX"
> LC_MEASUREMENT="POSIX"
> LC_IDENTIFICATION="POSIX"
> LC_ALL=

This does appear to be the "issue". My settings are:

$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE=C
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

>
>> There's no output. But change it like this:
>>
>> $ perl -e 'print "a\nb"'| grep -z 'a'
>> a
>> b$
>>
>> It found, and printed, the newline containing string. I would suspect
>> the regex engine is still honoring '. (dot) does not match newline'
>> convention but is OK with literals, if present.
>

I did a sub-shell and reset all the variables to match yours, and,
bingo, the wildcard worked.

Looking through the list of names, nothing seems 'obvious' as a single
contributor. In fact, the LC_ names all seem to be specific to things
that would not necessarily impact the regex operation.

So, I picked LANG as a starting point and reset it, *only*, to empty.
And got lucky. That is, apparently, the variable that affects how the
regex is handled.
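
The experiment described above can be repeated with a small probe. This is a sketch only: `probe_dot_newline` is a hypothetical name, and no particular result is claimed, since the thread itself shows the answer varying between setups. One point worth noting is that `locale` prints values in double quotes when they are merely inherited, so in the listing above LC_CTYPE="en_US.UTF-8" follows LANG, and clearing LANG changes the character type setting too.

```shell
#!/bin/sh
# probe_dot_newline LANGVALUE
# Hypothetical helper: report whether "." in a grep -z pattern crosses
# a newline when LANG is set to LANGVALUE and nothing overrides it.
probe_dot_newline() {
    (
        # Run in a subshell so the caller's locale is left untouched.
        # LC_ALL and LC_CTYPE are unset so that LANG alone decides.
        unset LC_ALL LC_CTYPE
        LANG=$1
        export LANG
        if printf 'a\nb' | grep -z 'a.*b' >/dev/null 2>&1; then
            echo "LANG='$1': match"
        else
            echo "LANG='$1': no match"
        fi
    )
}
```

Calling `probe_dot_newline ''` and `probe_dot_newline en_US.UTF-8` side by side reproduces the comparison made in this post, provided the named locale is installed on the machine.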

--
Bob McGowan
Symantec
US Internationalization


 
Old 08-06-2010, 01:49 AM
Zhang Weiwu
 
Default match across line using grep

On 2010-08-04 04:55, Bob McGowan wrote:
> In fact, the LC_ names all seem to be specific to things
> that would not necessarily impact the regex operation.
>
That is not entirely true. The encoding part might matter. If it is
UTF-8, in theory, [:digit:] should match more than 0-9. It might, for
example, match 一-十 (Chinese digits).

