09-23-2008, 01:29 AM
Ubence Quevedo

Text Manipulation/Replacement

On Sep 22, 2008, at 04:25 PM, NoOp wrote:

> On 09/22/2008 03:53 PM, Ubence Quevedo wrote:
>>
>> ----- Original Message ----
>>> From: Chris Mohler <cr33dog@gmail.com>
>>> To: "Ubuntu user technical support, not for general discussions" <ubuntu-users@lists.ubuntu.com>
>>> Sent: Monday, September 22, 2008 3:22:43 PM
>>> Subject: Re: Text Manipulation/Replacement
>>>
>>> On Mon, Sep 22, 2008 at 4:57 PM, Ubence Quevedo wrote:
>>>> Hello All,
>>>>
>>>> I've used pdftotext to convert a pdf document to text and then used a
>>>> combination of grep and awk to single out data and replace formatting
>>>> that I didn't need.
>>>>
>>>> The output data eventually looks like this:
>>>> 12,123456789
>>>> ,0987654321
>>>>
>>>> But I want it to look like this:
>>>> 12,123456789,0987654321
>>>>
>>>> I've tried many different things with awk, but I can't get it to
>>>> replace \r, with just a ,
>>>
>>> Hmm - I've always had headaches dealing with newlines in sed and awk
>>> (to a lesser extent - I'm more familiar with sed).
>>>
>>> How about perl?
>>>
>>> cat foo.txt | perl -pi -e 's/\n//g'
>>>
>
>>
>> Hi Chris,
>>
>> This worked...kinda...but it ate all of the new lines, so I have
>> one continuous line. I need to find all instances of "\n," and
>> replace them with ",". That way it is very specific in what is
>> found and replaced. I have very little perl knowledge, and my
>> feeble attempt at modifying the perl command above failed miserably.
>>
>> Any other ideas?
>>
>> -Ubence
>>
>
> Perhaps a silly question... can you not open the pdf in Adobe Reader 8,
> then copy & paste the text to OpenOffice Writer & accomplish what
> you want?
>
>
> --
> ubuntu-users mailing list
> ubuntu-users@lists.ubuntu.com
> Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-users

If that were an option, then yes. However, I'd prefer to keep this to
the command line as much as possible. I could take the output file
and transfer it to my Mac and use TextWrangler to do what I want, but
I'd rather not [since anyone else that might be doing this procedure
in the future wouldn't have access to a Mac].

-Ubence

 
09-23-2008, 01:36 AM
Ubence Quevedo

Text Manipulation/Replacement

On Sep 22, 2008, at 04:34 PM, Rick Stevens wrote:


Ubence Quevedo wrote:

----- Original Message ----

From: Patrick O'Callaghan <pocallaghan@gmail.com>
To: fedora-list@redhat.com
Sent: Monday, September 22, 2008 3:03:35 PM
Subject: Re: Text Manipulation/Replacement

On Mon, 2008-09-22 at 14:57 -0700, Ubence Quevedo wrote:

Hello All,

I've used pdftotext to convert a pdf document to text and then used a
combination of grep and awk to single out data and replace
formatting that I didn't need.

The output data eventually looks like this:
12,123456789
,0987654321

But I want it to look like this:
12,123456789,0987654321

I've tried many different things with awk, but I can't get it to
replace \r, with just a ,

For one thing, end-of-line in standard Unix text files is not
\r (Carriage Return), it's \n (Newline).

poc

--
fedora-list mailing list
fedora-list@redhat.com
To unsubscribe: https://www.redhat.com/mailman/listinfo/fedora-list
Guidelines: http://fedoraproject.org/wiki/Communicate/MailingListGuidelines

Thanks for splitting hairs. :^) \r is what first came to mind.
I've got a lead from another list that I posted on how to use perl
to accomplish what I need, but it isn't specific enough to not
replace all new lines with empty space: cat foo.txt | perl -pi -e 's/\n//g'

Anyone have any ideas?


Uh, how about:

cat file.txt | sed '$!N;s/\n//' >newfile.txt
----------------------------------------------------------------------
- Rick Stevens, Systems Engineer ricks@nerd.com -
- AIM/Skype: therps2 ICQ: 22643734 Yahoo: origrps2 -
- -
- Fear is finding a ".vbs" script in your Inbox -
----------------------------------------------------------------------



Hi Rick,

I'll have to play with this some more, but this appears to have done
the trick!


Thank you so much!

-Ubence

 
09-23-2008, 02:15 AM
NoOp

Text Manipulation/Replacement

On 09/22/2008 06:29 PM, Ubence Quevedo wrote:
> On Sep 22, 2008, at 04:25 PM, NoOp wrote:
>
>> On 09/22/2008 03:53 PM, Ubence Quevedo wrote:
>>>
>>> ----- Original Message ----
>>>> From: Chris Mohler <cr33dog@gmail.com>
>>>> To: "Ubuntu user technical support, not for general discussions" <ubuntu-users@lists.ubuntu.com>
>>>> Sent: Monday, September 22, 2008 3:22:43 PM
>>>> Subject: Re: Text Manipulation/Replacement
>>>>
>>>> On Mon, Sep 22, 2008 at 4:57 PM, Ubence Quevedo wrote:
>>>>> Hello All,
>>>>>
>>>>> I've used pdftotext to convert a pdf document to text and then used a
>>>>> combination of grep and awk to single out data and replace formatting
>>>>> that I didn't need.
>>>>>
>>>>> The output data eventually looks like this:
>>>>> 12,123456789
>>>>> ,0987654321
>>>>>
>>>>> But I want it to look like this:
>>>>> 12,123456789,0987654321
>>>>>
>>>>> I've tried many different things with awk, but I can't get it to
>>>>> replace \r, with just a ,
>>>>
>>>> Hmm - I've always had headaches dealing with newlines in sed and awk
>>>> (to a lesser extent - I'm more familiar with sed).
>>>>
>>>> How about perl?
>>>>
>>>> cat foo.txt | perl -pi -e 's/\n//g'
>>>>
>>
>>>
>>> Hi Chris,
>>>
>>> This worked...kinda...but it ate all of the new lines, so I have
>>> one continuous line. I need to find all instances of "\n," and
>>> replace them with ",". That way it is very specific in what is
>>> found and replaced. I have very little perl knowledge, and my
>>> feeble attempt at modifying the perl command above failed miserably.
>>>
>>> Any other ideas?
>>>
>>> -Ubence
>>>
>>
>> Perhaps a silly question... can you not open the pdf in Adobe Reader 8,
>> then copy & paste the text to OpenOffice Writer & accomplish what
>> you want?
>>

> If that were an option, then yes. However, I'd prefer to keep this to
> the command line as much as possible. I could take the output file
> and transfer it to my Mac and use TextWrangler to do what I want, but
> I'd rather not [since anyone else that might be doing this procedure
> in the future wouldn't have access to a Mac].
>
> -Ubence
>

Well... OOo can save it as a text file, doc file, csv, xls, odt, ods,
etc., and OOo can run on your Mac. You can also copy & paste into the
standard text editor (gedit), etc. So while working on the command line
might be desirable, copy and paste from Adobe Reader, Evince, or xPDF
might just be easier, unless of course you are doing the conversion from
within a script, on multiple files, or similar.

That said... this might be of interest:
http://furtivepenguin.net/index.php?s=pdftotext
http://www.pdfhacks.com/pdftk/



 
09-23-2008, 02:22 AM
"Brian McKee"

Text Manipulation/Replacement

>> cat foo.txt | perl -pi -e 's/\n//g'
> This worked...kinda...but it ate all of the new lines, so I have one continuous line. I need to find all instances of "\n," and replace them with ",".

cat foo.txt | perl -pi -e 's/\n/,/g' should work.

Why don't you post the complete script so far and maybe we can combine
things in better order? I usually use vim to mangle text, but if you
need it to be usable by others, a script sounds like the best choice.

Brian

 
09-23-2008, 11:34 AM
Karl Larsen

Text Manipulation/Replacement

Ubence Quevedo wrote:
> On Sep 22, 2008, at 04:25 PM, NoOp wrote:
>
>
>> On 09/22/2008 03:53 PM, Ubence Quevedo wrote:
>>
>>> ----- Original Message ----
>>>
>>>> From: Chris Mohler <cr33dog@gmail.com>
>>>> To: "Ubuntu user technical support, not for general discussions" <ubuntu-users@lists.ubuntu.com>
>>>> Sent: Monday, September 22, 2008 3:22:43 PM
>>>> Subject: Re: Text Manipulation/Replacement
>>>>
>>>> On Mon, Sep 22, 2008 at 4:57 PM, Ubence Quevedo wrote:
>>>>
>>>>> Hello All,
>>>>>
>>>>> I've used pdftotext to convert a pdf document to text and then used a
>>>>> combination of grep and awk to single out data and replace formatting
>>>>> that I didn't need.
>>>>>
>>>>> The output data eventually looks like this:
>>>>> 12,123456789
>>>>> ,0987654321
>>>>>
>>>>> But I want it to look like this:
>>>>> 12,123456789,0987654321
>>>>>
>>>>> I've tried many different things with awk, but I can't get it to
>>>>> replace \r, with just a ,
>>>>
>>>> Hmm - I've always had headaches dealing with newlines in sed and awk
>>>> (to a lesser extent - I'm more familiar with sed).
>>>>
>>>> How about perl?
>>>>
>>>> cat foo.txt | perl -pi -e 's/\n//g'
>>>>
>>>>
>>> Hi Chris,
>>>
>>> This worked...kinda...but it ate all of the new lines, so I have
>>> one continuous line. I need to find all instances of "\n," and
>>> replace them with ",". That way it is very specific in what is
>>> found and replaced. I have very little perl knowledge, and my
>>> feeble attempt at modifying the perl command above failed miserably.
>>>
>>> Any other ideas?
>>>
>>> -Ubence
>>>
>>>
>> Perhaps a silly question... can you not open the pdf in Adobe Reader 8,
>> then copy & paste the text to OpenOffice Writer & accomplish what
>> you want?
>>
>>
>>
>
> If that were an option, then yes. However, I'd prefer to keep this to
> the command line as much as possible. I could take the output file
> and transfer it to my Mac and use TextWrangler to do what I want, but
> I'd rather not [since anyone else that might be doing this procedure
> in the future wouldn't have access to a Mac].
>
> -Ubence
>
>
I think using OpenOffice Writer is a viable way to do the job, and
anyone on Linux or Windows already has this software or can install it.
I had to install OpenOffice on my wife's Windows machine: we have Word
from Microsoft Office 97, which had worked fine up to now, but last week
it failed to print. With the new Office at $450.00 in the college book
store, we downloaded OpenOffice and it prints fine :-)


Karl


--

Karl F. Larsen, AKA K5DI
Linux User
#450462 http://counter.li.org.
PGP 4208 4D6E 595F 22B9 FF1C ECB6 4A3C 2C54 FE23 53A7


 
09-23-2008, 12:27 PM
"Mark Haney"

Text Manipulation/Replacement

Ubence Quevedo wrote:
> Hello All,
>
> I've used pdftotext to convert a pdf document to text and then used a
> combination of grep and awk to single out data and replace formatting
> that I didn't need.
>
> The output data eventually looks like this:
> 12,123456789
> ,0987654321
>
> But I want it to look like this:
> 12,123456789,0987654321
>
> I've tried many different things with awk, but I can't get it to replace \r, with just a ,
>
> Does anyone have any ideas on how I can accomplish this, or at least give me an idea of what I'm doing wrong?
>
> Thanx in advance for any help.
>
> -Ubence
>
>

I'm curious to know if the first line of data ends with a \r? If so,
I'd bet searching for it and removing it would fix that. The \n is just
a simple return char; this looks like a Return with Newline, which
grepping for \n won't catch.
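One way to test Mark's theory from the shell, as a sketch: it assumes the converted file really does have DOS-style "\r\n" line endings, which nobody in the thread confirmed, and the sample data is made up to mirror Ubence's example. `od -c` prints every byte, so a carriage return shows up as `\r` just before each `\n`; `tr -d '\r'` then deletes them.

```shell
#!/bin/sh
# Scratch directory so no real files are touched.
tmp=$(mktemp -d)

# Made-up sample with DOS-style "\r\n" line endings (an assumption).
printf '12,123456789\r\n,0987654321\r\n' > "$tmp/sample.txt"

# od -c dumps each byte; carriage returns appear as \r before each \n.
od -c "$tmp/sample.txt"

# Delete every carriage return, leaving plain Unix "\n" line endings.
tr -d '\r' < "$tmp/sample.txt" > "$tmp/sample-unix.txt"
od -c "$tmp/sample-unix.txt"
```

Stripping the carriage returns first matters for the newline-based fixes elsewhere in the thread: replacing "\n," with "," on the original bytes would leave a stray \r in the middle of the joined line.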



--
Libenter homines id quod volunt credunt -- Caius Julius Caesar


Mark Haney
Sr. Systems Administrator
ERC Broadband
(828) 350-2415

Call (866) ERC-7110 for after hours support

 
09-23-2008, 02:14 PM
Rashkae

Text Manipulation/Replacement

Brian McKee wrote:
>>> cat foo.txt | perl -pi -e 's/\n//g'
>> This worked...kinda...but it ate all of the new lines, so I have one continuous line. I need to find all instances of "\n," and replace them with ",".
>
> cat foo.txt | perl -pi -e 's/\n/,/g' should work.
>

There is no reason to use cat. If you want to work with STDIN and
STDOUT, use the < and > redirection operators. For example:

perl -p -e 'some code' < input.file > output.file

Or, if you use the -i switch, which tells perl to edit the file in
place, there's no need to redirect input or output:

perl -pi -e 'some code' myfile.txt


This will overwrite your file with the changes.

And finally, the real problem: by default, perl reads only one line at a
time, so "\n," will never appear within one input, because each input
ends at \n and the next one begins with ",".

The quick and dirty way to solve this is to use -077 (zero seven seven).
That will put perl in file slurp mode, which means the input will take
the whole file, rather than one line at a time.

You have to be careful not to DoS yourself with this command. If you
input a large enough file, it will all go into RAM until you run out and
your computer crashes in unpredictable ways that depend on your kernel
version (most of them not pretty).

perl -pi -077 -e 's/\n,/,/g' myfile.txt

This command will do what you asked.
It will only nuke the \n if the next line begins with a ,
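If slurping the whole file into memory is a worry, a line-at-a-time approach sidesteps it: print each line without its newline, and only emit the newline once the next line is known not to start with ",". The sketch below does this in awk (the tool Ubence started with); it is not something proposed in the thread, `join_continuations` is a hypothetical name, and the extra "34,5" / ",6" sample lines are made up.

```shell
#!/bin/sh
# Hypothetical helper: join every line that starts with "," onto the line
# before it, reading one line at a time so memory use stays flat.
join_continuations() {
    awk '
        # End the previous output line, unless this line continues it.
        NR > 1 && !/^,/ { printf "\n" }
        # Print the line itself without a trailing newline.
        { printf "%s", $0 }
        # Terminate the last line, if there was any input at all.
        END { if (NR > 0) printf "\n" }
    ' "$@"
}

# Made-up sample: the thread's two lines plus one more pair.
printf '12,123456789\n,0987654321\n34,5\n,6\n' | join_continuations
```

Called with file names (`join_continuations foo.txt > newfile.txt`) it reads those files instead of standard input.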

 
09-23-2008, 02:20 PM
Rashkae

Text Manipulation/Replacement

Rashkae wrote:
> Brian McKee wrote:
>>>> cat foo.txt | perl -pi -e 's/\n//g'
>>> This worked...kinda...but it ate all of the new lines, so I have one continuous line. I need to find all instances of "\n," and replace them with ",".
>> cat foo.txt | perl -pi -e 's/\n/,/g' should work.
>>
>
> There is no reason to use cat. If you want to work with STDIN and
> STDOUT, use the < and > redirection operators. For example:
>
> perl -p -e 'some code' < input.file > output.file
>
> Or, if you use the -i switch, which tells perl to edit the file in
> place, there's no need to redirect input or output:
>
> perl -pi -e 'some code' myfile.txt
>
>
> This will overwrite your file with the changes.
>
> And finally, the real problem: by default, perl reads only one line at a
> time, so "\n," will never appear within one input, because each input
> ends at \n and the next one begins with ",".
>
> The quick and dirty way to solve this is to use -077 (zero seven seven).
> That will put perl in file slurp mode, which means the input will take
> the whole file, rather than one line at a time.
>
> You have to be careful not to DoS yourself with this command. If you
> input a large enough file, it will all go into RAM until you run out and
> your computer crashes in unpredictable ways that depend on your kernel
> version (most of them not pretty).
>
> perl -pi -077 -e 's/\n,/,/g' myfile.txt
>
> This command will do what you asked.
> It will only nuke the \n if the next line begins with a ,
>

Grr, need more coffee!!! You need 3 7's..

-0777 instead of -077 Sorry!

 
09-23-2008, 04:56 PM
Rick Stevens

Text Manipulation/Replacement

Ubence Quevedo wrote:

On Sep 22, 2008, at 04:34 PM, Rick Stevens wrote:


Ubence Quevedo wrote:

----- Original Message ----

From: Patrick O'Callaghan <pocallaghan@gmail.com>
To: fedora-list@redhat.com
Sent: Monday, September 22, 2008 3:03:35 PM
Subject: Re: Text Manipulation/Replacement

On Mon, 2008-09-22 at 14:57 -0700, Ubence Quevedo wrote:

Hello All,

I've used pdftotext to convert a pdf document to text and then used a
combination of grep and awk to single out data and replace
formatting that I didn't need.

The output data eventually looks like this:
12,123456789
,0987654321

But I want it to look like this:
12,123456789,0987654321

I've tried many different things with awk, but I can't get it to
replace \r, with just a ,

For one thing, end-of-line in standard Unix text files is not
\r (Carriage Return), it's \n (Newline).

Thanks for splitting hairs. :^) \r is what first came to mind.
I've got a lead from another list that I posted on how to use perl to
accomplish what I need, but it isn't specific enough to not replace
all new lines with empty space: cat foo.txt | perl -pi -e 's/\n//g'

Anyone have any ideas?


Uh, how about:

cat file.txt | sed '$!N;s/\n//' >newfile.txt


Hi Rick,

I'll have to play with this some more, but this appears to have done the
trick!


Thank you so much!


You're welcome. There's actually a good page of "sed one-liners"
(helpful one-line sed scripts) at:

http://sed.sourceforge.net/sed1line.txt

Turns out the one I gave you is on that list as "# join pairs of lines
side-by-side (like "paste")", but that one strips the last "\n" and
tacks on a space, which mine doesn't.
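Rick's script joins lines two at a time whatever they contain, which suits the two-line example but would also glue together two ordinary lines. A sketch of a conditional variant, in the style of the looping entries on that one-liner page: keep appending the next line while it starts with ",", then print the finished line (`P`) and carry the leftover line into the next cycle (`D`). The function names and the extra "AB,1" / "CD,2" lines are made up for illustration.

```shell
#!/bin/sh
# Rick's script: on every line but the last ($!), append the next line (N),
# then delete the embedded newline, so lines are joined two at a time.
join_pairs() {
    sed '$!N;s/\n//'
}

# Conditional variant: label a, append the next line, and loop back (ta)
# as long as the appended line started with ","; then print up to the
# first newline (P) and restart the cycle with what is left (D).
join_comma_lines() {
    sed -e :a -e '$!N;s/\n,/,/;ta' -e 'P;D'
}

# Made-up sample: the thread's two lines plus two ordinary lines.
sample() { printf '12,123456789\n,0987654321\nAB,1\nCD,2\n'; }

sample | join_pairs        # also glues "AB,1" and "CD,2" together
sample | join_comma_lines  # joins only the line that starts with ","
```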
----------------------------------------------------------------------
- Rick Stevens, Systems Engineer ricks@nerd.com -
- AIM/Skype: therps2 ICQ: 22643734 Yahoo: origrps2 -
- -
- Do you know where _your_ towel is? -
----------------------------------------------------------------------

 
09-26-2008, 05:44 AM
"D. Hugh Redelmeier"

Text Manipulation/Replacement

| From: Patrick O'Callaghan <pocallaghan@gmail.com>

| Splitting hairs is essential to programming. Do What I Mean hasn't been
| invented yet.

Splitting hairs: DWIM has been invented. It was part of InterLISP
perhaps 25 years ago.

http://www.catb.org/~esr/jargon/html/D/DWIM.html
http://ars.userfriendly.org/cartoons/?id=20011121

I don't think that it lived up to its name.



Back to the original problem:

The sed script is correct but it isn't the first one that comes to my
mind since sed's hold buffer is arcane (to me).

This ed command seems easy and ought to work:
g/^,/ .-1,.j
[Translation: for all lines that start with ",", join the previous and
current line.]

It will fail if the first line starts with ",". This next version
would work if you are sure that the file has more than one line:
2,$g/^,/ .-1,.j

It all depends on what tools you are used to.
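Hugh's ed commands can also run non-interactively by piping them into ed. This sketch uses his second, guarded form, assumes ed is installed (some minimal Linux installs leave it out, hence the check), and uses made-up sample data; `-s` just suppresses the byte counts ed normally prints.

```shell
#!/bin/sh
tmp=$(mktemp -d)
# Made-up sample: the thread's two lines plus one more pair.
printf '12,123456789\n,0987654321\n34,5\n,6\n' > "$tmp/data.txt"

if command -v ed >/dev/null 2>&1; then
    # Commands fed to ed, one per line:
    #   2,$g/^,/.-1,.j  for each line from 2 on that starts with ",",
    #                   join it to the line before it
    #   w               write the buffer back to data.txt
    #   q               quit
    printf '%s\n' '2,$g/^,/.-1,.j' 'w' 'q' | ed -s "$tmp/data.txt"
    cat "$tmp/data.txt"
else
    echo "ed is not installed; skipping" >&2
fi
```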

 
