> On 09/22/2008 03:53 PM, Ubence Quevedo wrote:
>>
>> ----- Original Message ----
>>> From: Chris Mohler <cr33dog@gmail.com>
>>> To: "Ubuntu user technical support, not for general discussions" <ubuntu-users@lists.ubuntu.com>
>>> Sent: Monday, September 22, 2008 3:22:43 PM
>>> Subject: Re: Text Manipulation/Replacement
>>>
>>> On Mon, Sep 22, 2008 at 4:57 PM, Ubence Quevedo wrote:
>>>> Hello All,
>>>>
>>>> I've used pdftotext to convert a pdf document to text and then used a
>>>> combination of grep and awk to single out data and replace formatting
>>>> that I didn't need.
>>>>
>>>> The output data eventually looks like this:
>>>> 12,123456789
>>>> ,0987654321
>>>>
>>>> But I want it to look like this:
>>>> 12,123456789,0987654321
>>>>
>>>> I've tried many different things with awk, but I can't get it
>>>> to replace \r, with just a ,
>>>
>>> Hmm - I've always had headaches dealing with newlines in sed and awk
>>> (to a lesser extent - I'm more familiar with sed).
>>>
>>> How about perl?
>>>
>>> cat foo.txt | perl -pi -e 's/\n//g'
>>>
>
>>
>> Hi Chris,
>>
>> This worked...kinda...but it ate all of the new lines, so I have
>> one continuous line. I need to find all instances of "\n," and
>> replace them with ",". That way it is very specific in what is
>> found and replaced. I have very little perl knowledge, and my
>> feeble attempt at modifying the perl command above failed miserably.
>>
>> Any other ideas?
>>
>> -Ubence
>>
>
> Perhaps a silly question... can you not open the pdf in Adobe Reader 8,
> then copy & paste the text to OpenOffice Writer & accomplish what
> you want?
>
>
> --
> ubuntu-users mailing list
> ubuntu-users@lists.ubuntu.com
> Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-users
If that were an option, then yes. However, I'd prefer to keep this to
the command line as much as possible. I could take the output file
and transfer it to my Mac and use TextWrangler to do what I want, but
I'd rather not [since anyone else that might be doing this procedure
in the future wouldn't have access to a Mac].
-Ubence
09-23-2008, 01:36 AM
Ubence Quevedo
Text Manipulation/Replacement
On Sep 22, 2008, at 04:34 PM, Rick Stevens wrote:
Ubence Quevedo wrote:
----- Original Message ----
From: Patrick O'Callaghan <pocallaghan@gmail.com>
To: fedora-list@redhat.com
Sent: Monday, September 22, 2008 3:03:35 PM
Subject: Re: Text Manipulation/Replacement
On Mon, 2008-09-22 at 14:57 -0700, Ubence Quevedo wrote:
Hello All,
I've used pdftotext to convert a pdf document to text and then
used a
combination of grep and awk to single out data and replace
formatting that I didn't need.
The output data eventually looks like this:
12,123456789
,0987654321
But I want it to look like this:
12,123456789,0987654321
I've tried many different things with awk, but I can't get it
to replace \r, with just a ,
For one thing, end-of-line in standard Unix text files is not \r
(Carriage Return), it's \n (Newline).
poc
--
fedora-list mailing list
fedora-list@redhat.com
To unsubscribe: https://www.redhat.com/mailman/listinfo/fedora-list
Guidelines: http://fedoraproject.org/wiki/Communicate/MailingListGuidelines
Thanks for splitting hairs. :^) \r is what first came to mind.
I've got a lead from another list that I posted on how to use perl
to accomplish what I need, but it isn't specific enough to not
replace all new lines with empty space:
cat foo.txt | perl -pi -e 's/\n//g'
Anyone have any ideas?
Uh, how about:
cat file.txt | sed '$!N;s/\n//' >newfile.txt
----------------------------------------------------------------------
- Rick Stevens, Systems Engineer ricks@nerd.com -
- AIM/Skype: therps2 ICQ: 22643734 Yahoo: origrps2 -
- -
- Fear is finding a ".vbs" script in your Inbox -
----------------------------------------------------------------------
Hi Rick,
I'll have to play with this some more, but this appears to have done
the trick!
Thank you so much!
-Ubence
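For anyone following along, here is a small sketch of what the sed command above appears to do when run on made-up sample data. The file contents and variable names are illustrative assumptions, not taken from the original poster's data.

```shell
# Sample input (hypothetical): every record is split across exactly two lines.
input=$(mktemp)
printf '12,123456789\n,0987654321\n34,555\n,666\n' > "$input"

# $!N appends the next input line to the pattern space (unless we are on the
# last line); s/\n// then deletes the embedded newline, so lines are joined
# two at a time.
joined=$(sed '$!N;s/\n//' "$input")
printf '%s\n' "$joined"

rm -f "$input"
```

Note that this joins lines strictly in pairs, whether or not the second line starts with a comma, so a record that was never split would throw the pairing off for everything after it.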
09-23-2008, 02:15 AM
NoOp
Text Manipulation/Replacement
On 09/22/2008 06:29 PM, Ubence Quevedo wrote:
> If that were an option, then yes. However, I'd prefer to keep this to
> the command line as much as possible. I could take the output file
> and transfer it to my Mac and use TextWrangler to do what I want, but
> I'd rather not [since anyone else that might be doing this procedure
> in the future wouldn't have access to a Mac].
>
> -Ubence
>
Well... OOo can save it as a text file, doc file, csv, xls, odt, ods,
etc., and OOo can run on your Mac. You can also copy & paste into the
standard text editor (gedit), etc. So while running from the command line
might be desirable, copy and paste from Adobe Reader, Evince, or xPDF just
might be easier; unless of course you are doing the conversion from within
a script, on multiple files, or similar.
That said... this might be of interest:
http://furtivepenguin.net/index.php?s=pdftotext
http://www.pdfhacks.com/pdftk/
09-23-2008, 02:22 AM
"Brian McKee"
Text Manipulation/Replacement
>> cat foo.txt | perl -pi -e 's/\n//g'
> This worked...kinda...but it ate all of the new lines, so I have one continuous line. I need to find all instances of "\n," and replace them with ",".
cat foo.txt | perl -pi -e 's/\n/,/g' should work.
Why don't you post the complete script so far and maybe we can combine
things in better order? I usually use vim to mangle text, but if you
need it to be usable by others, a script sounds like the best choice.
Brian
09-23-2008, 11:34 AM
Karl Larsen
Text Manipulation/Replacement
Ubence Quevedo wrote:
> If that were an option, then yes. However, I'd prefer to keep this to
> the command line as much as possible. I could take the output file
> and transfer it to my Mac and use TextWrangler to do what I want, but
> I'd rather not [since anyone else that might be doing this procedure
> in the future wouldn't have access to a Mac].
>
> -Ubence
>
>
I think using OpenOffice Writer is a viable way to do the job, and
anyone on Linux or Windows either already has this software or can
install it. I had to install OpenOffice on my wife's Windows machine: our
Word from Microsoft Office 97 had worked fine up to now, but last week it
failed to print. With the new Office at $450.00 in the college book
store, we downloaded OpenOffice, and it prints fine :-)
Karl
--
Karl F. Larsen, AKA K5DI
Linux User
#450462 http://counter.li.org.
PGP 4208 4D6E 595F 22B9 FF1C ECB6 4A3C 2C54 FE23 53A7
09-23-2008, 12:27 PM
"Mark Haney"
Text Manipulation/Replacement
Ubence Quevedo wrote:
> Hello All,
>
> I've used pdftotext to convert a pdf document to text and then used a
> combination of grep and awk to single out data and replace formatting
> that I didn't need.
>
> The output data eventually looks like this:
> 12,123456789
> ,0987654321
>
> But I want it to look like this:
> 12,123456789,0987654321
>
> I've tried many different things with awk, but I can't get it to replace \r, with just a ,
>
> Does anyone have any ideas on how I can accomplish this, or at least give me an idea of what I'm doing wrong?
>
> Thanx in advance for any help.
>
> -Ubence
>
>
I'm curious to know if the first line of data ends with a \r\n? If so,
I'd bet searching for it and removing it would fix that. The \r is just
a simple return char, this looks like a Return with Newline which
grepping for \r won't catch.
--
Libenter homines id quod volunt credunt -- Caius Julius Caesar
Mark Haney
Sr. Systems Administrator
ERC Broadband
(828) 350-2415
Call (866) ERC-7110 for after hours support
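One way to test the carriage-return theory above is to count lines that contain a CR and, if there are any, strip them before doing anything else. This is only a sketch; the sample file and the variable names are assumptions made up for the example.

```shell
# Hypothetical sample with DOS-style line endings (\r\n).
sample=$(mktemp)
printf '12,123456789\r\n,0987654321\r\n' > "$sample"

# Count the lines that contain a carriage return; 0 would mean the file
# already uses plain Unix newlines.
cr=$(printf '\r')
cr_lines=$(grep -c "$cr" "$sample")
echo "lines with CR: $cr_lines"

# Strip every carriage return, leaving only \n line endings.
cleaned=$(tr -d '\r' < "$sample")
printf '%s\n' "$cleaned"

rm -f "$sample"
```

If the count comes back as 0, line endings are not the problem and the newline-joining approaches discussed elsewhere in the thread are the place to look.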
09-23-2008, 02:14 PM
Rashkae
Text Manipulation/Replacement
Brian McKee wrote:
>>> cat foo.txt | perl -pi -e 's/\n//g'
>> This worked...kinda...but it ate all of the new lines, so I have one continuous line. I need to find all instances of "\n," and replace them with ",".
>
> cat foo.txt | perl -pi -e 's/\n/,/g' should work.
>
There is no reason to use cat. If you want to work with STDIN and
STDOUT, use the < and > redirection operators. Example:
perl -p -e 'some code' < input.file > output.file
Or, if you use the -i switch, that tells perl to edit the file in
place, so there's no need to redirect input or output:
perl -pi -e 'some code' myfile.txt
This will overwrite your file with the changes.
And finally, the real problem: by default, perl only reads one line at a
time, so "\n," will never match, because each input record ends at \n and
the next one begins with ",".
The quick and dirty way to solve this is to use -077 (zero seven seven).
That will put perl in file slurp mode, which means the input will take
the whole file, rather than one line at a time.
You have to be careful not to DoS yourself with this command. If you
input a large enough file, it will all go into RAM until you run out, and
your computer crashes in unpredictable ways that depend on your kernel
version (most of them not pretty).
perl -pi -077 -e 's/\n,/,/g' myfile.txt
This command will do what you asked.
It will only nuke the \n if the next line begins with a ,
09-23-2008, 02:20 PM
Rashkae
Text Manipulation/Replacement
Rashkae wrote:
> perl -pi -077 -e 's/\n,/,/g' myfile.txt
>
>
>
> This command will do what you asked.
> It will only nuke the \n if the next line begins with a ,
>
Grr, need more coffee!!! You need 3 7's..
-0777 instead of -077 Sorry!
09-23-2008, 04:56 PM
Rick Stevens
Text Manipulation/Replacement
Ubence Quevedo wrote:
Hi Rick,
I'll have to play with this some more, but this appears to have done the
trick!
Thank you so much!
You're welcome. There's actually a good page of "sed one-liners"
(helpful one-line sed scripts) at:
http://sed.sourceforge.net/sed1line.txt
Turns out the one I gave you is on that list as "# join pairs of lines
side-by-side (like "paste")", but that one strips the last "\n" and
tacks on a space, which mine doesn't.
----------------------------------------------------------------------
- Rick Stevens, Systems Engineer ricks@nerd.com -
- AIM/Skype: therps2 ICQ: 22643734 Yahoo: origrps2 -
- -
- Do you know where _your_ towel is? -
----------------------------------------------------------------------
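The same list of one-liners has an entry for appending a line to the previous one when it begins with a given character. Adapted to a leading comma, a sketch might look like this; the adaptation, the sample data, and the variable names are assumptions, not something posted in the thread.

```shell
# Hypothetical sample: two split records and one record that is already whole.
lines=$(mktemp)
printf '12,123456789\n,0987654321\n34,555\n56,777\n,888\n' > "$lines"

# :a       label to loop back to
# $!N      append the next line unless this is the last one
# s/\n,/,/ drop the newline only when the appended line starts with a comma
# ta       if that substitution happened, loop and try the following line
# P;D      print and delete the first line of the pattern space, then restart
glued=$(sed -e :a -e '$!N;s/\n,/,/;ta' -e 'P;D' "$lines")
printf '%s\n' "$glued"

rm -f "$lines"
```

Unlike joining lines strictly in pairs, this only glues a line onto its predecessor when it actually starts with a comma.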
09-26-2008, 05:44 AM
"D. Hugh Redelmeier"
Text Manipulation/Replacement
| From: Patrick O'Callaghan <pocallaghan@gmail.com>
| Splitting hairs is essential to programming. Do What I Mean hasn't been
| invented yet.
Splitting hairs: DWIM has been invented. It was part of InterLISP
perhaps 25 years ago.
The sed script is correct but it isn't the first one that comes to my
mind since sed's hold buffer is arcane (to me).
This ed command seems easy and ought to work:
g/^,/ .-1,.j
[Translation: for all lines that start with ",", join the previous and
current line.]
It will fail if the first line starts with ",". This next version
would work if you are sure that the file has more than one line:
2,$g/^,/ .-1,.j
It all depends on what tools you are used to.
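Since the original question was about awk, the same "join any line that starts with a comma onto the previous line" idea can also be sketched there. This is one possible approach under assumed sample data, not a command anyone in the thread posted.

```shell
# Hypothetical sample: one split record followed by a whole one.
records=$(mktemp)
printf '12,123456789\n,0987654321\n34,555\n' > "$records"

# Emit a newline before every input line that does NOT begin with a comma
# (except the very first line); lines that do begin with a comma are simply
# appended to whatever was printed before them.
merged=$(awk 'NR > 1 && !/^,/ { printf "\n" } { printf "%s", $0 } END { if (NR) printf "\n" }' "$records")
printf '%s\n' "$merged"

rm -f "$records"
```

Because awk processes one line at a time here, it avoids holding the whole file in memory.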