 
04-09-2012, 05:09 PM
"James B. Byrne"

Need to split long lines in mail archives

CentOS-6.2

I am investigating how to split long lines present in
Mailman-generated HTML archives. Mailman places the email
bodies within <pre></pre> tags and some users have MUAs
that send entire paragraphs as one long line.

I have looked at fmt and fold but these assume a pipeline
from stdout to a fixed filename, which presumably is best
done at the time of the original file's creation. I am
looking for a way to deal with multiple existing files in
a batch fashion so that the reformatted file is written
back out to the same file name in the same location.

I cannot seem to hit upon a way to get this to work using
find, xargs and fmt (or fold). Nor can I seem to find an
example of how this might be done using these utilities.

What I would like to discover is the functional equivalent
of this:

find /path/to/archives/*.html -print | xargs -I {} fmt -s {} > {}

This syntax does not work of course because the xargs file
name substitution only occurs once in the initial argument
list of the following command. But, this example does
describe the effect I wish to obtain, to have the original
file name receive the reformatted contents.
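
One detail worth noting about the command above: the `> {}` redirection is processed by the invoking shell before xargs ever starts, so all output lands in a single file literally named `{}`. A minimal sketch of one way around that, assuming GNU find and xargs (as shipped with CentOS 6) and a made-up function name, is to hand each file name to a small `sh -c` script that does its own redirection:

```shell
# Hypothetical helper (the name is an assumption, not from the thread):
# split long lines in every *.html file under the directory given as $1.
fmt_in_place_xargs() {
    # -print0 / -0 keep file names with blanks intact. Inside sh -c the
    # file name arrives as "$1", so it can be used as often as needed.
    find "$1" -type f -name '*.html' -print0 |
        xargs -0 -I {} sh -c 'fmt -s "$1" > "$1.tmp" && mv "$1.tmp" "$1"' _ {}
}
```

It would be called as `fmt_in_place_xargs /path/to/archives`. The `.tmp` suffix is arbitrary; the original is only replaced when fmt exits successfully.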


--
*** E-Mail is NOT a SECURE channel ***
James B. Byrne mailto:ByrneJB@Harte-Lyne.ca
Harte & Lyne Limited http://www.harte-lyne.ca
9 Brockley Drive vox: +1 905 561 1241
Hamilton, Ontario fax: +1 905 561 0757
Canada L8E 3C3

_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
 
04-09-2012, 05:34 PM
Woodchuck

On Mon, Apr 9, 2012 at 1:09 PM, James B. Byrne <byrnejb@harte-lyne.ca> wrote:
> CentOS-6.2
>
> I am investigating how to split long lines present in
> Mailman-generated HTML archives. Mailman places the email
> bodies within <pre></pre> tags and some users have MUAs
> that send entire paragraphs as one long line.

Such users are usually tough customers, too. "flowed text"
is the way they assert their personalities, I think.

> I have looked at fmt and fold but these assume a pipeline
> from stdout to a fixed filename, which presumably is best
> done at the time of the original file's creation. I am
> looking for a way to deal with multiple existing files in
> a batch fashion so that the reformatted file is written
> back out to the same file name in the same location.

It is very rare to see a Unix utility that operates "in place"
like this. Offhand, I can't think of any.

> I cannot seem to hit upon a way to get this to work using
> find, xargs and fmt (or fold). Nor can I seem to find an
> example of how this might be done using these utilities.
>
> What I would like to discover is the functional equivalent
> of this:
>
> find /path/to/archives/*.html -print | xargs -I {} fmt -s
> {} > {}
>
> This syntax does not work of course because the xargs file
> name substitution only occurs once in the initial argument
> list of the following command. But, this example does
> describe the effect I wish to obtain, to have the original
> file name receive the reformatted contents.


Assuming that the fmt utility does what you want,
then you will need a stanza something like this:

fmt -flagswhatever FILENAME >/tmp/mytemp
mv /tmp/mytemp FILENAME

In other words you need a script, not a single pipe.
You want fmt to operate on one file at a time.

find somedir -name "*.html" >/tmp/htmlstuff
for FILENAME in `cat /tmp/htmlstuff`
do
    fmt (flags) $FILENAME >/tmp/foo
    mv /tmp/foo $FILENAME
done

That's not robust but is just for concept. More robust scripts
would use "read" to get filenames, and would worry about
embedded blanks in filenames, and other niceties. A real
script would use mktemp to generate a temp filename.
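
The niceties listed above could be sketched roughly as follows. This assumes bash plus GNU find, and the function name is made up for illustration. One further assumption: mktemp creates its file with mode 0600, so the sketch copies the result back with cat rather than mv, which keeps the owner and permissions the web server already relies on.

```shell
# Hypothetical helper (name is an assumption): split long lines in every
# *.html file under "$1", coping with blanks and newlines in file names.
reformat_archives() {
    local dir=$1 file tmp
    find "$dir" -type f -name '*.html' -print0 |
    while IFS= read -r -d '' file; do
        tmp=$(mktemp) || break
        if fmt -s "$file" > "$tmp"; then
            # Overwrite the contents instead of renaming, so the original
            # file keeps its owner, mode and inode.
            cat "$tmp" > "$file"
        fi
        rm -f "$tmp"
    done
}
```

Called as `reformat_archives /path/to/archives`. The overwrite is not atomic, so it is probably wise to run this on a copy of the archive first.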

fmt(1) is not robust, either. It will format the whole file with
a single-minded determination. This includes mail headers,
attachments, blah blah. It might even break the HTML. There
are many unexpected consequences.
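
To limit the damage described above, one option is to wrap only the lines that sit between the pre tags and pass everything else through untouched. The following is a rough sketch under several assumptions: bash, opening and closing tags written as `<pre>`/`</pre>` or `<PRE>`/`</PRE>`, and a made-up function name. It is a line-based filter, not an HTML parser, so a long line containing markup inside the block (a link, say) could still be split at a blank.

```shell
# Hypothetical filter (name is an assumption): read HTML on stdin, fold only
# lines inside <pre>...</pre>, and write the result to stdout.
# Usage: wrap_pre [WIDTH] < input.html > output.html
wrap_pre() {
    local width=${1:-72} in_pre=0 line
    while IFS= read -r line || [ -n "$line" ]; do
        # An opening tag switches folding on, starting with this line.
        case $line in *'<pre>'*|*'<PRE>'*) in_pre=1 ;; esac
        if [ "$in_pre" -eq 1 ]; then
            printf '%s\n' "$line" | fold -s -w "$width"
        else
            printf '%s\n' "$line"
        fi
        # A closing tag switches folding off after this line.
        case $line in *'</pre>'*|*'</PRE>'*) in_pre=0 ;; esac
    done
}
```

Because it starts a fold process per line it is slow on large files, which may or may not matter for a one-off batch job.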

My advice is to not format these mails. Why do you want to?
Perhaps there is a work-around that meets your goals.

Dave
--
"The Earth is a farm. We are someone else's property." -- Charles Fort
 
04-09-2012, 05:38 PM
ken

Sounds like you need to loop through a bunch of files and process each
separately... so:

#!/bin/bash

cd /path/to/archives/ || exit 1
for f in $(find . -name '*.html')
do
    fmt -s "$f" > "$f.out"
    mv "$f.out" "$f"  # rename back to original name
done

Untested. But this is basically what you want to do. And it's a good
sort of structure to pick up on. You'll use it often.


hth,
ken

 
04-09-2012, 05:41 PM
fred smith

On Mon, Apr 09, 2012 at 01:09:59PM -0400, James B. Byrne wrote:
> CentOS-6.2
>
> I am investigating how to split long lines present in a
> Mailman generated html archives. Mailman places the email
> bodies within <pre></pre> tags and some users have MUAs
> that send entire paragraphs as one long line.
>
> I have looked at fmt and fold but these assume a pipeline
> from stdout to a fixed filename, which presumably is best

fold reads stdin and writes stdout. I have a script I use all
the time that depends on that, so I know it works that way.

Here's an excerpt from "man fold" on my system:

FOLD(1)                  User Commands                  FOLD(1)

NAME
       fold - wrap each input line to fit in specified width

SYNOPSIS
       fold [OPTION]... [FILE]...

DESCRIPTION
       Wrap input lines in each FILE (standard input by default),
       writing to standard output.
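
As a small illustration of that synopsis, the sketch below runs fold both ways on a throwaway sample file. The function name and the sample text are made up. In both cases the result goes to standard output, which is why rewriting a file in place still needs a temporary file.

```shell
# Hypothetical demo (name and sample text are assumptions): fold reading a
# named FILE versus reading standard input.
fold_demo() {
    local sample
    sample=$(mktemp) || return 1
    printf 'aaa bbb ccc ddd\n' > "$sample"
    fold -s -w 8 "$sample"       # FILE given as an argument
    fold -s -w 8 < "$sample"     # no FILE: standard input is read
    rm -f "$sample"
}
```

The `-s` option breaks at blanks rather than in the middle of a word, and `-w` sets the width.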


--
---- Fred Smith -- fredex@fcshome.stoneham.ma.us -----------------------------
But God demonstrates his own love for us in this:
While we were still sinners,
Christ died for us.
------------------------------- Romans 5:8 (niv) ------------------------------
 
04-09-2012, 09:54 PM
Paul Heinlein

On Mon, 9 Apr 2012, Woodchuck wrote:


> fmt -flagswhatever FILENAME >/tmp/mytemp
> mv /tmp/mytemp FILENAME


You might find the tidy utility (http://tidy.sourceforge.net/) handy
for this operation. It accepts the "-wrap N" option, which works great
in my toolchains, but I've never tried it on Mailman archives.


> fmt(1) is not robust, either. It will format the whole file with
> a single-minded determination. This includes mail headers,
> attachments, blah blah. it might even break the html. There
> are many unexpected consequences.


tidy won't break the HTML, at least, but there may indeed be
unintended consequences. Testing suggested. :-)


--
Paul Heinlein
heinlein@madboa.com
4538' N, 1226' W
 
04-10-2012, 12:20 AM
Karl Vogel

>> On Mon, 9 Apr 2012 13:09:59 -0400,
>> "James B. Byrne" <byrnejb@harte-lyne.ca> said:

J> I am investigating how to split long lines present in a Mailman
J> generated html archives. Mailman places the email bodies within
J> <pre></pre> tags and some users have MUAs that send entire paragraphs as
J> one long line.

A Perl module called "Text::Format" is perfect for this. Could you post
(or send) a link to a Mailman-generated HTML archive that has the
problem you describe? Then I can show a before-and-after along with a
script that'll at least give you a starting point.

--
Karl Vogel I don't speak for the USAF or my company

You know you're addicted to the Internet when your phone bill comes to
your door in a box.
 
04-10-2012, 10:00 AM
John Doe

From: James B. Byrne <byrnejb@harte-lyne.ca>

> What I would like to discover is the functional equivalent
> of this:
> find /path/to/archives/*.html -print | xargs -I {} fmt -s
> {} > {}
> This syntax does not work of course because the xargs file
> name substitution only occurs once in the initial argument
> list of the following command. But, this example does
> describe the effect I wish to obtain, to have the original
> file name receive the reformatted contents.

What about a simple:
find ... | while read F; do ... "$F"; done
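
The elided parts of that loop could be filled in many ways. Here is one possible sketch, assuming bash, GNU fmt, and a made-up function name, which only rewrites files that actually contain a line longer than the target width and leaves the rest alone:

```shell
# Hypothetical helper (name and default width are assumptions): rewrap only
# those *.html files under "$1" that have a line longer than WIDTH ($2).
rewrap_long_files() {
    local dir=$1 width=${2:-75} F
    find "$dir" -type f -name '*.html' |
    while IFS= read -r F; do
        # awk exits 0 if some line is longer than the width, 1 otherwise.
        awk -v w="$width" 'length($0) > w { found = 1; exit }
                           END { exit !found }' "$F" || continue
        fmt -s -w "$width" "$F" > "$F.tmp" && mv "$F.tmp" "$F"
    done
}
```

Reading names line by line like this copes with blanks in file names but not with embedded newlines, which should be rare in a Mailman archive.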

JD