06-22-2012, 02:40 PM

converting .doc to html

Anyone got a preferred program or package for this? I'd like a *good* one,
and Word or OO.o's save as html in no way qualifies as other than amateur
crap.

So far, with a little googling, I've found the wv package. wvHtml works,
but I don't like the output - it insists on <div>, and on &rhquo instead
of plain, simple ".

mark "what, ask for an opinion in this shy, diffident group?"

_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
 
06-22-2012, 03:19 PM
Keith Roberts

On Fri, 22 Jun 2012, m.roth@5-cent.us wrote:

> To: CentOS mailing list <centos@centos.org>
> From: m.roth@5-cent.us
> Subject: [CentOS] converting .doc to html
>
> Anyone got a preferred program or package for this? I'd like a *good* one,
> and Word or OO.o's save as html in no way qualifies as other than amateur
> crap.
>
> So far, with a little googling, I've found the wv package. wvHtml works,
> but I don't like the output - it insists on <div>, and on &rhquo instead
> of plain, simple ".

I think Abiword can read and write those formats.

[root@karsites ~]# rpm -qv abiword
abiword-2.6.6-1.el5.rf

HTH

Keith

-----------------------------------------------------------
Websites:
http://www.karsites.net
http://www.php-debuggers.net
http://www.raised-from-the-dead.org.uk

All email addresses are challenge-response protected with
TMDA [http://tmda.net]
-----------------------------------------------------------
 
06-22-2012, 03:29 PM

Keith Roberts wrote:
> On Fri, 22 Jun 2012, m.roth@5-cent.us wrote:
>> From: m.roth@5-cent.us
>>
>> Anyone got a preferred program or package for this? I'd like a *good*
>> one, and Word or OO.o's save as html in no way qualifies as other than
>> amateur crap.
>>
>> So far, with a little googling, I've found the wv package. wvHtml works,
>> but I don't like the output - it insists on <div>, and on &rhquo instead
>> of plain, simple ".
>
> I think Abiword can read and write those formats.

Given that both Word and OO.o produce such lousy, uselessly cluttered
html, I'm a tad loath to install another wp... and I really just wanted a
command line conversion tool.

As a side note, I tried quanta about 6 years ago, and that did lousy
things to my html, too (going from edit to display and back, I think it
was, unformatted the *whole* document, left justifying all, even when I
*told* it to leave formatting...), so I'm not wildly crazed with web
editing programs.

As my own personal web page reads, "this page proudly built in vi"....

mark

 
06-22-2012, 05:05 PM
Les Mikesell

On Fri, Jun 22, 2012 at 9:40 AM, <m.roth@5-cent.us> wrote:
> Anyone got a preferred program or package for this? I'd like a *good* one,
> and Word or OO.o's save as html in no way qualifies as other than amateur
> crap.
>
> So far, with a little googling, I've found the wv package. wvHtml works,
> but I don't like the output - it insists on <div>, and on &rhquo instead
> of plain, simple ".
>

Mail it to yourself on a gmail account, then 'view' the attachment
instead of downloading the original. It is still going to have
<div>'s though.

--
Les Mikesell
lesmikesell@gmail.com
 
06-22-2012, 08:11 PM
Warren Young

On 6/22/2012 8:40 AM, m.roth@5-cent.us wrote:
>
> wvHtml works,
> but I don't like the output - it insists on <div>, and on &rhquo instead
> of plain, simple ".

You mean &rdquo;?

What's wrong with that? You wanted HTML, and *any* browser will
understand that HTML entity, even Lynx.

If you wanted "HTML I can read like an e-book", I'd say you should be
converting to Markdown instead. One path from Word to Markdown would be
unrtf (https://www.gnu.org/software/unrtf/) to HTML, then HTML to
Markdown via Pandoc (http://johnmacfarlane.net/pandoc/).
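
A minimal Python sketch of that two-step path, for illustration only. It assumes the Word document has already been saved as RTF (unrtf reads RTF, not .doc), that `unrtf` and `pandoc` are both on the PATH, and that unrtf's HTML output is in an encoding Pandoc accepts. The function names and the `report.rtf` file name are hypothetical.

```python
import subprocess


def build_commands(rtf_path: str) -> tuple[list[str], list[str]]:
    """Return the argv lists for the two stages: RTF -> HTML, then HTML -> Markdown."""
    unrtf_cmd = ["unrtf", "--html", rtf_path]                 # writes HTML to stdout
    pandoc_cmd = ["pandoc", "--from=html", "--to=markdown"]   # reads HTML on stdin
    return unrtf_cmd, pandoc_cmd


def rtf_to_markdown(rtf_path: str) -> str:
    """Run both stages, feeding unrtf's output into pandoc, and return the Markdown."""
    unrtf_cmd, pandoc_cmd = build_commands(rtf_path)
    html_bytes = subprocess.run(unrtf_cmd, check=True, capture_output=True).stdout
    md_bytes = subprocess.run(
        pandoc_cmd, input=html_bytes, check=True, capture_output=True
    ).stdout
    return md_bytes.decode("utf-8")
```

Calling `rtf_to_markdown("report.rtf")` would return the Markdown as a string. Both stages raise `subprocess.CalledProcessError` on a non-zero exit, so a failed conversion is not silently passed along.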
 
06-22-2012, 08:40 PM

Warren Young wrote:
> On 6/22/2012 8:40 AM, m.roth@5-cent.us wrote:
>>
>> wvHtml works,
>> but I don't like the output - it insists on <div>, and on &rhquo instead
>> of plain, simple ".
>
> You mean &rdquo;?
>
Yup.

> What's wrong with that? You wanted HTML, and *any* browser will
> understand that HTML entity, even Lynx.

Hate it. I think it's completely unnecessary. I've done web pages,
including professional and corporate ones, and never needed it. I use
special characters only when there's no other option.
>
> If you wanted "HTML I can read like an e-book", I'd say you should be
> converting to Markdown instead. One path from Word to Markdown would be
> unrtf (https://www.gnu.org/software/unrtf/) to HTML, then HTML to
> Markdown via Pandoc (http://johnmacfarlane.net/pandoc/).

How 'bout html I can read like wordperfect <alt-f3>?

mark

 
06-22-2012, 08:48 PM
Frank Cox

On Fri, 22 Jun 2012 16:40:49 -0400
m.roth@5-cent.us wrote:

> Hate it. I think it's completely unnecessary. I've done web pages,
> including professional and corporate ones, and never needed it. I use
> special characters only when there's no other option.

Just use sed to change it to whatever you want it to be.
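
The same substitution idea, sketched in Python rather than sed so it can be tested in isolation. The entity table and the function name `straighten_quotes` are illustrative assumptions, not something wvHtml or any tool in this thread provides, and the table covers only the four common quote entities.

```python
import re

# Typographic quote entities and the plain ASCII character to use instead.
# Illustrative only: extend the table for other entities as needed.
ENTITY_TO_ASCII = {
    "&ldquo;": '"',
    "&rdquo;": '"',
    "&lsquo;": "'",
    "&rsquo;": "'",
}

# One alternation pattern so the text is scanned in a single pass.
_ENTITY_PATTERN = re.compile("|".join(re.escape(e) for e in ENTITY_TO_ASCII))


def straighten_quotes(html_text: str) -> str:
    """Replace curly-quote entities with straight ASCII quotes."""
    return _ENTITY_PATTERN.sub(lambda m: ENTITY_TO_ASCII[m.group(0)], html_text)


print(straighten_quotes("<p>&ldquo;Hi,&rdquo; she said.</p>"))
```

One caveat: a blind substitution like this is only safe in text content. If an entity appears inside a quoted attribute value, replacing it with a bare `"` would break the markup.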

--
MELVILLE THEATRE ~ Real D 3D Digital Cinema ~ www.melvilletheatre.com
www.creekfm.com - FIFTY THOUSAND WATTS of POW WOW POWER!
 
06-22-2012, 09:56 PM
Warren Young

On 6/22/2012 2:40 PM, m.roth@5-cent.us wrote:
> Warren Young wrote:
>> On 6/22/2012 8:40 AM, m.roth@5-cent.us wrote:
>>>
>>> wvHtml works,
>>> but I don't like the output - it insists on <div>, and on &rhquo instead
>>> of plain, simple ".
>>
>> You mean &rdquo;?
>>
> Yup.
>
>> What's wrong with that? You wanted HTML, and *any* browser will
>> understand that HTML entity, even Lynx.
>
> Hate it. I think it's completely unnecessary.

Five centuries of typographers would like to have a word with you.

&rdquo; and " aren't the same thing. If the document includes curly
quotes, the only correct alternative available to the HTML converter is
to put out Unicode character U+201D.
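
The distinction can be checked with Python's standard `html` module. This is a small illustration, not part of any converter discussed here: it decodes the curly-quote entity and the straight-quote entity and compares the resulting code points.

```python
import html

curly = html.unescape("&rdquo;")     # right double quotation mark
straight = html.unescape("&quot;")   # plain ASCII quotation mark

# Show the code point each entity decodes to.
print(f"&rdquo; -> U+{ord(curly):04X}")
print(f"&quot;  -> U+{ord(straight):04X}")
print("same character?", curly == straight)
```

The two entities decode to different characters (U+201D versus U+0022), so swapping one for the other changes the text rather than just its encoding.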

Now, if your converter were converting straight quotation marks to
&quot;, you might have a point.

> I've done web pages,
> including professional and corporate ones, and never needed it.

IMO, web pages with straight quotation marks are unprofessional.

Let the ASCII go, Mark. Just let it go. Unicode became usable over a
decade ago, and became solid in most programs years ago.
 
06-22-2012, 10:11 PM
Warren Young

On 6/22/2012 3:56 PM, Warren Young wrote:
> Unicode became usable over a
> decade ago, and became solid in most programs years ago.

You know, thinking about it, I believe I've sold the Unicode on Linux
stability story short. It's about a decade since it became solid, so
"usable" must be considerably farther back; 2000, maybe? That makes
sense, since Plan 9 had already switched to UTF-8 back in 1992.

I use Perl as my benchmark for Unicode stability. RHEL 2.1 (March 2002)
shipped Perl 5.6 (March 2000), which was usable but dodgy in some ways
w.r.t. Unicode. RHEL 3 (October 2003) shipped Perl 5.8 (July 2002),
which fixed almost everything with Unicode handling. Each Perl since
then has had Unicode changes, but they've just been small bug fixes and
updates to track new Unicode specs. The core mechanisms haven't changed
since 5.8.