
Linux-archive is a website that aims to archive Linux mailing lists and make them easily accessible to Linux users and developers.


Linux Archive > Gentoo > Gentoo User

 
 
12-13-2009, 07:46 AM
Stroller

OT: extract an image from a .doc file?

Hi all,

A .doc file contains an image. Is there any way to extract the image
file in its original format, please?


This may seem like a bit of an odd request, so I'll explain. The .doc
file is quite large, and it seems like the image it contains must be
to blame. I would like to extract the original file of the image and
examine it. I have tried OpenOffice on Windows and Word for Mac. In
OpenOffice I can't see any way to save the image file; in Word for Mac
I can drag the image to the desktop, but it becomes a "Picture
clipping.pictClipping" and is clearly not in the original format.


I tried running `photorec` on the .doc file, but that just "finds"
the .doc file itself. I thought to use dd to zero over the first few
bytes of the .doc - maybe this would make the .doc unrecognisable to
photorec, and then photorec would maybe find the image file inside the
corrupt document, but I haven't tried that yet. I'm not sure if it'd
work, and so I thought I'd ask here to see if anyone knew of an easy
way to do this first.


TIA for any suggestions,

Stroller.
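A cruder alternative to the dd-then-photorec idea is to search the .doc bytes directly for well-known image signatures (JPEG `FF D8 FF`, PNG `89 50 4E 47 0D 0A 1A 0A`, GIF `GIF8`) and note their offsets. The following is a minimal Python sketch of such a signature scan, not a replacement for photorec: it only reports where a signature appears, and it will miss pictures stored as WMF/EMF metafiles or split across non-contiguous sectors of the .doc container. The function names are illustrative.

```python
from pathlib import Path

# Magic numbers for a few common raster formats (illustrative, not exhaustive).
SIGNATURES = {
    "jpeg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
    "gif": b"GIF8",
}


def find_signatures(data: bytes) -> list[tuple[int, str]]:
    """Return (offset, format) for every signature occurrence, sorted by offset."""
    hits = []
    for fmt, magic in SIGNATURES.items():
        pos = data.find(magic)
        while pos != -1:
            hits.append((pos, fmt))
            pos = data.find(magic, pos + 1)
    return sorted(hits)


def scan_file(path: str) -> list[tuple[int, str]]:
    """Read a whole file (e.g. a .doc) into memory and scan it for image signatures."""
    return find_signatures(Path(path).read_bytes())
```

If a hit looks plausible, `dd if=letter.doc of=candidate.jpg bs=1 skip=OFFSET` (file names hypothetical) copies everything from that offset onward; many image viewers ignore trailing data after the end of the picture.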
 
12-13-2009, 09:50 AM
Mick

On Sunday 13 December 2009 08:46:05 Stroller wrote:
> Hi all,
>
> A .doc file contains an image. Is there any way to extract the image
> file in its original format, please?
>
> This may seem like a bit of an odd request, so I'll explain. The .doc
> file is quite large, and it seems like the image it contains must be
> to blame. I would like to extract the original file of the image and
> examine it. I have tried in OpenOffice on Windows and Word for Mac. In
> OpenOffice I can't see any way to save the image file,

I don't know about MSWindows, but in OOo-bin in Linux I can right-click on the
image and select 'Save graphics' when the image is jpeg/png/etc. Not sure if
this works with MS embedded images/files from e.g. PowerPoint.
--
Regards,
Mick
 
12-13-2009, 11:12 AM
Stroller

On 13 Dec 2009, at 10:50, Mick wrote:
> On Sunday 13 December 2009 08:46:05 Stroller wrote:
>> A .doc file contains an image. Is there any way to extract the image
>> file in its original format, please?
>> .... I have tried in OpenOffice on Windows and Word for Mac. In
>> OpenOffice I can't see any way to save the image file,
>
> I don't know about MSWindows, but in OOo-bin in Linux I can right-click on the
> image and select 'Save graphics' when the image is jpeg/png/etc. Not sure if
> this works with MS embedded images/files from e.g. PowerPoint.

This is strange. I get the same 'Save Graphics' option in OpenOffice (on Windows) if I create a new .doc and add a JPEG to it.

Right-clicking on the image gives me a menu of: Arrange, Alignment, Anchor, Wrap, (separator), Picture..., Save Graphics..., Caption..., ImageMap, (separator), Cut, Copy, Paste.

If I open the file(s) I'm interested in, the first four entries in the context menu are the same, but after the first separator I instead get "Object" (which did not appear previously) and "Caption". There is then another separator, and instead of Cut, Copy, Paste I see only Cut and Copy.

This file was created by the software that a lettings agency uses to manage their properties. It runs on Windows and automatically generates letters (for overdue rent, inspections &c) in .doc format. One image in question is the boss' signature, so the letters appear like he actually signed them, but I think they also use company logos in other letters.

Apart from that, I don't see why this image is treated differently by OpenOffice.

Isn't there a program (command line?) for converting .doc into HTML? Maybe that would extract the image.

The reason I'd like to see this is that some of the .doc files are 2 meg in size (some others are exactly 1 meg, so cluster size may affect this) and there are thousands of them taking up space on the server. If the image is to blame, we would benefit many times over from the size saving. I haven't yet spoken to the site about this, having only discovered it yesterday, so I don't know whether I can find the file through the property management software.

Cheers,

Stroller.
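To put a number on the potential saving before talking to the site, one could total up the .doc files on the server. A minimal Python sketch follows; the function name and the 1,000,000-byte threshold are illustrative assumptions. It reports logical sizes (`st_size`) and, on Unix-like systems, the space actually allocated (`st_blocks` is counted in 512-byte units), which is where cluster or block rounding shows up.

```python
from pathlib import Path


def summarize_docs(root: str, suffix: str = ".doc", threshold: int = 1_000_000) -> dict:
    """Count files ending in `suffix` under `root` and total their sizes.

    Matching is case-sensitive on Linux, so "*.DOC" files are not included.
    """
    count = total = allocated = large = 0
    for path in Path(root).rglob(f"*{suffix}"):
        if not path.is_file():
            continue
        st = path.stat()
        count += 1
        total += st.st_size
        # st_blocks only exists on Unix-like systems; fall back to the logical size.
        blocks = getattr(st, "st_blocks", None)
        allocated += blocks * 512 if blocks is not None else st.st_size
        if st.st_size >= threshold:
            large += 1
    return {"count": count, "total_bytes": total,
            "allocated_bytes": allocated, "at_or_above_threshold": large}
```

Running it before and after any change to the letter template would show whether the embedded object really accounts for most of the space.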
 
12-13-2009, 11:50 AM
Mick

On Sunday 13 December 2009 12:12:46 Stroller wrote:
> On 13 Dec 2009, at 10:50, Mick wrote:
> > On Sunday 13 December 2009 08:46:05 Stroller wrote:

> If I open the file(s) I'm interested in, the first 4 entries in the
> context-menu are the same, but after the first separator I get instead
> "Object" (which did not appear previously) and "Caption". There is then
> another separator and instead of Cut, Copy, Paste, I see only Cut & Copy.

This indicates that the graphic in question is an embedded MS Windows object. If
you double-clicked on it in MS Windows, Word would read its metadata and
launch the corresponding MS Windows application for editing it, e.g. MS Paint,
PowerPoint, Excel and so on. With OOo this API linkage is not there, I guess, so
all you can do is cut/copy it.

> This file was created by the software that a lettings agency uses to manage
> their properties. It runs on Windows and automatically generates letters
> (for overdue rent, inspections &c) in .doc format. One image in question
> is the boss' signature, so the letters appear like he actually signed
> them, but I think they also use company logos in other letters.

I guess that whoever created this image did not save it as a 'conventional'
image, e.g. JPEG, PNG, etc., and therefore OOo cannot deal with it as it would
with a normal image.

> Apart from that, I don't see why this image is treated differently by
> OpenOffice.

Because it is not an 'image' but an embedded MS Windows object in the MS Word
document, with lots of its own proprietary metadata.

> Isn't there a program (command line?) for converting .doc into HTML? Maybe
> that would extract the image.

I think that MSWord has either a Save As or an export function which will
convert the file into HTML. Also OOo has File/Preview as HTML, which will
convert the document into HTML and open it in a browser; if the graphics look
correct, you could save them from within the browser.

> The reason I'd like to see this is because some of the .doc files are 2 meg
> in size (some others exactly 1meg, so cluster size may affect this) and
> there are thousands of them taking up space on the server. If the image is
> to blame then we would benefit many times from the size saving. I haven't
> yet spoken to the site about this, only discovering it yesterday, so I
> don't know if I can find the file by accessing the property management
> software.

Have you looked at what size you get with pdf'ing them?
--
Regards,
Mick
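Mick's HTML and PDF suggestions can also be scripted across many files. The sketch below assumes LibreOffice's `soffice` binary (a later descendant of OpenOffice.org) and its `--headless`, `--convert-to` and `--outdir` options; the OpenOffice.org release current in 2009 may not support `--convert-to`. Whether an embedded OLE object comes out of the HTML export as a usable picture is not guaranteed.

```python
import shutil
import subprocess


def build_convert_command(src: str, fmt: str, outdir: str) -> list[str]:
    """Command line for a headless LibreOffice conversion, e.g. fmt="html" or "pdf"."""
    return ["soffice", "--headless", "--convert-to", fmt, "--outdir", outdir, src]


def convert(src: str, fmt: str = "html", outdir: str = ".") -> subprocess.CompletedProcess:
    """Run the conversion; raises if soffice is missing or exits with an error."""
    if shutil.which("soffice") is None:
        raise FileNotFoundError("soffice (LibreOffice) not found on PATH")
    return subprocess.run(build_convert_command(src, fmt, outdir), check=True)
```

For example, `convert("letter.doc", "pdf", "out")` (file names hypothetical) should write `out/letter.pdf`, whose size can then be compared with the original .doc.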
 
12-13-2009, 01:57 PM
Felix Finch

On Sun, Dec 13, 2009 at 08:46:05AM +0000, Stroller wrote:

> A .doc file contains an image. Is there any way to extract the image
> file in its original format, please?

My limited experience with OpenOffice is that in slideshows, right
click on an image brings up a context menu with a save image option.
I do not know if this applies to .doc files.

--
... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._.
Felix Finch: scarecrow repairman & rocket surgeon / felix@crowfix.com
GPG = E987 4493 C860 246C 3B1E 6477 7838 76E9 182E 8151 ITAR license #4933
I've found a solution to Fermat's Last Theorem but I see I've run out of room o
 
12-13-2009, 02:01 PM
Sebastian Beßler

On 13.12.2009 09:46, Stroller wrote:
> Hi all,
>
> A .doc file contains an image. Is there any way to extract the image
> file in its original format, please?

Open the .doc file with OpenOffice and save it as an .odt file.
The .odt is a renamed zip archive that should contain the image in one of
its subfolders.

Greetings

Sebastian
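Because an .odt is a ZIP archive, it can also be inspected programmatically without renaming it. A minimal Python sketch using the standard `zipfile` module lists the largest entries, which is how an oversized embedded object stands out; the function names are illustrative.

```python
import zipfile


def largest_members(path: str, top: int = 5) -> list[tuple[str, int]]:
    """Return the `top` biggest entries (name, uncompressed size) in a ZIP-based file."""
    with zipfile.ZipFile(path) as zf:
        members = [(info.filename, info.file_size)
                   for info in zf.infolist() if not info.is_dir()]
    members.sort(key=lambda m: m[1], reverse=True)
    return members[:top]


def extract_member(path: str, name: str, dest: str = ".") -> str:
    """Extract one entry (e.g. "Object 1") and return the path it was written to."""
    with zipfile.ZipFile(path) as zf:
        return zf.extract(name, dest)
```

`largest_members("document.odt")` works on the .odt as-is, since `zipfile` recognises the archive by its content rather than its extension.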
 
12-14-2009, 08:48 AM
Stroller

On 13 Dec 2009, at 15:01, Sebastian Beßler wrote:
> On 13.12.2009 09:46, Stroller wrote:
>> Hi all,
>>
>> A .doc file contains an image. Is there any way to extract the image
>> file in its original format, please?
>
> Open the .doc file with OpenOffice and save it as an .odt file.
> The .odt is a renamed zip archive that should contain the image in one
> of its subfolders.


Great idea, Sebastian.

The file which is responsible for the size of the .doc is immediately
obvious when I rename this document.odt to document.zip.


It is a 2meg file, but unfortunately, as Mick appears to have
predicted, it is called simply "Object 1" with no file extension.


Running `file` on it shows it to be a "Microsoft Office Document", but
it's apparently not the kind you can open in Word.


I suspect this is going to prove a dead loss. Thanks for your help,
though.


Stroller.
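The `file` result fits the OLE2 / Compound File Binary container used by classic .doc, .xls and .ppt files and by embedded OLE objects. Its header starts with the fixed signature `D0 CF 11 E0 A1 B1 1A E1`. The minimal Python sketch below checks that signature and reads a few header fields at the offsets given in Microsoft's [MS-CFB] specification; it does not walk the streams inside. For that, the third-party `olefile` package (`olefile.OleFileIO(path).listdir()`) is one option, and the stream names may hint at which application created the object.

```python
import struct
from typing import Optional

OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")


def ole2_header(data: bytes) -> Optional[dict]:
    """Parse basic fields of an OLE2/CFB header, or return None if the signature is absent."""
    if len(data) < 0x20 or data[:8] != OLE2_MAGIC:
        return None
    # Offsets per [MS-CFB]: 0x1A major version, 0x1C byte-order mark, 0x1E sector shift.
    major, byte_order, sector_shift = struct.unpack_from("<HHH", data, 0x1A)
    return {
        "major_version": major,            # 3 or 4 in practice
        "byte_order": byte_order,          # 0xFFFE (little-endian) in valid files
        "sector_size": 1 << sector_shift,  # 512 bytes for v3, 4096 for v4
    }


def sniff(path: str) -> Optional[dict]:
    """Read just the 512-byte header of a file such as the extracted "Object 1"."""
    with open(path, "rb") as fh:
        return ole2_header(fh.read(512))
```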
 
12-14-2009, 12:01 PM
Renat Golubchyk

On Mon, 14 Dec 2009 09:48:57 +0000
Stroller <stroller@stellar.eclipse.co.uk> wrote:
>
> On 13 Dec 2009, at 15:01, Sebastian Beßler wrote:
> > On 13.12.2009 09:46, Stroller wrote:
> >> Hi all,
> >>
> >> A .doc file contains an image. Is there any way to extract the
> >> image file in its original format, please?
> >
> > Open the .doc file with OpenOffice and save it as an .odt file.
> > The .odt is a renamed zip archive that should contain the image in
> > one of its subfolders.
>
> Great idea, Sebastian.
>
> The file which is responsible for the size of the .doc is
> immediately obvious when I rename this document.odt to document.zip.
>
> It is a 2meg file, but unfortunately, as Mick appears to have
> predicted, it is called simply "Object 1" with no file extension.
>
> Running `file` on it shows it to be a "Microsoft Office Document",
> but it's apparently not the kind you can open in Word.

Have you tried opening this "Object 1" file in OpenOffice and repeating
the steps above?


Cheers,
Renat

--
Problems can never be solved with the same way of thinking
that created them.
(Einstein)
 
12-14-2009, 01:43 PM
Willie Wong

On Mon, Dec 14, 2009 at 02:01:50PM +0100, Penguin Lover Renat Golubchyk squawked:
> > It is a 2meg file, but unfortunately, as Mick appears to have
> > predicted, it is called simply "Object 1" with no file extension.
> >
> > Running `file` on it shows it to be a "Microsoft Office Document",
> > but it's apparently not the kind you can open in Word.
>
> Have you tried opening this "Object 1" file in OpenOffice and repeating
> the steps above?

It would be hilarious if it were "Object N" all the way down.

I apologize if these have been covered before, but since I don't
remember seeing it:
(a) Is it not possible to extract that image in Microsoft Word
itself? (Opening the file in question in Microsoft Word and saving
the image?) What happens if you save the file in Word's funny XML
format? (Knowing MS, I wouldn't be too surprised if the image becomes
some sort of funny base64 encoded string, but it is still worth a
try.)
(b) If the Big Wig is already happily letting the computer sign those
documents for him, is it prohibitive to try the non-technological
measure? E.g., ask the Big Wig to provide another image of his
signature?
(c) If the image file is that big, it is probably because the
original that got included in the doc file has a ridiculously high
resolution (maybe they just scanned the signature in and cleaned it up a
bit? My signature usually fits in a 1/2 inch by 2 inch block; scanned
at 24-bit color and 600 dpi, that makes a raw image of roughly 1M).
I hope that if the processing/storage/bandwidth tax is high
enough, an "upstream" fix would not be ruled out.
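On point (a): Word 2003's "XML Document" format (WordprocessingML) is understood to store pictures as base64 text inside `<w:binData>` elements, which would let them be pulled out without Word. The minimal Python sketch below rests on that assumption; the namespace URI is the Word 2003 one, and an embedded OLE object may still decode to an opaque binary blob rather than an ordinary image.

```python
import base64
import xml.etree.ElementTree as ET
from typing import Union

# Assumption: Word 2003 WordprocessingML namespace; binary parts live in
# <w:binData w:name="..."> elements as base64 text.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"


def extract_bindata(xml_text: Union[str, bytes]) -> dict[str, bytes]:
    """Map each w:binData part name to its decoded bytes."""
    root = ET.fromstring(xml_text)
    parts = {}
    for el in root.iter(f"{{{W_NS}}}binData"):
        name = el.get(f"{{{W_NS}}}name", f"part{len(parts)}")
        # Base64 text is usually wrapped over many lines; drop the whitespace first.
        parts[name] = base64.b64decode("".join((el.text or "").split()))
    return parts
```

Each decoded value can be written to a file and identified with `file`.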

Also, I do recall that newer versions of MS Word have the capability to
compress included images, though it is not used by default.
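The size estimate in (c) is simple arithmetic: raw size = (width x dpi) x (height x dpi) x bytes per pixel. A small Python sketch of it also shows why compressing or downsampling the picture matters, since halving the resolution quarters the raw size. The 2 x 0.5 inch area is Willie's example, not a measurement of the actual file.

```python
def raw_image_bytes(width_in: float, height_in: float, dpi: int,
                    bits_per_pixel: int = 24) -> int:
    """Uncompressed size in bytes of a scanned area at a given resolution."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


for dpi in (600, 300, 150):
    print(dpi, "dpi, 24-bit:", raw_image_bytes(2, 0.5, dpi), "bytes")
# 600 dpi, 24-bit: 1080000 bytes
# 300 dpi, 24-bit: 270000 bytes
# 150 dpi, 24-bit: 67500 bytes
print("600 dpi, 1-bit:", raw_image_bytes(2, 0.5, 600, bits_per_pixel=1), "bytes")
# 600 dpi, 1-bit: 45000 bytes
```

The 600 dpi colour figure, about 1.08 MB (1.03 MiB), matches the order of magnitude in (c); a black-and-white scan of a signature would be far smaller even before compression.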

Cheers,

W

--
(04:01:59) W: yep
(04:02:02) W: I love linux
(04:02:15) NJYWT: I love penguins
Sortir en Pantoufles: up 1102 days, 13:18
 
12-14-2009, 02:06 PM
"Arttu V."

On 12/14/09, Stroller <stroller@stellar.eclipse.co.uk> wrote:
> It is a 2meg file, but unfortunately, as Mick appears to have
> predicted, it is called simply "Object 1" with no file extension.
>
> Running `file` on it shows it to be a "Microsoft Office Document", but
> it's apparently not the kind you can open in Word.
>
> I suspect this is going to prove a dead loss. Thanks for your help,
> though.

Throwing out a wild guess here: could it be a MODI object?

http://en.wikipedia.org/wiki/MODI

If so, you have entered a captive market; it might be hard to do much
without software from MS.

--
Arttu V.
 

All times are GMT.

Copyright © 2007 - 2008, www.linux-archive.org