Outlook installation

Peter Flynn peter at silmaril.ie
Thu Apr 11 20:46:41 UTC 2019


On 11/04/2019 08:34, Liam Proven wrote:
> On Wed, 10 Apr 2019 at 23:33, Peter Flynn <peter at silmaril.ie> wrote:
>>
>> I think 2007 was the first to use OOXML and save as .docx
> 
> Yes. That's the only XML format I've seen emitted by Office.
> LibreOffice has its own, of course.

WordML was slightly different from OOXML. The period during which the 
save format was a plain .xml file with embedded Base64-encoded images 
was mercifully short.

> I don't mind XML but I do not like these internally-zipped formats.
> With the old formats, you could at least recover raw text from a
> damaged file with tools such as the Unix ``strings'' command. With
> compressed XML, files contain nothing but line noise.

I don't see the problem, I'm afraid. Unzip a .docx file and you have the 
XML right there. Extraction of the unmarked text is trivial:

   unzip -p file.docx word/document.xml | sed -e "s/<[^>]*>//g"

(Better to use proper XML utilities such as LTXML2, but Word is unusual 
in writing its OOXML with no white-space between elements, so there are 
usually no newlines over which start-tags might have been split, and sed 
sees each tag whole. The price is that paragraphs run together in the 
output.)
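
For comparison, here is a rough sketch of the "proper XML" route using 
only Python's standard library rather than LTXML2. It assumes the usual 
transitional OOXML namespace and that the body text lives in 
word/document.xml; the function name docx_text is just made up for 
illustration. It keeps paragraph breaks, which the sed one-liner loses.

```python
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace (transitional OOXML). Files saved as
# "Strict" OOXML use a different namespace URI and would need adjusting.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(source):
    """Return the plain text of a .docx, one line per paragraph.

    `source` may be a filename or a binary file-like object.
    """
    with zipfile.ZipFile(source) as z:
        root = ET.fromstring(z.read("word/document.xml"))

    lines = []
    # Simplification: a paragraph nested inside another paragraph
    # (e.g. in a text box) would have its text counted twice.
    for para in root.iter(W + "p"):
        parts = []
        for node in para.iter():
            if node.tag == W + "t":          # a run of literal text
                parts.append(node.text or "")
            elif node.tag == W + "tab":      # explicit tab character
                parts.append("\t")
            elif node.tag in (W + "br", W + "cr"):  # line break in a paragraph
                parts.append("\n")
        lines.append("".join(parts))
    return "\n".join(lines)
```

Something like print(docx_text("file.docx")) should then give roughly 
what the pipeline above gives, but with one paragraph per line.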

> I have deployed Office 2003 in production and it defaults to the
> standard DOC, XLS, PPT etc. file formats of older versions.

What we were handed must have been a pre-release of whatever 
followed.

I can't imagine the horror of actually having to use a wordprocessor for 
doing actual writing, though. I'm a plaintext person.

P




More information about the ubuntu-users mailing list