Having trouble finding a word in multiple files

Peter Flynn peter at silmaril.ie
Thu Jun 18 15:42:09 UTC 2020


On 18/06/2020 12:16, Liam Proven wrote:
[...]
> From what I have read -- I have not looked personally -- MS' XML 
> format is only _technically_ open and documented. While it is, the 
> docs are huge, extremely complex, 

Yes, and unreadably badly written. It's a pity, because some of the 
people driving the move to XML were serious about making it well 
documented — understandably they *wanted* people to use the new formats. 
But some of the supposed "technical requirements" were bogus, and 
foisted on the team by Marketing, who, in their ignorance as ever, were 
having kittens over opening up the file format. Microsoft Marketing does 
an excellent job marketing Microsoft products, but they should never 
have been let near the OOXML schemas.

> and at various points they basically say "contents of this field may
> include an embedded BLOB containing any of the older MS Office file
> formats"  -- so in order to decode them, you still need to maintain
> existing MS Office file import/rendering code.

For compatibility, and even MS themselves admit that no-one has ever 
done this, to their knowledge (there is probably some crackpot 
government agency somewhere doing it).

> Whereas the OpenOffice one was genuinely open and free.

Definitely. But equally badly documented.

> But this is hearsay from informed observers, not personal observation.

Largely right. I have been through both sets of specs as far as I could 
before giving up in disgust. There was an opportunity — at least insofar 
as the bid for an ISO standard was concerned — to do it right, and both 
organisations muffed it.

> It comes from my imagination, Peter. 

:-) Best place for all good ideas. It's actually much "worse" (or
"better") than that, depending on what you want to do.

header, col1, col2, col3, col4
jan, 42, 44, 43, 46
feb, 32, 46,, 76
mar, 51, 32, 54, 56

could be quite terse (split across lines here for readability; the
whitespace is not needed):

<d n="5">
   <r>
     <c t="month">Jan</c>
     <c>42</c>
     <c>44</c>
     <c>43</c>
     <c>46</c>
   </r>
...
</d>

and I won't bore you with DocBook, TEI, JATS, or HTML :-)
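
For anyone who wants to try the CSV-to-terse-XML mapping above, here is a
minimal sketch in Python. It is only an illustration under assumptions that
are mine, not part of any schema: that n on <d> is the column count, that the
first cell of each row carries t="month", and that month names get
capitalised. The function name csv_to_terse_xml is hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# The sample data from the message, including the empty cell in "feb".
CSV_TEXT = """header, col1, col2, col3, col4
jan, 42, 44, 43, 46
feb, 32, 46,, 76
mar, 51, 32, 54, 56
"""


def csv_to_terse_xml(text):
    """Convert CSV text to a terse <d>/<r>/<c> XML string (illustrative only)."""
    # skipinitialspace drops the blanks that follow each comma.
    rows = list(csv.reader(io.StringIO(text), skipinitialspace=True))
    header, data = rows[0], rows[1:]

    # Assumption: n records the number of columns.
    root = ET.Element("d", {"n": str(len(header))})
    for row in data:
        r = ET.SubElement(root, "r")
        for i, value in enumerate(row):
            c = ET.SubElement(r, "c")
            value = value.strip()
            if i == 0:
                # Assumption: the first column is the month label.
                c.set("t", "month")
                value = value.capitalize()
            c.text = value  # an empty string serialises as an empty element
    # No indentation is added: the output contains no optional whitespace.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(csv_to_terse_xml(CSV_TEXT))
```

The output comes out on a single line, which is the "spaces not needed" form;
the missing value in the feb row becomes an empty <c> element rather than
being dropped, so the column positions stay aligned.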

>> The XML Specification is clear on this. "Terseness is of minimal
>> importance."
> 
> Yerse. I strongly disagree with them on that one.

No, really it's not. A lot of people were saying "oh but data 
transmission takes time, and not everyone has broadband, and it's 
expensive" — which was true in 1995 (and, horrifying to see, still is 
for many people) — but the W3C's target for generating and storing XML 
was companies, which have large, fast facilities, not individuals on 
dial-up connections. In the long run it really won't matter, and it was 
felt better to get it right for the long run even at the expense of 
difficulty in the short term. XML has been around for 25 years and we 
still hear people complaining about having to learn "new technology" :-)

> But filesystems and media are fragile.

True, but improving. Last time I had an actual irretrievable disk 
surface failure was in the early 2000s. Plus I do back up...

> Which is one of several reasons I personally try to have nothing to do
> with MS Office post 2007.

Good choice of breakpoint.

> I do run it on Macs, because I have no choice -- the older versions
> are PowerPC code and no longer execute. Also, on Mac OS X, I can turn
> off the ribbon and just use the menus.

I believe that is possible in the Windows version too, just well hidden.

P




More information about the ubuntu-users mailing list