Having trouble finding a word in multiple files
Mike Marchywka
marchywka at hotmail.com
Thu Jun 18 12:32:29 UTC 2020
On Thu, Jun 18, 2020 at 12:58:29PM +0100, Chris Green wrote:
> On Thu, Jun 18, 2020 at 01:16:00PM +0200, Liam Proven wrote:
> > On Thu, 18 Jun 2020 at 00:04, Peter Flynn <peter at silmaril.ie> wrote:
> > >
> > > I think that's what Chris meant. The .doc files are bigger than the text
> > > they contain.
> >
> > DOC files were bigger, yes. But no, plain text isn't more compact than
> > _any_ other form. Text is highly compressible, so any representation
> > with internal compression will be smaller.
> >
> Well, yes, compressed text is smaller than uncompressed text, but you
> have to uncompress it to search it (even if only momentarily).
>
> Much of the advantage of plain text and/or simple markup language is
> that you can find things *with context* in them using grep. The xml
> used to store docx files can be searched with grep but you can't see
> just the line containing the text you are looking for and/or you can't
> ask grep to show you the two or three lines before and after.
This is the kind of thing that makes LaTeX source attractive
for many documents, including email. My work on a block-collapsible
viewer, which I guess you could turn into an editor, was
based on ideas like this. Verbosity is not just a waste of computer
resources; it also confuses humans. And when you want to scale up,
efficiency matters.
I guess others pointed to what amounts to XML "error checking":
so many syntax constraints that it would be hard to create garbage
outside of data fields. My point about compression was that
if there is something tractable you are looking for, the way motion is
common in video, that may be a good basis for compression that is
also useful for indexing. "Where is the motion?" is a common
video search; in the case of text you may just want a vocabulary list.
My ssv compression, IIRC, built a dictionary that would be all you need
to search, replaced the ssv entries with short ASCII strings,
and you could still run bz2 or whatever on the result.
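
To make the dictionary idea concrete, here is a rough Python sketch. It is
not my original tool, just an illustration under assumptions: "ssv" is taken
to mean space-separated values, the codes are hex indices, and the function
names (build_dictionary, encode, search) are made up for this example.

```python
import bz2


def build_dictionary(rows):
    """Map each distinct field to a short ASCII code, in first-seen order."""
    codes = {}
    for row in rows:
        for field in row.split():
            if field not in codes:
                # Hypothetical coding scheme: hex index of the entry.
                codes[field] = format(len(codes), "x")
    return codes


def encode(rows, codes):
    """Replace every field in every row with its short code."""
    return [" ".join(codes[f] for f in row.split()) for row in rows]


def search(term, encoded_rows, codes):
    """Return indices of rows containing term, checking the dictionary first."""
    code = codes.get(term)
    if code is None:
        # Term is not in the vocabulary, so no row can contain it.
        return []
    return [i for i, row in enumerate(encoded_rows) if code in row.split()]


rows = [
    "alpha beta gamma",
    "beta beta delta",
    "gamma alpha alpha",
]
codes = build_dictionary(rows)
encoded = encode(rows, codes)

# The encoded text is still plain ASCII, so a general-purpose compressor
# such as bz2 can be layered on top.
packed = bz2.compress("\n".join(encoded).encode("ascii"))

print(search("gamma", encoded, codes))
print(search("omega", encoded, codes))
```

The dictionary alone answers "does this word occur at all?", which is the
vocabulary-list kind of search; the row scan is only needed to locate hits.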
So, yes, I hate XML and proprietary binary formats, but
in general the LaTeX syntax does a great job for human readability
and for moving a lot of details like formatting into a macro definition.
It is better than HTML for that too, as HTML is just glorified XML.
And then there is troff lol...
>
> --
> Chris Green
--
mike marchywka
306 charles cox
canton GA 30115
USA, Earth
marchywka at hotmail.com
404-788-1216
ORCID: 0000-0001-9237-455X