Having trouble finding a word in multiple files

Robert Heller heller at deepsoft.com
Sun Jun 14 11:22:10 UTC 2020


At Sun, 14 Jun 2020 11:34:32 +0100 "Ubuntu user technical support,  not for general discussions" <ubuntu-users at lists.ubuntu.com> wrote:

> 
> On Sun, Jun 14, 2020 at 05:17:57AM -0400, Pat Brown wrote:
> > 
> > Unfortunately, none of those suggestions worked. Perhaps it's because
> > the files I'm searching are either .doc, .docx  or .odt files. The
> > Dropbox folder is at the root of my home directory and it is an actual
> > folder that contains multiple folders under it.
> > 
> Yes, well, why didn't you say to start with! :-)
> 
> You can't (directly) search the above file types with grep, grep
> searches for strings in *text* files, or at least in files where the
> text you are looking for is stored 'as is'.
> 
> .docx is definitely a compressed format so a tool for searching it
> will need to decompress the files (at the very least) before searching.

A .docx or .odt file is actually a Zip file, containing (amongst other things)
XML file(s), which are just text files. So a script that uses unzip might
work:

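#!/bin/bash
# $1 -- some grep expression
# $2 -- a .docx or .odt file
#
# No spaces around '=' in shell assignments.
regexp=$1
document=$2
# Unpack the document into a scratch directory.
tempname=$(mktemp -d)
unzip -qq "$document" -d "$tempname"
# Test grep's exit status directly; '--' protects patterns starting with '-'.
if grep -r -q -- "$regexp" "$tempname"; then
   echo "$regexp is in $document"
else
   echo "$regexp is not in $document"
fi
rm -rf "$tempname"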
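
Since the original question was about many files in nested folders, here is
a rough sketch of how the same idea might be applied to a whole tree. The
function name search_docs is made up for illustration. It assumes bash, GNU
find and the Info-ZIP unzip. It uses "unzip -p" to send the XML members to
stdout instead of unpacking to a temporary directory, and strips the XML tags
with sed, because the markup can otherwise split a word in two. It only helps
with Zip-based formats (.docx, .odt), not with old binary .doc files.

```shell
#!/bin/bash
# search_docs WORD [DIR]
# Print every .docx/.odt file under DIR (default: current directory)
# whose XML text contains WORD, ignoring case.
# Hypothetical helper for illustration only.
search_docs() {
    local word=$1 dir=${2:-.}
    # -print0 with read -d '' keeps file names containing spaces intact.
    find "$dir" -type f \( -iname '*.docx' -o -iname '*.odt' \) -print0 |
    while IFS= read -r -d '' doc; do
        # unzip -p writes the matching archive members to stdout.
        # sed removes the XML tags so only the text is searched.
        if unzip -p "$doc" '*.xml' 2>/dev/null |
               sed 's/<[^>]*>//g' |
               grep -q -i -- "$word"; then
            printf '%s\n' "$doc"
        fi
    done
}
```

It could be called as, for example, search_docs budget ~/Dropbox (the word
"budget" is just a placeholder).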

> 
> Isn't it possible to search multiple files with Libreoffice?  That
> should manage the above file types.
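
One possible way to let LibreOffice do the work from the command line is to
have it convert each document to plain text and then grep the result. This is
only a sketch: it assumes the soffice command is on the PATH, and the function
name search_via_soffice is invented here. Unlike the unzip approach it should
also cope with old binary .doc files, which are not Zip archives, but it is
much slower.

```shell
#!/bin/bash
# search_via_soffice WORD FILE...
# Convert each FILE to plain text with LibreOffice in headless mode and
# print the names of the files whose text contains WORD, ignoring case.
# Hypothetical helper for illustration only.
search_via_soffice() {
    local word=$1
    shift
    local out doc
    out=$(mktemp -d) || return 1
    for doc in "$@"; do
        # Clear the result of the previous conversion.
        rm -f "$out"/*.txt
        soffice --headless --convert-to txt:Text --outdir "$out" "$doc" \
            >/dev/null 2>&1
        if grep -q -i -- "$word" "$out"/*.txt 2>/dev/null; then
            printf '%s\n' "$doc"
        fi
    done
    rm -rf "$out"
}
```
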
> 
> 
> All of this is why I steer well clear of non-text ways of storing what
> is basically text.  I use reStructuredText and/or Dokuwiki's own
> (text) markup language, both easy to search with grep.

As is LaTeX or Doxygen embedded in program sources (.h, .c, .tcl, etc.).

> 

-- 
Robert Heller             -- 978-544-6933 Cell: 413-658-7953 GV: 978-633-5364
Deepwoods Software        -- Custom Software Services
http://www.deepsoft.com/  -- Linux Administration Services
heller at deepsoft.com       -- Webhosting Services
                                                                                         



