Having trouble finding a word in multiple files
Tony Arnold
tony.arnold at manchester.ac.uk
Mon Jun 15 12:04:28 UTC 2020
All,
On Mon, 2020-06-15 at 13:51 +0200, Liam Proven wrote:
> On Mon, 15 Jun 2020 at 13:11, Karl Auer <kauer at biplane.com.au> wrote:
>
> > No fewer than three people provided essentially the same tools to do
> > exactly what you say cannot be done.
>
> I didn't say it could not be done. I said it would be ludicrously
> slow.
>
> I have 12GB of data in my Dropbox. Any search involving file
> conversions would be unusable. My typical grep search workflow is:
>
> grep -r "phrase I am looking for" *
> grep -r "phrase i am looking for" *
> grep -r "phrase for which I am looking" *
> grep -r "phrase for which i am looking" *
> grep -r "totally different phrase" *
>
> Doing that across a few thousand files totalling 12GB when each file
> must be converted... No, forget about it. Not going to fly.
>
> I would walk to another room, wake my Mac, tap cmd-space "phrase I'm
> looking for" and before I sat down I'd be looking at a list of partial
> matches.
>
> > While it is true that grep only searches plain text, the guts of these
> > files can in general be converted to plain text relatively easily,
> > allowing grep to search them.
>
> I do not know if you are aware of this but cloud storage services like
> this charge by the amount stored. Keeping plain-text copies of
> everything could be very expensive, and increases search time, and
> also brings in a whole new set of issues around keeping binary-file
> contents synchronised with text-file contents...
>
> Again, no. In my considered opinion, trying to attack this problem
> with conversions is completely the wrong approach and will not bring
> any good satisfying resolution, ever, under any circumstances.
>
> > Two tools mentioned, AbiWord and LibreOffice, allow doc files to be
> > converted to other formats which can then be searched.
>
> Once again for the gallery: *conversion is not the answer here.*
>
I think the answer here is an indexing tool that knows how to index
file formats such as doc, docx, odt etc. Recoll has already been
mentioned but there is also Tracker, which I think is installed by
default in the latest releases of Ubuntu.
Tracker gives you some nice search tools in the Files app (Nautilus in
Ubuntu), so the OP may have a solution all ready and waiting to be used.
I know Tracker has a history of using up CPU etc. when it's indexing,
but I think that's been fixed. I've had it running for a while on Focal
(Ubuntu 20.04) without any issues.
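For anyone who prefers the terminal to the Files app, here is a rough
sketch of how an existing index might be queried from a shell. It is only
an illustration under some assumptions: the function name search_index and
the SEARCH_TOOL variable are made up for this example, it assumes the
Tracker 2.x "tracker search" command or Recoll's text mode ("recoll -t -q")
is installed and has already built its index, and the Tracker command name
may differ on other releases. If neither tool is found it falls back to a
case-insensitive grep, which still only sees plain-text files.

```shell
#!/bin/sh
# Sketch only: search_index and SEARCH_TOOL are hypothetical names,
# not part of Tracker, Recoll or grep.
search_index() {
    phrase=$1
    dir=${2:-.}                      # only used by the grep fallback
    tool=${SEARCH_TOOL:-auto}

    # Pick whichever indexer is installed, else fall back to grep.
    if [ "$tool" = auto ]; then
        if command -v tracker >/dev/null 2>&1; then
            tool=tracker
        elif command -v recoll >/dev/null 2>&1; then
            tool=recoll
        else
            tool=grep
        fi
    fi

    case $tool in
        # Assumes the Tracker 2.x CLI; queries the whole index, not just $dir.
        tracker) tracker search "$phrase" ;;
        # Recoll text mode; the inner quotes ask for a phrase match.
        recoll)  recoll -t -q "\"$phrase\"" ;;
        # No index: plain-text files only. -i ignores case, -l lists file
        # names, -F treats the phrase as a fixed string.
        grep)    grep -rilF -- "$phrase" "$dir" ;;
        *)       echo "unknown tool: $tool" >&2; return 2 ;;
    esac
}
```

Something like search_index "phrase I am looking for" ~/Dropbox would then
use whichever tool is available. In the grep fallback, the -i flag also
means the upper- and lower-case variants of a phrase need only one search.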
I agree conversion at search time is not practical, and asking the OP
to convert all her files as a one-off is possibly also asking a bit
much, IMHO.
Regards,
Tony.
--
Tony Arnold MBCS, CITP | Senior IT Security Analyst | Directorate of IT Services | G64, Kilburn Building | The University of Manchester | Manchester M13 9PL | T: +44 161 275 6093 | M: +44 773 330 0039