Simplest word search application for ubuntu 12.04

Robert Heller heller at deepsoft.com
Mon Jan 5 21:23:57 UTC 2015


At Mon, 05 Jan 2015 15:40:46 -0500 dfp10 at verizon.net, "Ubuntu user technical support, not for general discussions" <ubuntu-users at lists.ubuntu.com> wrote:

> 
> I want to search some very large PDF files for words and word-combinations.
> evince does not seem to do this.
> pdfedit works but is too complex and only handles one page at a time.
> Are there any recommendations?

pdftohtml (should be in the poppler-utils package) + grep

pdftohtml will convert the PDFs to HTML file(s). HTML files are just plain
text files (with HTML tags), so grep can search them directly. So long as the
words you are looking for are not common HTML tag names (probably the only
problems would be 'body' or 'table'; most of the other HTML tags are not
typical natural-language words), this should work.
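
If the tag-name collisions do turn out to be a problem, one way around them is
to strip the tags before matching. What follows is only a rough sketch in
Python 3, not something from the poppler tools themselves: the helper names
(html_to_text, find_words, pdf_to_html) are made up for illustration, and the
pdftohtml flags used (-i to skip images, -noframes for a single document,
-stdout to write to standard output) assume the poppler-utils build of
pdftohtml.

```python
import re
import subprocess
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collect text nodes, dropping the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    """Return the text content of an HTML string, without any tags."""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


def find_words(html, words):
    """Return the text lines that contain every word in `words` (case-insensitive)."""
    patterns = [re.compile(r"\b" + re.escape(w) + r"\b", re.IGNORECASE)
                for w in words]
    hits = []
    for line in html_to_text(html).splitlines():
        if all(p.search(line) for p in patterns):
            hits.append(line.strip())
    return hits


def pdf_to_html(pdf_path):
    """Run pdftohtml and return its HTML output (requires pdftohtml on PATH)."""
    raw = subprocess.check_output(
        ["pdftohtml", "-i", "-noframes", "-stdout", pdf_path])
    return raw.decode("utf-8", errors="replace")
```

Something like find_words(pdf_to_html("big.pdf"), ["word1", "word2"]) would
then list the lines where both words occur, with 'body' and 'table' matching
only when they appear in the text itself. Note that it matches line by line,
so a word combination split across two lines of the converted output will be
missed.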

> Thanks
> Don Parsons
> 
> 

-- 
Robert Heller             -- 978-544-6933
Deepwoods Software        -- Custom Software Services
http://www.deepsoft.com/  -- Linux Administration Services
heller at deepsoft.com       -- Webhosting Services
                                                                          



