Converting pdfs
lazer100
lazer100 at talktalk.net
Mon Oct 29 18:51:52 UTC 2012
On 26-Oct-12 08:40:26 rikona wrote:
>I have a ton of stuff stored in pdf format that will need to be
>accessed for many years to come. With Adobe dropping some linux
>support, is this a significant risk? Is there a way to convert pdfs to
>another format, **reliably**, so I can start to convert docs to a more
>accessible long-term format?
>
PDF is the standard format for fully formatted documents.
Advantages of PDF:
It can be handled on most operating systems via Ghostscript, which is
available on both Linux and Windows.
It's essentially a vector format, being a distilled form of PostScript,
which is a vector page description language. The vector format includes
various curves and has built-in facilities for new fonts. Where a page
contains pixel images, these are placed as objects within the same vector
page description.
The vector basis of PDF means that things like company logos can be
defined so that, no matter how high-res your printer, the logo will be
smooth up to the resolution of the printer, and the colour will be as
accurate as your printer's technology allows.
Basically it's a very sophisticated page description system, and the
underlying PostScript has been around since the early days of the
original Apple Mac.
A page should look the same when printed regardless of the printer, up to
the limits of the printer, i.e. it's an absolute page description system.
You can also convert PDF to various lower-level universal formats such as
TIFF, JPEG and plain ASCII text using Ghostscript. I often convert larger
PDF documents to TIFF or JPEG because I find it nicer to access the page
images directly, and some desktops will display thumbnails of them straight
away. The pages come out as page1.jpg, page2.jpg, etc., and I can annotate
these names, e.g. page35_some_annotation.jpg.
I find this more useful than having the pages encapsulated in a single document!
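The renaming I describe can be done by hand, but a small shell function saves typing. This is only a sketch: the name annotate_page and its argument order are my own invention, and it assumes the page images are named pageN.jpg as above.

```shell
# annotate_page DIR PAGE_NUMBER NOTE
# Hypothetical helper: renames DIR/pageN.jpg to DIR/pageN_NOTE.jpg.
annotate_page() {
    dir=$1
    num=$2
    note=$3
    src="$dir/page$num.jpg"
    dst="$dir/page${num}_$note.jpg"
    # Refuse to continue if the page image is missing.
    if [ ! -f "$src" ]; then
        echo "no such page image: $src" >&2
        return 1
    fi
    # Never clobber an existing annotated file.
    if [ -e "$dst" ]; then
        echo "refusing to overwrite: $dst" >&2
        return 1
    fi
    mv -- "$src" "$dst"
}
```

For example, annotate_page /some/path 35 some_annotation would turn /some/path/page35.jpg into /some/path/page35_some_annotation.jpg.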
Converting to ASCII is also useful because plain text is the most direct
format: you can search it for words and annotate it, and it is usable
on all operating systems, even the early ones.
The commands to convert PDF to JPEG, TIFF and ASCII are as follows.
To convert to JPEG, use the following, all on one line (my emails sometimes
get reformatted, and what was written on one line can get split over several):
gs -sDEVICE=jpeg -dJPEGQ=100 -r200 -dBATCH -dNOPAUSE -sOutputFile=/some/path/page%d.jpg some.pdf
Here JPEGQ is the quality: 100 is the maximum, but JPEG is still lossy.
-r200 sets the resolution to 200 dpi; if you wanted 72 dpi you would use -r72.
The %d in the output file name is replaced by the page number, giving one
file per page.
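If you run this often, the JPEG command can be wrapped in a shell function so that the resolution and quality become optional arguments. This is a minimal sketch, not a polished tool: the function name pdf_to_jpeg and the GS override variable are my own assumptions, while the gs options are exactly the ones shown above. The output directory must already exist.

```shell
# pdf_to_jpeg PDF OUTDIR [DPI] [QUALITY]
# Hypothetical wrapper around the gs JPEG command line.
# DPI defaults to 200 and QUALITY to 100.
pdf_to_jpeg() {
    if [ "$#" -lt 2 ]; then
        echo "usage: pdf_to_jpeg PDF OUTDIR [DPI] [QUALITY]" >&2
        return 2
    fi
    pdf=$1
    outdir=$2
    dpi=${3:-200}
    quality=${4:-100}
    # GS is an assumed override: setting GS=echo prints the arguments
    # instead of running Ghostscript, which is handy for checking them.
    "${GS:-gs}" -sDEVICE=jpeg "-dJPEGQ=$quality" "-r$dpi" \
        -dBATCH -dNOPAUSE \
        "-sOutputFile=$outdir/page%d.jpg" "$pdf"
}
```

So pdf_to_jpeg some.pdf /some/path gives the 200 dpi, quality 100 conversion, and pdf_to_jpeg some.pdf /some/path 72 90 gives a smaller 72 dpi version. Quoting every argument means paths with spaces in them survive intact.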
To convert to uncompressed TIFF, which is lossless, use the following, all on
one line:
gs -sDEVICE=tiff24nc -r200 -dBATCH -dNOPAUSE -sOutputFile=/another/path/page%d.tif another.pdf
This creates much bigger files, but avoids both the time overhead of
compression and any loss of quality from lossy compression.
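To get a feel for how much bigger, the raw size of one uncompressed 24-bit page can be estimated with shell arithmetic. This is a rough sketch under stated assumptions: the function name is made up, the page size is given in PostScript points (72 to the inch, so A4 is about 595 x 842), and the real file will differ slightly because of the TIFF header and however Ghostscript rounds the pixel dimensions.

```shell
# raw_page_bytes WIDTH_PT HEIGHT_PT DPI
# Hypothetical estimate of the pixel data size of one 24-bit page:
# pixels = points * dpi / 72, and each pixel takes 3 bytes (R, G, B).
raw_page_bytes() {
    w_px=$(( $1 * $3 / 72 ))   # integer division, so slightly truncated
    h_px=$(( $2 * $3 / 72 ))
    echo $(( w_px * h_px * 3 ))
}
```

For an A4 page at 200 dpi, raw_page_bytes 595 842 200 prints 11587128, i.e. roughly 11 MB per page, so a document of a few hundred pages runs to several GB.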
Converting to ASCII needs a trickier command line, again all on one line:
gs -q -dNODISPLAY -dSAFER -dDELAYBIND -dWRITESYSTEMDICT -dSIMPLE "" -c save -f ps2ascii.ps input.pdf -c quit >output.txt
The "" is a placemarker for any further gs command-line options you may
want. Replace it with those options, or leave it out if you want none.
E.g. to extract only pages 11 to 23 you would put
-dFirstPage=11 -dLastPage=23 in its place, as two separate options.
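In a shell function the placemarker is not needed at all, because "$@" expands to whatever extra options were given, or to nothing. The sketch below assumes, as the command above does, that ps2ascii.ps is on Ghostscript's search path. The function name pdf_to_text and the GS override variable are my own inventions.

```shell
# pdf_to_text PDF [EXTRA_GS_OPTIONS...]
# Hypothetical wrapper: writes the extracted text to standard output.
pdf_to_text() {
    if [ "$#" -lt 1 ]; then
        echo "usage: pdf_to_text PDF [gs options...]" >&2
        return 2
    fi
    pdf=$1
    shift
    # "$@" now holds only the extra options; with none given it
    # expands to no arguments at all, so no "" placemarker is needed.
    # GS is an assumed override (GS=echo previews the arguments).
    "${GS:-gs}" -q -dNODISPLAY -dSAFER -dDELAYBIND -dWRITESYSTEMDICT \
        -dSIMPLE "$@" -c save -f ps2ascii.ps "$pdf" -c quit
}
```

Typical use would be pdf_to_text input.pdf >output.txt for the whole document, or pdf_to_text input.pdf -dFirstPage=11 -dLastPage=23 >output.txt for a range of pages.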
Similarly, to convert only pages 11 to 23 to TIFF, you would use:
gs -sDEVICE=tiff24nc -r200 -dBATCH -dNOPAUSE -dFirstPage=11 -dLastPage=23 -sOutputFile=/another/path/page%d.tif another.pdf
Here the page options are given directly, with no "" or quotes around them.
These page numbers are the low-level page numbers, counted from the start of
the file, and not the ones printed on the page. When you print the entire
document, the low-level page numbers are
1, 2, 3, 4, 5, ....
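Since it is easy to mistype a page range, a wrapper can check the numbers before handing them to Ghostscript. Again this is just a sketch: pdf_pages_to_tiff and the GS override are hypothetical names of mine, and the gs options are the ones from the command above.

```shell
# pdf_pages_to_tiff PDF OUTDIR FIRST LAST
# Hypothetical wrapper: converts low-level pages FIRST..LAST to TIFF.
pdf_pages_to_tiff() {
    if [ "$#" -ne 4 ]; then
        echo "usage: pdf_pages_to_tiff PDF OUTDIR FIRST LAST" >&2
        return 2
    fi
    pdf=$1
    outdir=$2
    first=$3
    last=$4
    # Both page numbers must consist of digits only.
    for n in "$first" "$last"; do
        case $n in
            ''|*[!0-9]*)
                echo "not a page number: $n" >&2
                return 2
                ;;
        esac
    done
    # Low-level page numbers start at 1 and the range must not be reversed.
    if [ "$first" -lt 1 ] || [ "$first" -gt "$last" ]; then
        echo "bad page range: $first to $last" >&2
        return 2
    fi
    # GS is an assumed override (GS=echo previews the arguments).
    "${GS:-gs}" -sDEVICE=tiff24nc -r200 -dBATCH -dNOPAUSE \
        "-dFirstPage=$first" "-dLastPage=$last" \
        "-sOutputFile=$outdir/page%d.tif" "$pdf"
}
```

With this, pdf_pages_to_tiff another.pdf /another/path 11 23 runs the same conversion as the command above, while a reversed range such as 23 11 is rejected before gs is started.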
Ghostscript takes a bit of getting used to, but it is very useful as a very
low-level PS and PDF engine which allows you to do many useful things. It can
also let you print to some printers from any operating system, i.e. it has
some operating-system-independent printer drivers, although they have moved
away from this idea.