Text Manipulation/Replacement

NoOp glgxg at sbcglobal.net
Tue Sep 23 02:15:04 UTC 2008


On 09/22/2008 06:29 PM, Ubence Quevedo wrote:
> On Sep 22, 2008, at 04:25 PM, NoOp wrote:
> 
>> On 09/22/2008 03:53 PM, Ubence Quevedo wrote:
>>>
>>> ----- Original Message ----
>>>> From: Chris Mohler <cr33dog at gmail.com>
>>>> To: "Ubuntu user technical support, not for general discussions" <ubuntu-users at lists.ubuntu.com>
>>>> Sent: Monday, September 22, 2008 3:22:43 PM
>>>> Subject: Re: Text Manipulation/Replacement
>>>>
>>>> On Mon, Sep 22, 2008 at 4:57 PM, Ubence Quevedo wrote:
>>>>> Hello All,
>>>>>
>>>>> I've used pdftotext to convert a pdf document to text and then used a
>>>>> combination of grep and awk to single out data and replace formatting
>>>>> that I didn't need.
>>>>>
>>>>> The output data eventually looks like this:
>>>>> 12,123456789
>>>>> ,0987654321
>>>>>
>>>>> But I want it to look like this:
>>>>> 12,123456789,0987654321
>>>>>
>>>>> I've tried many different things with awk, but I can't get it to
>>>>> replace \r, with just a ,
>>>>
>>>> Hmm - I've always had headaches dealing with newlines in sed and awk
>>>> (to a lesser extent - I'm more familiar with sed).
>>>>
>>>> How about perl?
>>>>
>>>> cat foo.txt | perl -pi -e 's/\n//g'
>>>>
>>
>>>
>>> Hi Chris,
>>>
>>> This worked...kinda...but it ate all of the new lines, so I have one
>>> continuous line.  I need to find all instances of "\n," and replace
>>> them with ",".  That way it is very specific in what is found and
>>> replaced.  I have very little perl knowledge, and my feeble attempt at
>>> modifying the perl command above failed miserably.
>>>
>>> Any other ideas?
>>>
>>> -Ubence
>>>
>>
>> Perhaps a silly question... can you not open the pdf in Adobe Reader 8,
>> then copy & paste the text to OpenOffice Writer & accomplish what you
>> want?
>>

> If that were an option, then yes.  However, I'd prefer to keep this to  
> the command line as much as possible.  I could take the output file  
> and transfer it to my Mac and use TextWrangler to do what I want, but  
> I'd rather not [since anyone else that might be doing this procedure  
> in the future wouldn't have access to a Mac].
> 
> -Ubence
> 

Well... OOo can save it as a text file, doc file, csv, xls, odt, ods,
etc., and OOo can run on your Mac. You can also copy & paste into the
standard text editor (gedit) etc. So while running from the command line
might be desirable, copy and paste from Adobe Reader, Evince, or xPDF just
might be easier; unless of course you are doing the conversion from within
a script, on multiple files, or similar.

That said... this might be of interest:
http://furtivepenguin.net/index.php?s=pdftotext
http://www.pdfhacks.com/pdftk/
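
If it does need to stay scripted, the join asked for earlier in the thread
(a newline immediately followed by a comma becomes just the comma) could be
sketched roughly like this in Python. This is only an illustration, not
something anyone in the thread posted: the function name join_comma_lines
and the sample data are made up here, and it assumes the whole file fits in
memory. It also converts any \r\n line endings to \n first, in case the
text file has them.

```python
import re


def join_comma_lines(text: str) -> str:
    """Join every line that starts with a comma onto the line before it."""
    # Assumption: the input may use \r\n line endings; normalise them to \n
    # so that a single pattern is enough.
    text = text.replace("\r\n", "\n")
    # Replace only a newline that is directly followed by a comma.
    # All other newlines are left alone, so records stay on separate lines.
    return re.sub(r"\n,", ",", text)


if __name__ == "__main__":
    # Hypothetical sample in the shape described in the original question.
    sample = "12,123456789\n,0987654321\n34,555\n"
    print(join_comma_lines(sample), end="")
```

Because the pattern only matches "\n," and not every "\n", the remaining
line breaks survive, which is the difference from stripping all newlines.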





