I wanted to convert a PDF document into an XLS table, and after a couple of searches I was able to write the code in Ruby fairly easily and convert a Citibank PDF statement to a CSV file. It was a relief to learn how to read a PDF file, as long as it is not password protected.
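require 'rubygems'
require 'pdf/reader'

# Receives text callbacks from PDF::Reader and prints one CSV row per
# statement line.
class PageTextReceiver
  attr_accessor :content

  def initialize
    @content = []
    @kk = "false"   # "true" while inside the transaction table
    @i = 0          # fields collected for the current row (4 = row complete)
    @ptr_str = ""   # the row being built
  end

  def begin_page(arg = nil)
    puts ""
  end

  def show_text(string, *params)
    # The transaction table ends at "Previous Balance".
    if string.strip == "Previous Balance"
      @kk = "false"
    end
    # A complete row is waiting: print it and start a new one.
    if @i == 4
      puts @ptr_str
      @ptr_str = ""
      @i = 0
    end
    # The table starts at the "Sale Date" column header.
    if string.strip == "Sale Date" or @kk == "true"
      @kk = "true"
      if @i == 0
        # The first field is the date; append the year to it.
        @ptr_str << string + "/2009,"
      else
        @ptr_str << string + ","
      end
      # An amount (two characters after the last ".") or the last header
      # column ends the row.
      if (string.reverse.index(".") == 2 or string == "Amount (in Rs)")
        @i = 4
      else
        @i = @i + 1
      end
    end
  end

  # Takes the string to show and passes it on to show_text.
  def move_to_next_line_and_show_text(string, *params)
    @i = 0
    show_text(string)
  end

  alias :super_show_text :show_text
  alias :set_spacing_next_line_show_text :show_text

  def show_text_with_positioning(*params)
    params = params.first
    params.each { |str| show_text(str) if str.kind_of?(String) }
  end
end

receiver = PageTextReceiver.new
(1..45).each do |x|
  pdf = PDF::Reader.file("#{x}.pdf", receiver)
  puts receiver.content.inspect
end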
The code above reads 45 PDF files, named 1.pdf to 45.pdf. Say it is saved as read_pdf.rb.
Below is the command to run it and store the output in a file (which I think is the easiest way):
ruby read_pdf.rb >> a.csv
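One thing to watch with the approach above: fields are joined with plain commas, so an amount such as 1,250.00 would split into two columns. Here is a minimal sketch of the same row-grouping idea on a plain array of strings, using Ruby's csv library to quote such fields. The helper name rows_from_fragments, the year: parameter and the sample fragments are made up for illustration and are not taken from a real statement. Unlike the receiver above, the sketch leaves the header row's first field untouched.

```ruby
require 'csv'

# Hypothetical helper: groups a flat list of text fragments into table rows.
# A row ends at an amount-like field (two digits after a "."), at the last
# header column, or once four fields have been collected.
# A trailing, incomplete row is dropped.
def rows_from_fragments(fragments, year: "2009")
  rows = []
  row = []
  in_table = false

  fragments.each do |fragment|
    text = fragment.strip
    in_table = false if text == "Previous Balance" # table ends here
    in_table = true if text == "Sale Date"         # table starts here
    next unless in_table

    # First field of a data row is the date; append the assumed year.
    if row.empty? && text != "Sale Date"
      row << "#{text}/#{year}"
    else
      row << text
    end

    if text == "Amount (in Rs)" || text.match?(/\.\d{2}\z/) || row.size == 4
      rows << row
      row = []
    end
  end

  rows
end

# Made-up sample input, standing in for the strings a PDF reader would emit.
fragments = [
  "Sale Date", "Reference No", "Description", "Amount (in Rs)",
  " 12/03 ", "123456", "SOME SHOP", "1,250.00",
  "Previous Balance", "9,999.00"
]

# CSV.generate_line quotes any field that contains a comma.
rows_from_fragments(fragments).each { |row| puts CSV.generate_line(row) }
```

With this sample input the amount comes out as a single quoted field, "1,250.00", so a spreadsheet keeps it in one column.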
Hope it helps you. If you know how to read a password-protected PDF through code, where I can supply the file's password, please let me know.