regex - Extract values from a text body in Ruby
I need to extract some values from a multi-line string (which I read from the text body of emails). I want to be able to feed the patterns to the parser so that I can customize it for different emails later. I came up with the following:
    #!/usr/bin/env ruby

    text1 = <<-eos
    lorem ipsum dolor sit amet,

    name: pepe manuel periquita
    email: pepe@manuel.net
    sisters: 1
    brothers: 3
    children: 2

    lorem ipsum dolor sit amet
    eos

    # Each pattern pairs a regular expression with a block that turns
    # the scan result into a hash.
    pattern1 = {
      :exp => /name:\s*(.*?)$\s*
               email:\s*(.*?)$\s*
               sisters:\s*(.*?)$\s*
               brothers:\s*(.*?)$\s*
               children:\s*(.*?)$/mx,
      :blk => lambda do |m|
        m.flatten!
        {:name => m[0], :email => m[1],
         :total => m.drop(2).inject(0) { |sum, item| sum + item.to_i }}
      end
    }

    # scan on text returns
    # [["pepe manuel periquita", "pepe@manuel.net", "1", "3", "2"]]
    def do_parse(text, pattern)
      data = pattern[:blk].call(text.scan(pattern[:exp]))
      puts data.inspect
    end

    do_parse text1, pattern1

    # ./text_parser.rb
    # {:email=>"pepe@manuel.net", :total=>6, :name=>"pepe manuel periquita"}
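So, I define a pattern as a regular expression paired with a block that builds a hash from the matches. The "parser" just takes the text and applies the rules by executing the block on the result of matching the regular expression against the text with scan.
At the moment I only have to parse emails in the format shown in text1, but later I would like to add patterns so that it is possible to extract data from different emails (the format of the emails is fixed for each type). Therefore I would like to simplify the patterns by moving as much as possible into the "parser". The code above works and extracts the data, but most of the work is located in the pattern...
Is this the right way to go?
Could it be simplified, or can you think of a different / better solution to the problem?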
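One possible direction for moving work out of the pattern, as a rough sketch only: each email type is described by nothing more than an ordered list of field labels, and the "parser" builds the regular expression itself. The names build_regexp, parse_fields, LABELS and SAMPLE are hypothetical and not part of the code above, and the sketch assumes every field sits on its own line in the form "label: value".

```ruby
# Hypothetical helpers; a sketch, not the original parser.

# Build one regexp from an ordered list of labels. Each label must start
# a line and be followed by a colon; the value runs to the end of that line.
def build_regexp(labels)
  parts = labels.map do |label|
    '^' + Regexp.escape(label) + ':[ \t]*(.*?)[ \t]*$'
  end
  # '\s*' between parts lets the match step over the line break(s)
  # separating consecutive fields.
  Regexp.new(parts.join('\s*'), Regexp::IGNORECASE)
end

# Return a hash of label => captured string, or nil when the text
# does not contain all labels in the expected order.
def parse_fields(text, labels)
  match = build_regexp(labels).match(text)
  return nil unless match
  keys = labels.map { |label| label.downcase.to_sym }
  Hash[keys.zip(match.captures)]
end

LABELS = %w[name email sisters brothers children]

SAMPLE = <<-EOS
lorem ipsum dolor sit amet,

name: pepe manuel periquita
email: pepe@manuel.net
sisters: 1
brothers: 3
children: 2

lorem ipsum dolor sit amet
EOS

fields = parse_fields(SAMPLE, LABELS)
total = [:sisters, :brothers, :children].inject(0) do |sum, key|
  sum + fields[key].to_i
end
puts({ :name => fields[:name], :email => fields[:email], :total => total }.inspect)
```

With this split, a new email type would only need a new label list plus whatever summarising step (such as the total) is specific to it; whether that is flexible enough depends on how different the other email formats turn out to be.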
Update
I updated the parser following tonttu's solution; the pattern hash is now:
    pattern2 = {
      :exp => /^(.+?):\s*(.+)$/,
      :blk => lambda do |m|
        # m is an array of [key, value] pairs; turn it into a hash with symbol keys
        r = Hash[m.map { |x| [x[0].downcase.to_sym, x[1]] }]
        {:name => r[:name], :email => r[:email],
         :total => r[:children].to_i + r[:brothers].to_i + r[:sisters].to_i}
      end
    }
Maybe this is generic enough?
    require 'pp'

    pp Hash[*text1.scan(/^(.+?):\s(.+)$/).map{|x| [x[0].downcase.to_sym, x[1]] }.flatten]
    # => {:sisters=>"1", :brothers=>"3", :children=>"2", :name=>"pepe manuel periquita", :email=>"pepe@manuel.net"}
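The generic key/value scan can also be written without the splat and flatten, by converting the pairs directly. This is a sketch assuming Ruby 2.1 or later (for Array#to_h); extract_pairs and SAMPLE_BODY are hypothetical names used only for illustration.

```ruby
# Hypothetical helper; collects every "key: value" line into a hash.
def extract_pairs(text)
  # The key may not contain a colon or a line break, so lines without
  # a colon are skipped instead of being merged with the next line.
  pairs = text.scan(/^([^:\n]+?):[ \t]*(.+?)[ \t]*$/)
  pairs.map { |key, value| [key.strip.downcase.to_sym, value] }.to_h
end

SAMPLE_BODY = "intro line\n" \
              "name: pepe manuel periquita\n" \
              "email: pepe@manuel.net\n" \
              "sisters: 1\n" \
              "brothers: 3\n" \
              "children: 2\n" \
              "outro line\n"

data = extract_pairs(SAMPLE_BODY)
puts data.inspect
```

Note that any line containing a colon is treated as a field here (a line with a URL, for example, would be picked up too), so a per-email-type list of expected keys may still be needed to filter the result.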