ruby - Login automatically to get a scraped file on Rails app with Mechanize

July 15, 2014

I need to log in to a site to download a PDF file. I have code that works fine in plain Ruby when I debug it. The problem is that when I try to use the same code in a Rails app with instance variables, I can't download the file. I guess it's a cookie issue, but I haven't managed to resolve it.

Here is the code that works in plain Ruby (I can download the PDF file and the login succeeds):

    require 'rubygems'
    require 'mechanize'

    agent = Mechanize.new

    # save downloaded PDFs to disk instead of parsing them
    agent.pluggable_parser.pdf = Mechanize::FileSaver

    page = agent.get("http://elwatan.com/sso/inscription/inscription_payant.php")

    # log in to the site
    form = page.form_with(:id => 'form-login-page')
    form.login = "my_login"
    form.password = "my_password"
    page = form.submit

    # get the pdf link
    agent.get("http://elwatan.com/").parser.xpath('//div[2]/div/p/a/@href|/img').each do |link|
      agent.get link['href']
    end

Below is my attempt in Ruby on Rails 3, which didn't work (I can scrape the link, but the file is not downloaded because I get redirected to the login page):

controller.rb

    @agent = Mechanize.new
    @agent.user_agent_alias = 'Mac Safari'
    @page = @agent.get("http://elwatan.com/sso/inscription/inscription_payant.php")

    # login
    form = @page.form_with(:id => 'form-login-page')
    form.login = "my_login"
    form.password = "my_password"
    @page = form.submit

    # pdf link
    @watan = {}
    @agent.get("http://elwatan.com/").parser.xpath('//div[2]/div/p/a/@href|/img').each do |link|
      @watan[link.text.strip] = @agent.get link['href']
    end

view.rb

    <% if @watan %>
      <% @watan.each do |key, value| %>
        <a href="http://www.elwatan.com<%= "#{key}" %>" target='_blank'>download file</a>
      <% end %>
    <% end %>

This will be a long post.

First off, you should place the scraping code in a library: create a file lib/watan_scraper.rb and fill it with

    require 'mechanize'

    module WatanScraper

      def self.get_all_pdfs
        agent = get_agent
        # collect the text of every pdf link
        watan = []
        agent.get("http://elwatan.com/").parser.xpath('//div[2]/div/p/a/@href|/img').each do |link|
          watan << link.text.strip
        end
        watan
      end

      def self.get_single_pdf(link_text)
        agent = get_agent
        # look for the link whose text matches
        found_link = nil
        agent.get("http://elwatan.com/").parser.xpath('//div[2]/div/p/a/@href|/img').each do |link|
          if link.text.strip == link_text
            found_link = link['href']
          end
        end
        # fetch the pdf in the logged-in session (nil if nothing matched)
        agent.get(found_link) if found_link
      end

      # builds a logged-in agent; defined with self. so that the
      # module-level methods above can call it
      def self.get_agent
        agent = Mechanize.new
        agent.user_agent_alias = 'Mac Safari'
        page = agent.get("http://elwatan.com/sso/inscription/inscription_payant.php")

        # login
        form = page.form_with(:id => 'form-login-page')
        form.login = "my_login"
        form.password = "my_password"
        form.submit

        agent
      end
      private_class_method :get_agent

    end

OK, now you can write in your controller

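    # lib/ is not autoloaded by default in Rails 3, so require it explicitly
    require 'watan_scraper'

    class PdfsController < ApplicationController
      def index
        @watan = WatanScraper.get_all_pdfs
      end

      def show
        pdf_name = params[:id]
        @pdf = WatanScraper.get_single_pdf(pdf_name)
        # send_data expects the raw bytes, so pass the response body
        send_data @pdf.body, :filename => "#{pdf_name}.pdf"
      end
    end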

Your view should be in the file views/pdfs/index.html.haml (let's use Haml):

    - @watan.each do |link_text|
      = link_to "download #{link_text}", pdf_path(link_text)

Your routes should be as follows (config/routes.rb)

resources :pdfs, only: [:index, :show]
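One thing to watch for with this route: the scraped link text ends up as the :id segment, and judging by the view in the question it is a URL path with slashes and dots, which a default :id segment does not match. A possible workaround, sketched here under that assumption, is to turn the link text into a URL-safe token before passing it to pdf_path and to decode it again from params[:id]. The helper names encode_pdf_id and decode_pdf_id and the example path are made up for illustration.

```ruby
# Hypothetical helpers (not part of the answer's code): turn a scraped
# link path into a URL-safe token and back, using only core Ruby.

# URL-safe Base64 without padding, so the token contains only
# letters, digits, '-' and '_'.
def encode_pdf_id(link_text)
  [link_text].pack('m0').tr('+/', '-_').delete('=')
end

# Reverse of encode_pdf_id: restore the padding and the standard
# Base64 alphabet, then decode.
def decode_pdf_id(id)
  padded = id + '=' * ((4 - id.length % 4) % 4)
  padded.tr('-_', '+/').unpack1('m0').force_encoding('UTF-8')
end

# Example path, made up for illustration.
link_text = '/pdf/journal/2014/20140715.pdf'
id = encode_pdf_id(link_text)
restored = decode_pdf_id(id)
```

In the view this would become pdf_path(encode_pdf_id(link_text)), and the show action would look up decode_pdf_id(params[:id]).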

This code is of course untested, but at least it is nicely structured, and it will fetch the PDF in the right session (using Mechanize) and send it to the browser.
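The reason the file has to pass through your Rails app is that the login cookie lives in the Mechanize agent on the server, not in the visitor's browser, so a plain link to the site gets redirected to the login page. The sketch below imitates that with a throwaway local server and Net::HTTP standing in for the real site and for Mechanize. The server, the paths and the cookie name are all invented for the demonstration.

```ruby
require 'socket'
require 'net/http'

# Throwaway local server standing in for the real site (hypothetical
# paths and cookie name). /login hands out a session cookie; any other
# path is served only when that cookie is sent back.
server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]

Thread.new do
  loop do
    client = server.accept
    path = client.gets.to_s.split[1]
    cookie = nil
    # Read the request headers up to the blank line, keeping the cookie.
    while (line = client.gets) && line != "\r\n"
      name, value = line.split(':', 2)
      cookie = value.to_s.strip if name.downcase == 'cookie'
    end
    response =
      if path == '/login'
        "HTTP/1.1 200 OK\r\nSet-Cookie: session=abc123\r\n" \
        "Content-Length: 2\r\nConnection: close\r\n\r\nok"
      elsif cookie == 'session=abc123'
        "HTTP/1.1 200 OK\r\nContent-Type: application/pdf\r\n" \
        "Content-Length: 8\r\nConnection: close\r\n\r\n%PDF-1.4"
      else
        "HTTP/1.1 302 Found\r\nLocation: /login\r\n" \
        "Content-Length: 0\r\nConnection: close\r\n\r\n"
      end
    client.write(response)
    client.close
  end
end

# One GET request, optionally carrying a cookie. Passing nil as the
# third argument to Net::HTTP.new disables any proxy from the environment.
def fetch(port, path, cookie = nil)
  request = Net::HTTP::Get.new(path)
  request['Cookie'] = cookie if cookie
  Net::HTTP.new('127.0.0.1', port, nil).request(request)
end

# The "agent" logs in and keeps the cookie it was given.
cookie = fetch(port, '/login')['Set-Cookie'].split(';').first

# Same URL, two clients: one sends the session cookie, one does not.
with_session = fetch(port, '/file.pdf', cookie)
without_session = fetch(port, '/file.pdf')

puts "with cookie:    #{with_session.code}"
puts "without cookie: #{without_session.code} -> #{without_session['Location']}"
```

The request without the cookie plays the role of the browser following a direct link, which is why the show action fetches the file with the logged-in agent and relays it with send_data.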
