ruby - Login automatically to get a scraped file on Rails app with Mechanize


I want to log in and download a PDF file. My code works fine in plain Ruby when I debug it. The problem is that when I try to use the same code in a Rails app with instance variables, I can't download the file. I guess it's a cookie issue, but I haven't managed to resolve it.

Here is the code that works in plain Ruby (I can download the PDF file and the login succeeds):

    require 'rubygems'
    require 'mechanize'

    agent = Mechanize.new
    # save PDF responses to disk instead of parsing them
    agent.pluggable_parser.pdf = Mechanize::FileSaver

    page = agent.get("http://elwatan.com/sso/inscription/inscription_payant.php")

    # log in to the site
    form = page.form_with(:id => 'form-login-page')
    form.login = "my_login"
    form.password = "my_password"
    page = form.submit

    # get the PDF link
    agent.get("http://elwatan.com/").parser.xpath('//div[2]/div/p/a/@href|/img').each do |link|
      agent.get link['href']
    end

And below is my attempt in Ruby on Rails 3, which didn't work (I can scrape the link, but I can't download the file because I get redirected to the login page):

controller.rb

    @agent = Mechanize.new
    @agent.user_agent_alias = 'Mac Safari'
    @page = @agent.get("http://elwatan.com/sso/inscription/inscription_payant.php")

    # log in
    form = @page.form_with(:id => 'form-login-page')
    form.login = "my_login"
    form.password = "my_password"
    @page = form.submit

    # PDF link
    @watan = {}
    @agent.get("http://elwatan.com/").parser.xpath('//div[2]/div/p/a/@href|/img').each do |link|
      @watan[link.text.strip] = @agent.get link['href']
    end

view.rb

    <% if @watan %>
      <% @watan.each do |key, value| %>
        <a href="http://www.elwatan.com<%= key %>" target="_blank">download file</a>
      <% end %>
    <% end %>

This will be a long post.

First off, you should place the scraping code in a library: create a file lib/watan_scraper.rb and fill it with

    require 'mechanize'

    module WatanScraper

      # Returns the text of every PDF link found on the front page.
      def self.get_all_pdfs
        agent = get_agent
        watan = []
        agent.get("http://elwatan.com/").parser.xpath('//div[2]/div/p/a/@href|/img').each do |link|
          watan << link.text.strip
        end
        watan
      end

      # Fetches the PDF whose link text matches link_text (nil if none matches).
      def self.get_single_pdf(link_text)
        agent = get_agent
        found_link = nil
        agent.get("http://elwatan.com/").parser.xpath('//div[2]/div/p/a/@href|/img').each do |link|
          if link.text.strip == link_text
            found_link = link['href']
          end
        end
        # fetch the PDF with the same logged-in agent
        agent.get(found_link) if found_link
      end

      # Builds a Mechanize agent and logs it in.
      def self.get_agent
        agent = Mechanize.new
        agent.user_agent_alias = 'Mac Safari'
        page = agent.get("http://elwatan.com/sso/inscription/inscription_payant.php")

        # log in
        form = page.form_with(:id => 'form-login-page')
        form.login = "my_login"
        form.password = "my_password"
        form.submit

        agent
      end
      # a bare `private` does not apply to `def self.` methods
      private_class_method :get_agent

    end

OK, now you can write in your controller:

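    # needed unless lib/ is already in your autoload paths
    require 'watan_scraper'

    class PdfsController < ApplicationController
      def index
        @watan = WatanScraper.get_all_pdfs
      end

      def show
        pdf_name = params[:id]
        @pdf = WatanScraper.get_single_pdf(pdf_name)
        # send the raw bytes of the file Mechanize fetched
        send_data @pdf.body, :filename => "#{pdf_name}.pdf"
      end
    end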

Your view goes in the file views/pdfs/index.html.haml (let's use Haml):

    - @watan.each do |link_text|
      = link_to "download #{link_text}", pdf_path(link_text)

Your routes should be as follows (config/routes.rb):

resources :pdfs, only: [:index, :show] 
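One thing to watch with these routes: the link text used as :id may contain slashes or dots (it is scraped from an href), and a default resource route will probably not match such an id. A minimal sketch of one way around this, assuming you are free to encode the id yourself; PdfId is a hypothetical helper name, and the example path is made up:

```ruby
require 'base64'

# Hypothetical helper: turns a scraped href into a URL-safe id and back.
module PdfId
  # Encode with the URL-safe Base64 alphabet and drop the '=' padding.
  def self.encode(href)
    Base64.urlsafe_encode64(href).delete('=')
  end

  # Restore the padding, then decode back to the original href.
  def self.decode(id)
    padded = id + '=' * ((4 - id.length % 4) % 4)
    Base64.urlsafe_decode64(padded).force_encoding('UTF-8')
  end
end

href = '/pdf/journal/example.pdf' # made-up path, for illustration only
id = PdfId.encode(href)
puts id
puts PdfId.decode(id)
```

In the view you would then pass pdf_path(PdfId.encode(link_text)), and in the show action decode params[:id] before looking the link up.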

This code is of course untested, but at least it is nicely structured, and it fetches the PDF in the right session (using Mechanize) and sends it to the browser.
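Since the original symptom was being redirected to the login page, it may be worth checking that what came back really is a PDF before sending it on. A rough sketch, relying only on the fact that PDF files normally start with the bytes "%PDF-"; pdf_body? is a hypothetical helper, not part of Mechanize or Rails:

```ruby
# Hypothetical helper: true if the body starts with the PDF header bytes.
PDF_MAGIC = '%PDF-'.freeze

def pdf_body?(body)
  body.to_s.byteslice(0, PDF_MAGIC.bytesize) == PDF_MAGIC
end

login_page = '<html><body>Please log in</body></html>'
fake_pdf   = "%PDF-1.4\n% rest of the file".force_encoding('BINARY')

puts pdf_body?(login_page)
puts pdf_body?(fake_pdf)
```

In the show action you could call pdf_body?(@pdf.body) and render an error instead of calling send_data when it returns false, which would mean the login did not stick.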

