ruby - Log in automatically to get a scraped file in a Rails app with Mechanize
I want to log in to a site and download a PDF file. I have code that works fine in plain Ruby when I debug it. The problem is that when I use the same code in a Rails app with instance variables, I can't download the file. I guess it's a cookie issue, but I haven't managed to resolve it.
Here is the code that works in plain Ruby (the login succeeds and I can download the PDF file):
    require 'rubygems'
    require 'mechanize'

    agent = Mechanize.new
    agent.pluggable_parser.pdf = Mechanize::FileSaver

    page = agent.get("http://elwatan.com/sso/inscription/inscription_payant.php")

    # log in to the site
    form = page.form_with(:id => 'form-login-page')
    form.login = "my_login"
    form.password = "my_password"
    page = form.submit

    # get the PDF link
    agent.get("http://elwatan.com/").parser.xpath('//div[2]/div/p/a/@href|/img').each do |link|
      agent.get link['href']
    end
And below is my attempt in Ruby on Rails 3, which didn't work (I can scrape the link, but the file is not downloaded because I get redirected to the login page):
controller.rb
    @agent = Mechanize.new
    @agent.user_agent_alias = 'Mac Safari'
    @page = @agent.get("http://elwatan.com/sso/inscription/inscription_payant.php")

    # login
    form = @page.form_with(:id => 'form-login-page')
    form.login = "my_login"
    form.password = "my_password"
    @page = form.submit

    # PDF link
    @watan = {}
    @agent.get("http://elwatan.com/").parser.xpath('//div[2]/div/p/a/@href|/img').each do |link|
      @watan[link.text.strip] = @agent.get link['href']
    end
view.rb
    <% if @watan %>
      <% @watan.each do |key, value| %>
        <a href="http://www.elwatan.com<%= "#{key}" %>" target='_blank'>download file</a>
      <% end %>
    <% end %>
This will be a long post.
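First off, you should place the scraping code in a library. Create a file lib/watan_scraper.rb and fill it with the following (Rails 3 does not autoload lib/ by default, so add it to config.autoload_paths or require the file explicitly):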
    module WatanScraper
      def self.get_all_pdfs
        agent = get_agent
        # collect the PDF links
        watan = []
        agent.get("http://elwatan.com/").parser.xpath('//div[2]/div/p/a/@href|/img').each do |link|
          watan << link.text.strip
        end
        watan
      end

      def self.get_single_pdf(link_text)
        agent = get_agent
        # find the PDF link
        found_link = nil
        agent.get("http://elwatan.com/").parser.xpath('//div[2]/div/p/a/@href|/img').each do |link|
          # compare with ==, not = (a single = would assign and always match)
          if link.text.strip == link_text
            # the XPath yields @href attribute nodes, whose text is the URL
            found_link = link.text.strip
          end
        end
        # fetch the PDF; returns nil when no link matched
        agent.get(found_link) if found_link
      end

      # Log in and return an agent that holds the session cookies.
      def self.get_agent
        agent = Mechanize.new
        agent.user_agent_alias = 'Mac Safari'
        page = agent.get("http://elwatan.com/sso/inscription/inscription_payant.php")
        # login
        form = page.form_with(:id => 'form-login-page')
        form.login = "my_login"
        form.password = "my_password"
        form.submit
        agent
      end
      # a bare `private` does not affect `def self.` methods, so hide it explicitly
      private_class_method :get_agent
    end
OK, now you can write in your controller:
    class PdfsController < ApplicationController
      def index
        @watan = WatanScraper.get_all_pdfs
      end

      def show
        pdf_name = params[:id]
        @pdf = WatanScraper.get_single_pdf(pdf_name)
        # send the raw bytes of the fetched file, not the Mechanize object
        send_data @pdf.body, :filename => "#{pdf_name}.pdf"
      end
    end
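Since the original symptom was a redirect to the login page instead of the file, it may be worth checking that the fetched body really is a PDF before handing it to send_data. This is a minimal sketch, assuming the body is available as a String. `pdf_body?` is a hypothetical helper name, not part of Mechanize or Rails; it relies only on the fact that PDF files begin with the bytes `%PDF-`.

```ruby
# Hypothetical helper: true when the given body looks like a PDF document.
# PDF files start with the header "%PDF-"; an HTML login page does not.
def pdf_body?(body)
  return false if body.nil?
  # compare on raw bytes so the encoding of the body does not matter
  body.to_s.b.start_with?('%PDF-')
end

puts pdf_body?("%PDF-1.4\n...")            # a body with a PDF header
puts pdf_body?("<html>login form</html>")  # an HTML page
```

In the show action, one could render an error or a 404 when this returns false, rather than sending HTML to the browser under a .pdf filename.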
Your view should be in the file views/pdfs/index.html.haml (let's use Haml):
    - @watan.each do |link_text|
      = link_to "download #{link_text}", pdf_path(link_text)
Your routes should look as follows (config/routes.rb):
    resources :pdfs, only: [:index, :show]
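One thing to watch with this route: the link text collected by the scraper appears to be an href path, so it may contain slashes and dots. Those do not fit cleanly into a single `:id` path segment (by default Rails treats a dot as the start of the format). A possible workaround, sketched here under that assumption, is to pass a URL-safe token instead of the raw text. `encode_id` and `decode_id` are hypothetical helper names, and the sample path is made up.

```ruby
# Hypothetical helpers: turn arbitrary link text into a URL-safe token and back.
# Hex encoding uses only 0-9 and a-f, so the token has no slashes or dots.
def encode_id(link_text)
  link_text.unpack('H*').first
end

def decode_id(token)
  # pack returns a binary string, so restore the UTF-8 encoding
  [token].pack('H*').force_encoding(Encoding::UTF_8)
end

sample = '/pdf/edition.pdf'  # made-up example path
token  = encode_id(sample)
puts token
puts decode_id(token) == sample
```

The view would then call pdf_path(encode_id(link_text)), and the show action would decode params[:id] before looking up the link.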
This code is of course untested, but it is at least nicely structured: it fetches the PDF in the right session (using Mechanize) and sends it to the browser.
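Why the direct link in the original view could not work can be illustrated with a toy model that needs no network. The classes below are purely illustrative stand-ins (`FakeSite` and `Client` are made-up names, not Mechanize APIs). The site hands out a session cookie on login and serves the PDF only to a client that presents it. The server-side scraper logs in and so holds the cookie; the visitor's browser, following a plain link, does not.

```ruby
# Toy model of a cookie-protected site. All names here are hypothetical.
class FakeSite
  SESSION = 'session=abc123'

  # Returns [status, body, set_cookie] for a request.
  def request(path, cookie)
    case path
    when '/login'
      [200, 'welcome', SESSION]
    when '/file.pdf'
      if cookie == SESSION
        [200, '%PDF-1.4 fake content', nil]
      else
        [302, 'redirect to /login', nil]
      end
    else
      [404, 'not found', nil]
    end
  end
end

# A client with its own cookie jar, like a Mechanize agent or a browser.
class Client
  def initialize(site)
    @site = site
    @cookie = nil
  end

  def get(path)
    status, body, set_cookie = @site.request(path, @cookie)
    @cookie = set_cookie if set_cookie  # remember the session cookie
    [status, body]
  end
end

site = FakeSite.new

scraper = Client.new(site)   # plays the role of the Mechanize agent
scraper.get('/login')
p scraper.get('/file.pdf')   # the scraper is logged in

browser = Client.new(site)   # plays the role of the visitor's browser
p browser.get('/file.pdf')   # never logged in, so it is redirected
```

This is why the approach above fetches the file with the same agent that logged in and passes the bytes on with send_data, rather than linking the visitor straight to elwatan.com.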