Rails, scraping from a dynamic URL


Basically, I want to scrape a website and render parts of its code, such as its h1s or something similar. I have used Nokogiri and Mechanize in the past, so I'm familiar with the basics of scraping. In the past I structured it as a Thor task, like this:

    class Scrape < Thor
      desc "cl_redding", "scrape craigslist rentals"
      def cl_redding
        require File.expand_path('config/environment.rb')
        require 'rubygems'
        require 'nokogiri'
        require 'open-uri'
        require 'mechanize'
        require 'yaml'
        require 'aws-sdk'
        require 'csv'
        require 'json'

        agent = Mechanize.new
        page = agent.get('http://redding.craigslist.org/search/apa?zoomtoposting=&catabb=apa&query=&minask=&maxask=&bedrooms=&housing_type=&haspic=1&excats=')
        # (rest of the task omitted)
      end
    end

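All of that works fine, but it only scrapes Craigslist, because the URL is hard-coded in the page = agent.get(...) line. What I'm asking is: does anyone have advice on how to scrape a site whose URL is entered into an input box on my website? Any specific help, tutorials, advice or resources are welcome.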
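One way to make the URL dynamic is to pass it in as a parameter instead of hard-coding it. The following is a minimal plain-Ruby sketch under stated assumptions, not the original task: HeadingScraper and its fetcher: keyword are hypothetical names, fetching uses the standard library's Net::HTTP.get, and h1 text is pulled out with a crude regex standing in for Nokogiri (with Nokogiri you would write roughly Nokogiri::HTML(html).css('h1').map(&:text)). The fetcher is injectable so the parsing can be tried without network access.

```ruby
require 'net/http'
require 'uri'

# Hypothetical scraper that takes the URL as an argument (for example from a
# form field or a Thor task argument) instead of hard-coding it.
class HeadingScraper
  # fetcher: any callable that takes a URI and returns the page body as a
  # String. The default does a plain GET with Net::HTTP from the stdlib.
  def initialize(fetcher: ->(uri) { Net::HTTP.get(uri) })
    @fetcher = fetcher
  end

  # Fetch +url+ and return the text of each <h1> on the page.
  def scrape(url)
    html = @fetcher.call(URI.parse(url))
    extract_h1s(html)
  end

  private

  # Crude regex extraction standing in for
  # Nokogiri::HTML(html).css('h1').map(&:text); acceptable for a sketch,
  # but not robust against arbitrary real-world HTML.
  def extract_h1s(html)
    html.scan(%r{<h1\b[^>]*>(.*?)</h1>}mi).flatten.map do |inner|
      inner.gsub(/<[^>]+>/, '').strip
    end
  end
end
```

Note that Net::HTTP.get does not follow redirects, whereas Mechanize (as used in the Thor task) does by default, so for real sites keeping Mechanize as the fetcher is likely the better choice.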

I think the question is a bit generic, but the general approach would be:

  • You need to start a Rails app.
  • Build a form that accepts the URL to scrape as input; possibly implement a Page model to store the pages you scrape.
  • Fetch and parse the URL the same way as in your example.
  • Possibly use a background processing tool such as Sidekiq to avoid scraping in the request/response cycle.
  • Store the results and display them on page#show.
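The steps above can be sketched in plain Ruby to show how the pieces fit together. This is an illustration under loud assumptions, not a Rails implementation: Page is a Struct standing in for an ActiveRecord model, a Hash stands in for the database, submit_url plays the role of a controller's create action, and ScrapeQueue uses a Thread plus Ruby's core Queue class as a stand-in for Sidekiq. All of these names are hypothetical.

```ruby
require 'uri'

# Hypothetical stand-in for a Page model: the submitted URL plus the scraped
# results and a status that page#show could check.
Page = Struct.new(:id, :url, :results, :status)

# Accept only absolute http(s) URLs typed into the form field.
def valid_scrape_url?(input)
  uri = URI.parse(input.to_s.strip)
  uri.is_a?(URI::HTTP) && !uri.host.to_s.empty? # URI::HTTPS subclasses URI::HTTP
rescue URI::InvalidURIError
  false
end

# In-process job queue standing in for Sidekiq: the request only enqueues a
# page id, and a worker thread does the slow scraping off the request path.
class ScrapeQueue
  def initialize(pages, scraper)
    @pages = pages      # Hash of id => Page, standing in for the database
    @scraper = scraper  # any callable: url -> scraped results
    @jobs = Queue.new
    @worker = Thread.new { work }
  end

  def enqueue(page_id)
    @jobs << page_id
  end

  # Let the worker finish any queued jobs, then stop it.
  def shutdown
    @jobs << :stop
    @worker.join
  end

  private

  def work
    while (id = @jobs.pop) != :stop
      page = @pages.fetch(id)
      begin
        page.results = @scraper.call(page.url)
        page.status = :done
      rescue StandardError => e
        # Record the failure instead of letting the worker thread die.
        page.results = e.message
        page.status = :failed
      end
    end
  end
end

# Roughly what a controller's create action would do: validate the input,
# store a pending Page, and hand the scraping off to the queue.
def submit_url(pages, queue, input)
  return nil unless valid_scrape_url?(input)

  page = Page.new(pages.size + 1, input.strip, nil, :pending)
  pages[page.id] = page
  queue.enqueue(page.id)
  page
end
```

In an actual Rails app, the URL validation would more likely live on the model, the worker would be a Sidekiq job enqueued with perform_async(page.id), and page#show would render page.results once the status is :done. Passing only the id to the job, rather than the whole page object, follows the usual Sidekiq advice to keep job arguments small and simple.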
