Rails: Scraping from a dynamic URL
At a basic level, I want to scrape a website and render parts of its code, such as the h1s. I have used Nokogiri and Mechanize in the past and am familiar with the basics of scraping. In the past I would structure a Thor task like this:
    class Scrape < Thor
      desc "cl_redding", "scrape craigslist rentals"
      def cl_redding
        # Load the Rails environment so the task can use the app's models
        require File.expand_path('config/environment.rb')
        require 'rubygems'
        require 'nokogiri'
        require 'open-uri'
        require 'mechanize'
        require 'yaml'
        require 'aws-sdk'
        require 'csv'
        require 'json'

        # The URL to scrape is hard-coded here
        agent = Mechanize.new
        page = agent.get('http://redding.craigslist.org/search/apa?zoomtoposting=&catabb=apa&query=&minask=&maxask=&bedrooms=&housing_type=&haspic=1&excats=')
      end
    end
This is all cool and works, though it only scrapes Craigslist, because the URL is hard-coded in the page = call. What I am asking is: does anyone have advice on how to scrape a site whose URL is supplied through an input box on a website? Specific help, tutorials, advice, or resources are all welcome.
I think the question is a bit generic.
- You need to start a Rails app.
- Build a form to accept the URL to scrape, and possibly implement a Page model to store the pages you scrape.
- Parse the URL the same way as in your example.
- Possibly use a back-end processing tool such as Sidekiq to avoid scraping on the front end.
- Store the results and display them on page#show.
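The step that differs from the Thor task is taking the URL from user input instead of hard-coding it. Below is a minimal sketch of that step, not a definitive implementation. The method names parse_scrape_url and scrape_h1s are hypothetical and do not come from the question; the sketch assumes the Mechanize gem is installed and that only http and https URLs should be accepted.

```ruby
require 'uri'

# Hypothetical helper: checks a URL typed into a form before it is scraped.
# Returns a URI object for http/https input, or nil when the input is unusable.
def parse_scrape_url(input)
  uri = URI.parse(input.to_s.strip)
  # URI::HTTPS is a subclass of URI::HTTP, so this accepts both schemes
  return nil unless uri.is_a?(URI::HTTP) && uri.host && !uri.host.empty?
  uri
rescue URI::InvalidURIError
  nil
end

# Hypothetical helper: fetches the page and returns the text of each h1.
# Mechanize is required here, inside the method, as in the Thor task above.
def scrape_h1s(uri)
  require 'mechanize'
  page = Mechanize.new.get(uri.to_s)
  page.search('h1').map { |h1| h1.text.strip }
end
```

In a controller action, the form value (assumed here to arrive as params[:url]) would go through parse_scrape_url first. A nil result means the form should be shown again with an error. A valid URI could then be passed to scrape_h1s, ideally from a background job rather than inside the request.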