python - Can't iterate through XPathSelectorList
I have built the simple web scraper below:
    from scrapy.spider import BaseSpider
    from scrapy.selector import HtmlXPathSelector

    class ittester(BaseSpider):
        name = 'ittester'
        allowed_domains = ["sec.gov"]
        start_urls = ['http://www.sec.gov/archives/edgar/data/320193/000112760212034445/xslf345x03/form4.xml']

        def parse(self, response):
            hxs = HtmlXPathSelector(response)
            sites = hxs.select("/html/body/table[3]/tbody/tr")
            print len(sites)
            for site in sites:
                hhh = site.select("/td[1]/span[1]/text()").extract()
                print hhh
It goes to this site, and I want to scrape and print each instance of "common stock" (that is, "common stock" two times).
I have identified the rows of the table with hxs.select("/html/body/table[3]/tbody/tr"), and when I print the length of the returned XPathSelectorList it prints 2, but the print in the for loop returns 2 blank brackets []. I used Firebug to get the XPath, and I have checked that tbody is in the source code.
Any ideas what I am doing wrong?
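Use a relative path in the inner selection, site.select("td[1]/span[1]/text()").extract(), instead of the one you have. A leading "/" makes the XPath absolute, so "/td[1]/..." is evaluated from the document root rather than from the current row, and it matches nothing.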
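To see the relative-path idea from the answer above in isolation, here is a minimal sketch that uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy's selector. The SAMPLE markup and the first_cell_texts helper are hypothetical, made up for illustration. They are not the real SEC page, and ElementTree only supports a subset of XPath, so treat this as a demonstration of the principle rather than a drop-in replacement for the spider.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the table on the scraped page.
SAMPLE = """
<html>
  <body>
    <table>
      <tbody>
        <tr><td><span>Common Stock</span></td><td><span>row 1</span></td></tr>
        <tr><td><span>Common Stock</span></td><td><span>row 2</span></td></tr>
      </tbody>
    </table>
  </body>
</html>
"""


def first_cell_texts(markup):
    """Return the text of the first span in the first cell of every row."""
    root = ET.fromstring(markup)
    # Select the rows first; this path is relative to the root <html> element.
    rows = root.findall("body/table/tbody/tr")
    texts = []
    for row in rows:
        # No leading "/": the path is evaluated relative to this row,
        # which is the same idea as site.select("td[1]/span[1]/text()").
        span = row.find("td[1]/span[1]")
        if span is not None:
            texts.append(span.text)
    return texts


if __name__ == "__main__":
    print(first_cell_texts(SAMPLE))
```

In the spider itself the same change is just dropping the leading slash in the inner site.select(...) call; the outer selection of the rows can stay as it is.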