python - Can't iterate through XPathSelectorList

January 15, 2011

I have built the simple web scraper below:

    from scrapy.spider import BaseSpider
    from scrapy.selector import HtmlXPathSelector


    class ittester(BaseSpider):
        name = 'ittester'
        allowed_domains = ["sec.gov"]
        start_urls = ['http://www.sec.gov/archives/edgar/data/320193/000112760212034445/xslf345x03/form4.xml']

        def parse(self, response):
            hxs = HtmlXPathSelector(response)
            # Select the rows of the third table on the page
            sites = hxs.select("/html/body/table[3]/tbody/tr")
            print len(sites)
            for site in sites:
                # This inner selection is the one that comes back empty
                hhh = site.select("/td[1]/span[1]/text()").extract()
                print hhh

It goes to this site, and I want to scrape and print each instance of "common stock" (there are two occurrences of "common stock").

I have identified the rows of the table with hxs.select("/html/body/table[3]/tbody/tr"), and printing the length of the returned XPathSelectorList gives 2, but the print in the for loop outputs two empty lists, []. I used Firebug to get the XPath, and I have checked that tbody is present in the source code.

Any ideas what I am doing wrong?

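Use a relative path in the inner selection, site.select("td[1]/span[1]/text()").extract(), instead of the absolute path you have. An XPath that starts with / is evaluated from the document root rather than from the current row, so "/td[1]/span[1]/text()" matches nothing and each extract() returns an empty list.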
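The difference between the two paths can be tried without Scrapy. The following is a minimal sketch, assuming a small hypothetical HTML fragment in place of the real SEC page, and using the standard library's xml.etree.ElementTree (Python 3) as a stand-in for the Scrapy selector. ElementTree supports only a subset of XPath, and unlike Scrapy it rejects a leading / on an element outright instead of returning an empty result, but the point is the same: the inner path has to be relative to the row.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the table in the question, not the real page.
DOC = """
<html><body><table><tbody>
<tr><td><span>Common Stock</span></td><td>100</td></tr>
<tr><td><span>Common Stock</span></td><td>200</td></tr>
</tbody></table></body></html>
"""

root = ET.fromstring(DOC)          # root is the <html> element
rows = root.findall("./body/table/tbody/tr")


def first_cell_text(row):
    # Relative path: evaluated starting from the current <tr>.
    return [span.text for span in row.findall("td[1]/span[1]")]


results = [first_cell_text(row) for row in rows]
print(len(rows))
print(results)

# A leading "/" means "start from the document root", not from the row.
# ElementTree refuses such a path on an element with a SyntaxError.
try:
    rows[0].findall("/td[1]/span[1]")
    absolute_rejected = False
except SyntaxError:
    absolute_rejected = True
print(absolute_rejected)
```

In the spider itself, the equivalent change is only the inner path: dropping the leading slash, or writing it explicitly as "./td[1]/span[1]/text()".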
