python - Get xpath() to return empty values
I have a situation where I have a lot of <b> tags:

    <b>12</b> <b>13</b> <b>14</b> <b></b> <b>121</b>

As you can see, the second-to-last tag is empty. When I call:

    sel.xpath('b/text()').extract()

it gives me:

    ['12', '13', '14', '121']

I would like to have:

    ['12', '13', '14', '', '121']

Is there a way to get the empty value?

My current workaround is to call:

    sel.xpath('b').extract()

and then parse each HTML tag myself (the empty tags are there, which is what I want).
That is okay, but you don't need to strip the tags and get the text manually. You can use the remove_tags() function provided by w3lib:

    >>> from w3lib.html import remove_tags
    >>> map(remove_tags, sel.xpath('//b').extract())
    [u'12', u'13', u'14', u'', u'121']

Note that w3lib is a Scrapy dependency and is used internally, so there is no need to install it separately.
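The reason this works is that the elements are selected first and each one is reduced to its text separately, so an empty tag still contributes an entry. Here is a minimal sketch of that idea using only the standard library's xml.etree.ElementTree instead of Scrapy's selector or w3lib. The <root> wrapper and the variable names are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper element so the fragment parses as a single document.
markup = "<root><b>12</b> <b>13</b> <b>14</b> <b></b> <b>121</b></root>"
root = ET.fromstring(markup)

# Select the <b> elements themselves, then join each element's text pieces.
# An empty element yields no text pieces, so the join produces ''.
values = ["".join(b.itertext()) for b in root.findall("b")]

print(values)  # ['12', '13', '14', '', '121']
```

Selecting text nodes directly (as 'b/text()' does) skips the empty tag because it has no text node to return, while going element by element keeps its position in the list.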
Also, it would be better to use Scrapy input and output processors here. Continue using sel.xpath('b') and define an input processor. For example, you can define it for specific fields of an Item class:

    from scrapy.contrib.loader.processor import MapCompose
    from scrapy.item import Item, Field
    from w3lib.html import remove_tags

    class MyItem(Item):
        # strip the tags from every value loaded into this field
        my_field = Field(input_processor=MapCompose(remove_tags))