python - Get xpath() to return empty values
I have a situation where I have a lot of <b> tags:
<b>12</b> <b>13</b> <b>14</b> <b></b> <b>121</b>
As you can see, the second-to-last tag is empty. When I call:
sel.xpath('b/text()').extract()
it gives me:
['12', '13', '14', '121']
but I would like to have:
['12', '13', '14', '', '121']
Is there a way to get the empty value?
My current workaround is to call:
sel.xpath('b').extract()
and parse each HTML tag myself (the empty tags are included here, which is what I want).
This is a case where it is okay to strip the tags manually and get the text. You can use the remove_tags() function provided by w3lib:
>>> from w3lib.html import remove_tags
>>> map(remove_tags, sel.xpath('//b').extract())
[u'12', u'13', u'14', u'', u'121']
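The reason b/text() skips the empty tag is that an empty element has no text node to select, while selecting the elements themselves keeps one entry per tag. A minimal sketch of that idea using only the standard library follows. It assumes the tags are wrapped in a hypothetical <div> parent so the snippet parses as one document, and it uses xml.etree.ElementTree as a stand-in for the Scrapy selector rather than the method from the answer:

```python
import xml.etree.ElementTree as ET

# Assumption: the <b> tags from the question, wrapped in a made-up <div>
# so that the fragment has a single root element.
markup = "<div><b>12</b> <b>13</b> <b>14</b> <b></b> <b>121</b></div>"
root = ET.fromstring(markup)

# Iterate over the elements, not their text nodes. An empty element has
# text None, which is mapped to an empty string here.
values = [b.text or "" for b in root.iter("b")]
print(values)
```

Each <b> element contributes exactly one value, so the empty tag shows up as an empty string instead of disappearing.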
Note that w3lib is a Scrapy dependency and is used internally, so there is no need to install it separately.
Also, it would be better to use Scrapy Input and Output Processors here. Continue using sel.xpath('b') and define an input processor. For example, you can define it for a specific Field of your Item class:
from scrapy.contrib.loader.processor import MapCompose
from scrapy.item import Item, Field
from w3lib.html import remove_tags

class MyItem(Item):
    my_field = Field(input_processor=MapCompose(remove_tags))
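To make the effect of such an input processor concrete, here is a rough standard-library sketch of applying a tag-stripping function to every extracted value. The names strip_tags and apply_to_each are hypothetical stand-ins for remove_tags and MapCompose. They only approximate the behaviour and are not the Scrapy or w3lib implementations:

```python
import re

# Hypothetical stand-in for w3lib.html.remove_tags: removes anything
# that looks like an HTML tag and keeps the text in between.
_TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(markup):
    return _TAG_RE.sub("", markup)


def apply_to_each(func, values):
    # Simplified stand-in for an input processor built from one function:
    # call func on every extracted value and collect the results.
    return [func(value) for value in values]


# What sel.xpath('b').extract() would return for the tags in the question.
extracted = ["<b>12</b>", "<b>13</b>", "<b>14</b>", "<b></b>", "<b>121</b>"]
cleaned = apply_to_each(strip_tags, extracted)
print(cleaned)
```

Because the function runs once per extracted element, the empty tag still yields an empty string in the field.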