python - Get xpath() to return empty values


I have a situation where I have a lot of <b> tags:

<b>12</b> <b>13</b> <b>14</b> <b></b> <b>121</b> 

As you can see, the second-to-last tag is empty. When I call:

sel.xpath('b/text()').extract() 

I get:

['12', '13', '14', '121'] 

I would like to get:

['12', '13', '14', '', '121'] 

Is there a way to get the empty value?


My current workaround is to call:

sel.xpath('b').extract() 

and parse each HTML tag myself (the empty tags are present in that output, which is what I want).

This is a case where it is okay to strip the tags manually and keep the text. You can use the remove_tags() function provided by w3lib:

>>> from w3lib.html import remove_tags
>>> map(remove_tags, sel.xpath('//b').extract())
[u'12', u'13', u'14', u'', u'121']

Note that w3lib is a Scrapy dependency and is used internally, so there is no need to install it separately.
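The mapping step can be tried without Scrapy installed. The following is a minimal sketch, not w3lib's implementation: `strip_tags` is a hypothetical regex-based stand-in for `remove_tags()`, and the `extracted` list is assumed to be what `sel.xpath('b').extract()` returns for the markup in the question. It also uses a list comprehension, because in Python 3 `map()` returns an iterator rather than a list.

```python
import re

# Hypothetical stand-in for w3lib.html.remove_tags: it removes anything that
# looks like an HTML tag. The real function covers more cases than this regex.
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(html):
    """Return the given HTML fragment with its tags removed."""
    return TAG_RE.sub("", html)


# Assumed result of sel.xpath('b').extract() for the markup in the question.
extracted = ["<b>12</b>", "<b>13</b>", "<b>14</b>", "<b></b>", "<b>121</b>"]

# One output value per element, so the empty <b></b> keeps its position.
values = [strip_tags(fragment) for fragment in extracted]
print(values)  # ['12', '13', '14', '', '121']
```

Because each element is processed on its own, the empty tag yields an empty string instead of disappearing, which is what 'b/text()' cannot do: an empty element has no text node to select.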

Also, it would be better to use Scrapy input and output processors here. Continue using sel.xpath('b') and define an input processor. For example, you can define it for specific fields of an Item class:

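# Import path from older Scrapy releases; newer ones expose MapCompose in itemloaders.processors.
from scrapy.contrib.loader.processor import MapCompose
from scrapy.item import Item, Field
from w3lib.html import remove_tags

class MyItem(Item):
    my_field = Field(input_processor=MapCompose(remove_tags))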
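To see why the empty value survives such an input processor, here is a rough sketch of the idea behind MapCompose, written with the standard library only. It is a simplified illustration, not Scrapy's code: `map_compose` and `strip_tags` are hypothetical helpers, and the sketch only models two behaviours, applying each function to every value in turn and dropping results that are None.

```python
import re


def strip_tags(html):
    """Hypothetical stand-in for w3lib.html.remove_tags."""
    return re.sub(r"<[^>]+>", "", html)


def map_compose(*functions):
    """Simplified stand-in for MapCompose: run each function over every value."""
    def process(values):
        for func in functions:
            next_values = []
            for value in values:
                result = func(value)
                # Only None is dropped; an empty string is a valid result.
                if result is not None:
                    next_values.append(result)
            values = next_values
        return values
    return process


# Assumed output of sel.xpath('b').extract() for the markup in the question.
extracted = ["<b>12</b>", "<b>13</b>", "<b>14</b>", "<b></b>", "<b>121</b>"]

processor = map_compose(strip_tags, str.strip)
cleaned = processor(extracted)
print(cleaned)  # ['12', '13', '14', '', '121']
```

The point of the design is that cleaning logic lives with the field definition, so the spider code only has to pass the raw 'b' elements along.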
