python - Get xpath() to return empty values -

August 15, 2015

I have a situation where I have a lot of <b> tags:

<b>12</b> <b>13</b> <b>14</b> <b></b> <b>121</b>

As you can see, the second-to-last tag is empty. I call:

sel.xpath('b/text()').extract()

which gives me:

['12', '13', '14', '121']

I would like to have:

['12', '13', '14', '', '121']

Is there a way to get the empty value?

My current workaround is to call:

sel.xpath('b').extract()

and parse through each HTML tag myself (the empty tags are included here, which is what I want).

This is a case where it is okay to strip the tags manually and keep the text. You can use the remove_tags() function provided by w3lib:

>>> from w3lib.html import remove_tags
>>> map(remove_tags, sel.xpath('//b').extract())
[u'12', u'13', u'14', u'', u'121']

Note that w3lib is a Scrapy dependency and is used internally, so there is no need to install it separately.
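The reason 'b/text()' skips the empty tag is that an empty element has no text node to select, so the fix is to select the elements and read each one's text individually. Here is a minimal sketch of that idea using only the standard library instead of a Scrapy selector; the wrapping <div> root is an assumption added so the fragment parses as one document.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper element so the fragment has a single root.
html = "<div><b>12</b> <b>13</b> <b>14</b> <b></b> <b>121</b></div>"
root = ET.fromstring(html)

# Select the <b> elements themselves, then read each element's text.
# An empty element has text None, which is mapped to an empty string.
values = [(b.text or "") for b in root.findall("b")]
print(values)  # ['12', '13', '14', '', '121']
```

Because the loop runs once per element rather than once per text node, the empty tag keeps its position in the result.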

Also, it would be better to use Scrapy input and output processors here. Continue using sel.xpath('b') and define an input processor. For example, you can define it for specific fields of an Item class:

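# Newer Scrapy releases expose MapCompose from itemloaders.processors instead.
from scrapy.contrib.loader.processor import MapCompose
from scrapy.item import Item, Field
from w3lib.html import remove_tags


class MyItem(Item):
    # Strip the tags from every extracted value before it is stored.
    my_field = Field(input_processor=MapCompose(remove_tags))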
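To see roughly what such an input processor does to the extracted values without a Scrapy project at hand, here is a pure-Python sketch. Both strip_tags and map_each are hypothetical stand-ins written for illustration; they are not the w3lib or Scrapy implementations and only approximate their behaviour for this simple input.

```python
import re


def strip_tags(fragment):
    """Hypothetical stand-in for remove_tags: drop anything that looks like a tag."""
    return re.sub(r"<[^>]+>", "", fragment)


def map_each(func, values):
    """Simplified stand-in for a one-function processor: apply func to every value."""
    return [func(value) for value in values]


# What sel.xpath('b').extract() would return for the markup in the question.
extracted = ["<b>12</b>", "<b>13</b>", "<b>14</b>", "<b></b>", "<b>121</b>"]

cleaned = map_each(strip_tags, extracted)
print(cleaned)  # ['12', '13', '14', '', '121']
```

The empty tag survives as an empty string because the function is applied to each extracted element, not to its text nodes.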
