html - Web scraping with Beautiful Soup 4 in Python

March 15, 2013

I started using Beautiful Soup 4 and came across a problem that I've been trying to solve for a few days without success. Let me first paste the HTML code I want to analyse:

<table class="table table-condensed table-hover tenlaces tablesorter">
  <thead>
    <tr>
      <th class="al">language</th>
      <th class="ac">link</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td class="tdidioma"><span class="flag flag_0">0</span></td>
      <td class="tdenlace"><a class="btn btn-mini enlace_link" data-servidor="42" rel="nofollow" target="_blank" title="ver..." href="link want save0"><i class="icon-play"></i>&nbsp;&nbsp;ver</a></td>
    </tr>
    <tr>
      <td class="tdidioma"><span class="flag flag_1">1</span></td>
      <td class="tdenlace"><a class="btn btn-mini enlace_link" data-servidor="42" rel="nofollow" target="_blank" title="ver..." href="link want save1"><i class="icon-play"></i>&nbsp;&nbsp;ver</a></td>
    </tr>
    <tr>
      <td class="tdidioma"><span class="flag flag_2">2</span></td>
      <td class="tdenlace"><a class="btn btn-mini enlace_link" data-servidor="42" rel="nofollow" target="_blank" title="ver..." href="link want save2"><i class="icon-play"></i>&nbsp;&nbsp;ver</a></td>
    </tr>
  </tbody>
</table>

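As you can see, each <tr> contains one <td> with the language and one <td> with the link. The problem is that I don't know how to relate the language to the link. I mean, I'd like to select a row only if, for example, its language cell contains 1, and then return that row's link; if it doesn't, do nothing. I'm able to return the <td> with the language, but not the enclosing <tr>, which I think is the important part. I don't know if I've made my point, because I don't know how to explain it.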

The code I have gets the <tbody> from the main URL, but I don't know how to do what I'm asking.

Thanks, and sorry for my bad English!

Edit: here is a sample of the code, so you can see the libraries I'm using and everything:

    from bs4 import BeautifulSoup
    import urllib2

    url = raw_input("Introduce the URL to analyse: ")
    page = urllib2.urlopen(url)
    soup = BeautifulSoup(page.read())
    body = soup.tbody
    # here I should do it, but I don't know how
    page.close()

Try this:

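    result = None
    # Walk the rows so each language cell stays paired with its link cell
    for row in soup.tbody.find_all('tr'):
        lang, link = row.find_all('td')
        # lang.string is the text of the single <span> inside the language cell
        if lang.string == '1':
            result = link.a['href']
    print result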
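
If several rows can share a language, or some rows might be missing a cell, a slightly more defensive variant of the same row-by-row idea may help. This is only a sketch under a few assumptions: it is written for Python 3, it parses with the built-in "html.parser", the helper name links_for_language and the shortened SAMPLE_HTML (with placeholder href values) are made up for illustration, and the cell classes tdidioma and tdenlace are taken from the table in the question.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the table in the question; the href values are placeholders.
SAMPLE_HTML = """
<table class="table table-condensed table-hover tenlaces tablesorter">
  <tbody>
    <tr>
      <td class="tdidioma"><span class="flag flag_0">0</span></td>
      <td class="tdenlace"><a class="enlace_link" href="link0">ver</a></td>
    </tr>
    <tr>
      <td class="tdidioma"><span class="flag flag_1">1</span></td>
      <td class="tdenlace"><a class="enlace_link" href="link1">ver</a></td>
    </tr>
  </tbody>
</table>
"""


def links_for_language(html, language):
    """Return the href of every row whose language cell text equals `language`."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for row in soup.select("tbody tr"):
        # Look both cells up inside the same <tr>, so they stay paired.
        lang_cell = row.find("td", class_="tdidioma")
        link_cell = row.find("td", class_="tdenlace")
        if lang_cell is None or link_cell is None:
            continue  # skip rows that do not have both cells
        anchor = link_cell.find("a", href=True)
        if anchor is not None and lang_cell.get_text(strip=True) == language:
            links.append(anchor["href"])
    return links


print(links_for_language(SAMPLE_HTML, "1"))
```

Returning a list rather than a single value keeps every match instead of only the last one, and an empty list plays the role of "don't do anything" when no row has the requested language.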
