html - Python error trying to parse webpage

March 15, 2014

    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    html = urlopen("http://www.animeplus.tv/anime-show-list/")
    content = html.read()
    soup = BeautifulSoup(content)
    print(soup.prettify())

The script works fine on other webpages, but when I run the program against the targeted website, I get:

<meta .$_server["request_uri"]."'"="" content="0;url='" http-equiv="refresh"/>

I do not understand this HTML code.

I assume it's some sort of redirect, or a way to prevent web scraping.

Is there a way for Python to access the code after the redirect, or in some way get back the same source code the browser sees?

Thank you!

The trick here is that the page redirects and sets a cookie header. The cookie is important: without it you do not get the HTML you see in the browser.

Here's a solution using requests, opening the same page twice in the same session:

    import requests
    from bs4 import BeautifulSoup

    url = "http://www.animeplus.tv/anime-show-list/"
    session = requests.Session()  # the session keeps cookies between requests
    session.get(url)  # first request: the server sets the cookie
    response = session.get(url)  # open the page a second time, now sending the cookie
    soup = BeautifulSoup(response.content)
    print(soup.title.text)  # prints: "watch anime | anime online | free anime | english anime | watch anime online - animeplus.tv"

Alternatively, you can use mechanize, though it doesn't support Python 3 at the moment. Here's how it works:

    >>> import mechanize
    >>> browser = mechanize.Browser()
    >>> browser.open('http://www.animeplus.tv/anime-show-list/')
    >>> print browser.response().read()
    <!DOCTYPE html>
    <html>
    <head>
      <title>watch anime | anime online | free anime | english anime | watch anime online - animeplus.tv</title>
      ...
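If you want to see why the second request behaves differently, the cookie behaviour described above can be reproduced locally. This is only a minimal sketch using the standard library: the `CookieGate` handler, the `seen=1` cookie, and the page contents are all made up for illustration and are not what animeplus.tv actually sends. A throwaway local server returns a meta refresh page until the client sends the cookie back, and a cookie-aware opener plays the role of the requests session.

```python
import threading
import urllib.request
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical page bodies, standing in for what the real site might send.
REFRESH_PAGE = b'<meta http-equiv="refresh" content="0;url=/"/>'
REAL_PAGE = b"<html><head><title>real page</title></head></html>"


class CookieGate(BaseHTTPRequestHandler):
    """Made-up stand-in server: real HTML only when the cookie comes back."""

    def do_GET(self):
        has_cookie = "seen=1" in self.headers.get("Cookie", "")
        body = REAL_PAGE if has_cookie else REFRESH_PAGE
        self.send_response(200)
        if not has_cookie:
            # First visit: hand out the cookie along with the refresh page.
            self.send_header("Set-Cookie", "seen=1")
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch_twice():
    """Fetch the same URL twice with one cookie jar and return both bodies."""
    server = HTTPServer(("127.0.0.1", 0), CookieGate)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/" % server.server_port

    # The opener stores cookies from responses and sends them on later requests.
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar())
    )
    try:
        first = opener.open(url).read()   # no cookie yet: refresh page
        second = opener.open(url).read()  # cookie sent: real page
    finally:
        server.shutdown()
        server.server_close()
    return first, second


first, second = fetch_twice()
print(first)
print(second)
```

The first body is the meta refresh stub and the second is the real page, which is the same pattern as calling `session.get(url)` twice. A plain `urlopen` call has no cookie jar, so it would keep getting the stub.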
