html - Python error trying to parse webpage -


    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    html = urlopen("http://www.animeplus.tv/anime-show-list/")
    content = html.read()
    soup = BeautifulSoup(content)
    print(soup.prettify())

The script works fine on other webpages, but when I run the program on the targeted website I get:

<meta .$_server["request_uri"]."'"="" content="0;url='" http-equiv="refresh"/> 

I do not understand this HTML code.

I assume it's some sort of redirect or a way to prevent web scraping.

Is there a way for Python to access the code after the redirect, or in some way get the source code that the browser returns?

Thank you!

The trick here is that the page redirects and sets a cookie header, which is important: without that cookie you do not get the HTML you see in the browser.
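To tell whether a response is still the redirect stub rather than the real page, one could look for a meta refresh tag in it. The following is only a minimal sketch using the standard library's `html.parser`; `has_meta_refresh` is a hypothetical helper name, not something from the original code.

```python
from html.parser import HTMLParser


class _RefreshFinder(HTMLParser):
    """Records whether any <meta http-equiv="refresh"> tag was seen."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        for name, value in attrs:
            # Attribute names arrive lowercased; values keep their case.
            if name == "http-equiv" and (value or "").lower() == "refresh":
                self.found = True


def has_meta_refresh(html_text):
    """Return True if html_text contains a meta refresh tag (hypothetical helper)."""
    finder = _RefreshFinder()
    finder.feed(html_text)
    return finder.found
```

If this returns True for the decoded response body, the cookie was probably not sent and the page needs to be requested again in the same session.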

Here's a solution using requests, opening the same page twice in the same session:

    import requests
    from bs4 import BeautifulSoup

    url = "http://www.animeplus.tv/anime-show-list/"

    session = requests.Session()
    session.get(url)             # first request: the server sets the cookie
    response = session.get(url)  # open the page a second time, now sending the cookie

    soup = BeautifulSoup(response.content)
    print(soup.title.text)  # prints: "watch anime | anime online | free anime | english anime | watch anime online - animeplus.tv"

Alternatively, you can use mechanize, though it doesn't support Python 3 at the time of writing. Here's how it works:

    >>> import mechanize
    >>> browser = mechanize.Browser()
    >>> browser.open('http://www.animeplus.tv/anime-show-list/')
    >>> print browser.response().read()
    <!doctype html>
    <html>
    <head>
      <title>watch anime | anime online | free anime | english anime | watch anime online - animeplus.tv</title>
     ...
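The same keep-the-cookie idea can probably be done with the standard library alone, staying close to the `urlopen` call from the question. This is a sketch, not a tested fix for this particular site: `fetch_with_cookies` and its `visits` parameter are hypothetical names, and it assumes the site only needs the cookie from the first response to be sent back on the second request.

```python
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, build_opener


def fetch_with_cookies(url, visits=2):
    """Open url `visits` times through one cookie-aware opener; return the last body.

    Hypothetical helper: the CookieJar stores any Set-Cookie header from an
    earlier response and sends it back on later requests, much like a
    requests session does.
    """
    opener = build_opener(HTTPCookieProcessor(CookieJar()))
    body = b""
    for _ in range(visits):
        with opener.open(url) as response:
            body = response.read()
    return body
```

The returned bytes could then be handed to BeautifulSoup in place of `html.read()`, for example `BeautifulSoup(fetch_with_cookies("http://www.animeplus.tv/anime-show-list/"))`.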
