ISO-8859-1 response encoding
Question:
I want to parse this webpage, which contains symbols and Cyrillic letters. The GET request returns a response with the wrong encoding (ISO-8859-1), so these letters are not displayed correctly. What do you recommend to prevent this?
import requests
url = "http://www.cawater-info.net/karadarya/1991/veg1991.htm"
response = requests.get(url)
print(response.encoding)
print(response.text[:100])
I tried to encode this text, but it did not help:
print(response.text.encode('utf-8')[:100])
print(response.text.encode('cp852')[:100])
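Encoding response.text cannot fix the problem, because by then the bytes have already been decoded with the wrong codec, and .encode() only turns the garbled characters back into bytes. Because ISO-8859-1 maps every byte value to exactly one character, the damage is reversible. Below is a small offline sketch; the sample string is an assumed stand-in for the page content, not text fetched from the site:

```python
# Assumed sample: simulate the raw bytes a cp1251-encoded page would send.
raw = "Оперативные данные".encode("cp1251")

# What happens when the bytes are decoded as ISO-8859-1:
# every byte becomes one Latin-1 character (mojibake).
mojibake = raw.decode("iso-8859-1")

# Encoding the garbled text to UTF-8 only yields the bytes of the garbled characters.
print(mojibake.encode("utf-8")[:20])

# ISO-8859-1 is a one-to-one byte mapping, so encoding back to it restores
# the original bytes, which can then be decoded with the correct codec.
fixed = mojibake.encode("iso-8859-1").decode("cp1251")
print(fixed)  # Оперативные данные
```

In practice it is simpler to skip the round trip and decode response.content directly.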
Answers:
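Since the page contains Cyrillic text, decode the raw bytes (response.content) with cp1251 instead of relying on response.text. The server's Content-Type header apparently does not declare a charset, and in that case requests falls back to ISO-8859-1 for text/* responses, which garbles the Cyrillic bytes: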
print(response.content.decode("cp1251")[:100]) # or windows-1251
#<HTML><HEAD><TITLE>Оперативные данные по водозаборам бассейна реки Карадарья на период вегетации 199
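If you prefer to keep using response.text, you can set response.encoding = "cp1251" before accessing it. For pages whose charset is not known in advance, here is a minimal sketch of a hypothetical helper, decode_html, that tries the HTTP header charset first, then a <meta> charset declaration in the HTML (assuming the page has one), and finally an assumed fallback of cp1251:

```python
import re
from typing import Optional

# Hypothetical helper, not part of requests: find a charset declared in a <meta> tag.
META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([\w-]+)', re.IGNORECASE)


def decode_html(raw: bytes, header_charset: Optional[str] = None,
                fallback: str = "cp1251") -> str:
    """Decode HTML bytes using, in order: the HTTP header charset,
    a <meta> charset declaration, then an assumed fallback codec."""
    if header_charset:
        return raw.decode(header_charset, errors="replace")
    match = META_CHARSET.search(raw[:2048])  # meta tags normally appear early
    if match:
        try:
            return raw.decode(match.group(1).decode("ascii"), errors="replace")
        except LookupError:
            pass  # unknown codec name in the meta tag; use the fallback
    return raw.decode(fallback, errors="replace")
```

With requests, you could call decode_html(response.content) when "charset" does not appear in response.headers.get("Content-Type", ""). requests also offers response.apparent_encoding, which guesses the codec from the bytes themselves, but a guess is less reliable than an explicit declaration.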