How to decode strange symbols from parser (bs4) into Cyrillic?

Question:

I tried importing ‘lxml’ and tried to work out which encoding this is, but without success. Online decoding tools can’t convert it back to Cyrillic. Only Windows-1250 and ISO-8859-1 can encode SOME of the symbols in the text.

import os 
import requests 
from bs4 import BeautifulSoup

gismeteo = 'https://www.gismeteo.ua/ua/weather-novomoskovsk-10961/weekly/'

headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:103.0) Gecko/20100101 Firefox/103.0' }
req1 = requests.get(gismeteo, headers=headers)

data1 = BeautifulSoup(req1.text, 'html.parser')

day_a1 = data1.find('div', class_='widget-row widget-row-days-date')
day_b1 = str([da1.text.replace('\n', '').strip() for da1 in day_a1])

print(day_b1)

Sometimes the output looks like this (good):

['Нд11 вер', 'Пн12', 'Вт13', 'Ср14', 'Чт15', 'Пт16', 'Сб17'] 

And sometimes it looks like this:

['Ð\x9dд11 веÑ\x80', 'Ð\x9fн12', 'Ð\x92Ñ\x8213', 'СÑ\x8014', 'ЧÑ\x8215', 'Ð\x9fÑ\x8216', 'Сб17']
Asked By: Dmytro Raiko


Answers:

I don’t really know why requests sometimes fails to pick the right encoding (I got the correct output on the first run and the wrong one afterwards…), but you can set it manually before accessing the text:

req1.encoding = 'utf8'
data1 = BeautifulSoup(req1.text, 'html.parser')

and this reliably gives you:

['Нд11 вер', 'Пн12', 'Вт13', 'Ср14', 'Чт15', 'Пт16', 'Сб17']
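
The garbled output is consistent with UTF-8 bytes being decoded as ISO-8859-1. requests takes the encoding from the Content-Type header and, for text responses without an explicit charset, falls back to ISO-8859-1. Whether that is what happens on the failing runs here is an assumption, not something confirmed for this site. The sketch below reproduces the effect offline with a few of the day labels and shows that it is reversible. fix_mojibake is a hypothetical helper name, not part of requests or bs4.

```python
# Reproduce the problem without any network access.
good = ['Нд11 вер', 'Пн12', 'Сб17']

# What the server sends: the page text encoded as UTF-8 bytes.
raw = [s.encode('utf-8') for s in good]

# What .text would contain if the bytes were decoded as ISO-8859-1
# (assumed to be what happens on the failing runs).
garbled = [b.decode('iso-8859-1') for b in raw]


def fix_mojibake(text):
    """Undo a UTF-8-read-as-ISO-8859-1 mix-up (hypothetical helper).

    Returns the text unchanged when the round trip does not apply,
    for example when the string is already correct Cyrillic.
    """
    try:
        return text.encode('iso-8859-1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


repaired = [fix_mojibake(s) for s in garbled]

print(garbled)
print(repaired)
```

Setting req1.encoding before reading req1.text remains the simpler fix. A helper like this is only useful for repairing strings that were already decoded with the wrong codec.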
Answered By: Thierry Lathuille