encoding

Convert Categorical features to Numerical

Convert Categorical features to Numerical Question: I have a lot of categorical columns and want to convert values in those columns to numerical values so that I will be able to apply ML model. Now by data looks something like below. Column 1- Good/bad/poor/not reported column 2- Red/amber/green column 3- 1/2/3 column 4- Yes/No Now …

Total answers: 2
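
As a rough illustration of the kind of conversion this question asks about, the sketch below one-hot encodes a single categorical column using only the standard library. The helper name `one_hot` and the sample values are hypothetical, and the excerpt is truncated, so this may not match what the asker actually needed; libraries such as pandas or scikit-learn offer their own encoders.

```python
def one_hot(values):
    """Return (rows, categories): one 0/1 indicator per category for each value."""
    # Sort the distinct values so the column order is deterministic.
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return rows, categories


# Hypothetical sample resembling "column 2" from the question.
colours = ["Red", "amber", "green", "Red"]
encoded, categories = one_hot(colours)
```

One-hot encoding avoids imposing an order on the categories. A column with a natural order (such as Good/bad/poor) could instead be mapped to integers with a hand-written dictionary.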

What is this hexadecimal in the utf16 format?

Question: print(bytes('ba', 'utf-16')) Result: b'\xff\xfeb\x00a\x00' I understand that utf-16 means every character takes 16 bits, i.e. 00000000 00000000 in binary, and I understand that there are 16 bits here: \x00a means \x00 = 00000000 and a = 01000001, so both give \x00a. It is clear to my …

Total answers: 3
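
A minimal sketch of how the bytes in this result break down, assuming little-endian byte order (the plain `'utf-16'` codec uses the platform's native order and prepends a byte order mark). The variable names are illustrative only. Note that `'b'` is U+0062 and `'a'` is U+0061.

```python
import codecs

text = "ba"

# Encode with an explicit byte order so the result does not depend on the platform.
payload = text.encode("utf-16-le")

# The plain 'utf-16' codec prepends a byte order mark (BOM); add it by hand here.
with_bom = codecs.BOM_UTF16_LE + payload

# Split the payload into 2-byte code units and read each as a little-endian integer.
code_units = [int.from_bytes(payload[i:i + 2], "little")
              for i in range(0, len(payload), 2)]
```

Here `\xff\xfe` is the BOM, and each character follows as two bytes with the low byte first: `b\x00` for `'b'` and `a\x00` for `'a'`.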

how to decode file when writing in yaml format

Question: I am trying to write a dictionary that contains Tibetan-language words to a file in YAML format. The problem is that I couldn't encode/decode the file when writing the YAML file. Here is the code: with open('tibetan_dict.yml', 'w', encoding='utf-8') as outfile: yaml.dump(tibetan_dict, outfile, default_flow_style=False) tibetan_dict contains: {'ཀ་ཅ': '༡.་ནོར་རྫས་ཀྱི་སྤྱི་མིང་སྟེ། …

Total answers: 1

Normalize string from webpage

Question: Trying to normalize the string "PartII\xa0I \x96 FINANCIAL\n INFORMATION". In general, all that should be left (once non-UTF-8 characters are excluded) are letters, numbers and dots. Therefore the expected output is "PartII FINANCIAL INFORMATION". The text comes from this SEC form. Solutions tried, where text is the string: text.encode('utf-8', …

Total answers: 1

Weird encoding in vtt file–python

Question: I am trying to obtain text from a subtitles file (vtt format) as follows: import requests r = requests.get('https://nogeovod-fy.atresmedia.com/vsg/sitemap/assets4/2022/09/26/C302281D-5C76-4710-A4FB-9AD7252B7F47/es.vtt') print(r.encoding) r.encoding = r.apparent_encoding print(r.text) Some characters seem to be missing, as the original encoding, ISO-8859-1, is not the right one. However, when I try to change it to utf-8, …

Total answers: 1

Unicode error when reading a csv file with pandas

Question: Why is pandas not able to read this csv file, returning 'UnicodeEncodeError' instead? I tried a lot of solutions from Stack Overflow (local download, different encoding, changing the engine…), but it is still not working. How can I fix it? import pandas as pd url = 'http://é.com' pd.read_csv(url, encoding='utf-8') Asked By: SciPy …

Total answers: 1

Python opening files with utf-8 file names

Question: In my code I used something like file = open(path + '/' + filename, 'wb') to write the file, but in my attempt to support non-ASCII filenames, I encode it as such: naming = path + '/' + filename file = open(naming.encode('utf-8', 'surrogateescape'), 'wb') write binary data… so the file is named something like …

Total answers: 2

How to read sys.stdin containing binary data in python (ignore errors)?

Question: How do I read sys.stdin while ignoring decoding errors? I know that sys.stdin.buffer exists, and I can read the binary data and then decode it with .decode('utf8', errors='ignore'), but I want to read sys.stdin line by line. Maybe I can somehow reopen the …

Total answers: 2
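
One possible approach to this question, sketched under the assumption that the input is mostly UTF-8: wrap a binary stream in `io.TextIOWrapper` with `errors="ignore"` and iterate over it line by line. The helper name `iter_lines_ignoring_errors` is hypothetical, and an in-memory `io.BytesIO` stands in for `sys.stdin.buffer` so the example runs without piped input.

```python
import io


def iter_lines_ignoring_errors(binary_stream, encoding="utf-8"):
    """Yield decoded lines from a binary stream, silently dropping undecodable bytes."""
    wrapper = io.TextIOWrapper(binary_stream, encoding=encoding, errors="ignore")
    for line in wrapper:
        yield line


# Stand-in for sys.stdin.buffer: the second line contains an invalid UTF-8 byte (0xff).
sample = io.BytesIO(b"ok line\nbad \xff byte\n")
lines = list(iter_lines_ignoring_errors(sample))
```

In a real script the same function could be called with `sys.stdin.buffer`. On Python 3.7 and later, `sys.stdin.reconfigure(errors="ignore")` is another option that keeps the existing `sys.stdin` object.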

Writing dbf file with custom encoding (DBF Package)

Question: I have some characters in Farsi and I want to write them to a dbf file with my custom code page, which uses 1 byte per character. I think this problem can be solved in one of these two ways: 1- Passing my custom codepage to the …

Total answers: 2