The utf-8 encoding does not work correctly

Question:

I am writing a program to present reports as HTML. It's nothing complicated: I read a CSV file, add some styling (for the text and the table) and some text, and render the DataFrame as a table.
This is what the code looks like:

import pandas as pd

df = pd.read_csv("Średnie wyniki oceny użytkowości rozpłodowej loch dla rasy puławskiej.csv", sep=";")
html = f"""
<HTML lang="pl">
<head>
<title>some title</title>
</head>
<style type="text/css">
some style
</style>
<body>
some text
{df.to_html(index=False)}
some text
</body>
</html>
"""
with open('report', 'w') as f:
    f.write(html)

The problem starts at pd.read_csv. When I try to read the CSV I get this error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf3 in position 9: invalid continuation byte
I tried changing the encoding in read_csv to ISO 8859-2, but then I get an error on the line f.write(html).
The error looks like this: UnicodeEncodeError: 'charmap' codec can't encode character '\x8c' in position 1694: character maps to <undefined>
Next I tried changing the encoding in open(). First I set it to 'utf-8'; this worked, but my text was garbled. Then I set it to 'ISO 8859-2', but in that case the text in the table was garbled.
Please help, I am out of ideas.

Asked By: mlodycezar

Answers:

First, make sure you are reading the CSV file with the correct encoding. You can try 'utf-8' and 'ISO-8859-2'. The encoding passed to read_csv has to match the one the file was saved in; otherwise you will get either a UnicodeDecodeError or garbled text.

import pandas as pd

# Try reading the CSV with different encodings, e.g. 'utf-8' or 'ISO-8859-2'.
csv_encoding = 'utf-8'
# sep=';' matches the semicolon-separated file from the question.
df = pd.read_csv('your_csv_file.csv', sep=';', encoding=csv_encoding)
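
Rather than editing csv_encoding by hand, one option is to test a few candidate encodings against the raw bytes of the file. The sketch below is only an illustration: the helper name first_working_encoding and the candidate list are assumptions, not part of pandas. A successful decode does not prove the encoding is the right one (single-byte encodings such as ISO-8859-2 accept nearly any byte sequence), so the result still needs a visual check.

```python
from typing import Iterable, Optional


# Hypothetical helper, not part of pandas: returns the first candidate
# encoding that decodes the raw bytes without raising an error.
def first_working_encoding(
    raw: bytes,
    candidates: Iterable[str] = ("utf-8", "cp1250", "iso8859_2"),
) -> Optional[str]:
    for name in candidates:
        try:
            raw.decode(name)  # strict decoding: raises on invalid bytes
        except UnicodeDecodeError:
            continue
        return name
    return None


# In-memory bytes standing in for the contents of the CSV file.
sample = "Średnia;Wartość".encode("cp1250")
detected = first_working_encoding(sample)

# With a real file one would read the bytes first, for example:
#   with open('your_csv_file.csv', 'rb') as fh:
#       raw = fh.read()
#   df = pd.read_csv('your_csv_file.csv', sep=';',
#                    encoding=first_working_encoding(raw))
```

The order of the candidates matters: UTF-8 goes first because it rejects most non-UTF-8 input, while the single-byte encodings go last because they rarely fail.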

Now use the same encoding as the CSV file when writing the HTML report:

HTML = f"""
<!DOCTYPE html>
<html lang="pl">
<head>
<meta charset="{csv_encoding}">
<title>some title</title>
<style type="text/css">
some style
</style>
</head>
<body>
some text
{df.to_html(index=False)}
some text
</body>
</html>
"""

with open('report.html', 'w', encoding=csv_encoding) as f:
    f.write(HTML)

This code reads the CSV file using the specified encoding (in this case, 'utf-8'), declares that encoding in the <meta charset> tag, and writes the HTML report with the same encoding. Using one encoding throughout should avoid the errors you were encountering.

If you still have issues, try changing the csv_encoding variable to 'ISO-8859-2' or another encoding that suits your CSV data.
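
The byte values in the question's error messages give a hint about which other encoding to try. Python's ISO-8859-2 codec maps byte 0x8c to the control character '\x8c', which is the character the UnicodeEncodeError complains about, whereas Windows-1250 (cp1250) maps the same byte to 'Ś'. That the file is really cp1250 is an assumption based on those two error messages, not something confirmed in the question. The short check below only shows how the same bytes decode under each codec.

```python
# Bytes mentioned in the question's two error messages.
raw = bytes([0x8C, 0xF3])

as_latin2 = raw.decode("iso8859_2")  # ISO-8859-2: 0x8c is a C1 control character
as_cp1250 = raw.decode("cp1250")     # Windows-1250: 0x8c is 'Ś', 0xf3 is 'ó'

print(repr(as_latin2))
print(repr(as_cp1250))

# The control character cannot be written with a Windows "charmap" codec
# such as cp1252 (an assumed default; the asker's locale is not stated),
# which reproduces the kind of error seen at f.write().
try:
    as_latin2.encode("cp1252")
    encodable = True
except UnicodeEncodeError:
    encodable = False
print(encodable)
```

If cp1250 does turn out to be the right choice, the corresponding change in the code above would be csv_encoding = 'cp1250'.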

Answered By: David Rojo