Python: How to preserve leading zeros when reading an Excel column formatted as #0000000000 into pandas
Question:
I have an Excel file with a column that uses the custom format #0000000000. If I read it into a pandas data frame with any of these three commands
pd.read_excel("Formatted_File.xlsx", dtype=str)
or
pd.read_excel("Formatted_File.xlsx", dtype="object")
or just
pd.read_excel("Formatted_File.xlsx")
I get data with the leading zeros cut off.
Without going into details, let’s assume I cannot change the custom formatting of the input Excel file. How can I preserve the leading zeros while reading the file into a pandas data frame?
Answers:
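You can’t read the displayed values as-is with pandas (it is probably possible with openpyxl) because each value is stored as a number (1, 2, 3, …) and the leading zeros come only from the column’s custom format (#0000000000). Passing dtype=str therefore converts the stored number to a string, not the text Excel displays.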
import pandas as pd
df = pd.read_excel('data.xlsx', dtype={'UID': str})
print(df)
# Output
UID
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
9 10
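To check what the file actually stores, the cells can be inspected with openpyxl. This is a minimal sketch, assuming openpyxl is installed. It builds a small stand-in workbook in memory (hypothetical data, not the asker's file), applies the same custom format, and then reads back both the stored value and the format string for each cell.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook

# Build a stand-in workbook in memory (hypothetical data for illustration).
wb = Workbook()
ws = wb.active
ws.append(["UID"])
for n in (1, 2, 10):
    ws.append([n])

# Apply the custom display format to every data cell in column A.
for cell in ws["A"][1:]:
    cell.number_format = "#0000000000"

buffer = BytesIO()
wb.save(buffer)
buffer.seek(0)

# Reload and look at what is stored: the value and its display format
# are kept separately, so the value itself carries no leading zeros.
ws_loaded = load_workbook(buffer).active
stored = [(c.value, c.number_format) for c in ws_loaded["A"][1:]]
print(stored)
```

If the values come back as plain numbers next to the format string, the zeros exist only in the display layer, which is why no dtype argument to read_excel can recover them.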
You can recreate the column format with str.zfill:
df['UID'] = df['UID'].str.zfill(10)
print(df)
# Output
UID
0 0000000001
1 0000000002
2 0000000003
3 0000000004
4 0000000005
5 0000000006
6 0000000007
7 0000000008
8 0000000009
9 0000000010
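If the column was read without dtype=str, or contains blank cells, the values arrive as integers or floats and .str.zfill cannot be applied directly. The helper below is a sketch of one way to handle that; pad_uid and its width parameter are hypothetical names, and it assumes every non-blank value is a whole number.

```python
import pandas as pd


def pad_uid(series: pd.Series, width: int = 10) -> pd.Series:
    """Left-pad whole-number values with zeros; keep missing values missing."""
    # Nullable Int64 avoids '1.0'-style strings when the column has blanks.
    as_int = pd.to_numeric(series, errors="coerce").astype("Int64")
    # Convert to strings, then pad on the left up to the requested width.
    return as_int.astype("string").str.zfill(width)


# Mixed input: an int, a float, a blank cell, and a value already read as text.
example = pd.Series([1, 2.0, None, "10"])
padded = pad_uid(example)
print(padded)
```

Values that already have more digits than the width are left unchanged, since zfill only pads and never truncates.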