Python: How to preserve leading zeros when reading a #0000000000-formatted Excel column into pandas

Question:

I have an Excel file with a column that uses the custom format #0000000000. If I read it into a pandas DataFrame with any of these three commands

pd.read_excel("Formatted_File.xlsx", dtype=str)

or

pd.read_excel("Formatted_File.xlsx", dtype="object")

or just

pd.read_excel("Formatted_File.xlsx")

I get data with the leading zeros cut off.

Without going into details, let's assume I cannot change the custom formatting of the input Excel file. How can I preserve the leading zeros when reading the file into a pandas DataFrame?

Asked By: gdol


Answers:

You can't read the displayed values as is with pandas (it's probably possible with openpyxl), because each value is stored as a plain number (1, 2, 3, …) and the leading zeros come only from the column's custom display format (#0000000000).
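As a rough sketch of the openpyxl route, assuming the column holds whole numbers and its format consists only of # and 0 placeholders: openpyxl exposes both the stored value (cell.value) and the format string (cell.number_format), so the displayed text can be approximated by zero-padding each number to the count of 0 placeholders. The helper names (zero_placeholders, render_like_excel, read_formatted_column, build_demo_xlsx) are hypothetical, and the sketch does not handle other Excel format features such as sections, literals, decimals or thousands separators.

```python
import io

from openpyxl import Workbook, load_workbook


def zero_placeholders(number_format):
    """Count the '0' placeholders in a format made only of '#' and '0'.

    Returns None for any other format (e.g. 'General'); this sketch does not
    parse the full Excel number-format grammar.
    """
    if number_format and set(number_format) <= {"#", "0"} and "0" in number_format:
        return number_format.count("0")
    return None


def render_like_excel(value, number_format):
    """Approximate the text Excel shows for a whole number under a '#'/'0' format."""
    width = zero_placeholders(number_format)
    if isinstance(value, float) and value.is_integer():
        value = int(value)  # 1.0 -> 1 so it can be zero-padded
    if width is None or not isinstance(value, int):
        # Unsupported format or non-integer value: fall back to the plain value.
        return None if value is None else str(value)
    digits = str(abs(value)).zfill(width)
    return "-" + digits if value < 0 else digits


def read_formatted_column(source, column):
    """Hypothetical helper: return the displayed strings of one column.

    `source` is a path or file-like object for an .xlsx file; the header is
    assumed to be in the first row of the active sheet.
    """
    wb = load_workbook(source)
    ws = wb.active
    rows = ws.iter_rows()
    header = [cell.value for cell in next(rows)]
    idx = header.index(column)  # raises ValueError if the column is missing
    return [render_like_excel(row[idx].value, row[idx].number_format) for row in rows]


def build_demo_xlsx():
    """Build a small in-memory workbook whose UID column uses '#0000000000'."""
    wb = Workbook()
    ws = wb.active
    ws.append(["UID"])
    for uid in (1, 2, 10):
        ws.append([uid])
    for (cell,) in ws.iter_rows(min_row=2, max_col=1):
        cell.number_format = "#0000000000"
    buf = io.BytesIO()
    wb.save(buf)
    buf.seek(0)
    return buf


print(read_formatted_column(build_demo_xlsx(), "UID"))
# ['0000000001', '0000000002', '0000000010']
```

This bypasses pandas for the formatted column; the resulting list can be wrapped with pd.DataFrame({"UID": ...}) if a DataFrame is needed.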


import pandas as pd
df = pd.read_excel('data.xlsx', dtype={'UID': str})
print(df)

# Output
   UID
0    1
1    2
2    3
3    4
4    5
5    6
6    7
7    8
8    9
9   10

You can then recreate the displayed format with str.zfill, padding to 10 characters because #0000000000 has ten 0 placeholders:

df['UID'] = df['UID'].str.zfill(10)
print(df)

# Output
          UID
0  0000000001
1  0000000002
2  0000000003
3  0000000004
4  0000000005
5  0000000006
6  0000000007
7  0000000008
8  0000000009
9  0000000010
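The str.zfill approach relies on dtype=str. If the column is read without it and contains blank cells, pandas returns it as float64, and astype(str).str.zfill(10) would then produce strings like 00000001.0. A variant that pads numerically might look like the sketch below; pad_like_format is a hypothetical helper, it assumes non-negative whole numbers and a simple #/0 format, and errors="coerce" silently turns any non-numeric text into a missing value.

```python
import pandas as pd


def pad_like_format(series, number_format="#0000000000"):
    """Zero-pad whole numbers to as many digits as number_format has '0' placeholders.

    Assumes non-negative whole numbers; works whether the column was read as
    int64, float64 (blanks become NaN) or str. Non-numeric text becomes <NA>.
    """
    width = number_format.count("0")
    # Coerce to a nullable integer so 1.0 -> 1 and NaN -> <NA>.
    numbers = pd.to_numeric(series, errors="coerce").astype("Int64")
    # The nullable string dtype keeps <NA> as missing instead of the text 'nan'.
    return numbers.astype("string").str.zfill(width)


# Simulates a UID column read without dtype=str: the blank cell forces float64.
uids = pd.Series([1.0, 2.0, None, 10.0], name="UID")
print(pad_like_format(uids).tolist())
# ['0000000001', '0000000002', <NA>, '0000000010']
```

Keeping missing cells as <NA> rather than converting them to a padded string makes it easy to spot rows that had no UID in the source file.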
Answered By: Corralien