How to prevent data loss in pandas to_excel when handling very long strings of numbers
Question:
This is my input file (csv)
id1,id2
233924749247492472,9284372492472497294749
298347230474308444,9472943274947429427477
I want to read this file into a dataframe, remove the delimiter, and then write it back to an .xlsx file.
A few code combinations that I have already tried:
Attempt 1:
df2 = pd.read_csv(path, sep=delimiter, float_precision=None)
pd.options.display.float_format = '{:.1f}'.format
df2.to_excel(filepath, index=False)
Attempt 2:
df2 = pd.read_csv(path, sep=delimiter)
writer = pd.ExcelWriter(path, engine=None)
df2.to_excel(writer, index=False)
Attempt 3:
df2 = pd.read_csv(path, sep=delimiter)
df2.to_excel(path, index=False)
Every time I get the same output in the Excel file.
I am seeing data loss in the first column. The output looks like this:
id1 | id2 |
---|---|
233924749247493000 | 9284372492472497294749 |
298347230474309000 | 9472943274947429427477 |
Answers:
You can specify the data type of the first column as a string (instead of letting pandas infer a numeric type) when reading the CSV file.
The code below specifies the data type of both columns as string when reading the CSV file. This prevents any automatic numeric conversion (and conversion to scientific notation) and preserves the full values:
import pandas as pd
# read the CSV file into a pandas dataframe, specifying data types
df = pd.read_csv('input_file.csv', dtype={'id1': str, 'id2': str})
# remove the delimiter (assuming the delimiter is a comma)
df = df.replace(',', '', regex=True)
# write the modified dataframe to an Excel file
df.to_excel('output_file.xlsx', index=False)
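For intuition on why the trailing digits vanish when the column is written as numbers: Excel stores numeric cells as 64-bit floats, which hold only about 15 significant decimal digits. The same precision limit can be seen in pure Python (a minimal sketch using one of the ids from the question):

```python
s = "233924749247492472"  # 18 digits, taken from the question's input

# A 64-bit float cannot hold all 18 significant digits,
# so converting through float changes the trailing digits.
as_float = float(s)
print(int(as_float) == int(s))  # False: the round trip is lossy
```

Keeping the values as strings sidesteps this entirely, since they are never converted to floats at any stage.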
By default, pandas casts integer columns to int64, which is enough for integers between -2⁶³ and 2⁶³ - 1 = 9223372036854775807. If any element in a column exceeds this range, pandas sets the column type to object instead.
Apparently, Excel truncates big ints (those that still fit below 2⁶³ - 1 and are therefore written as numbers) but not objects. So a solution would be to set the dtypes of all your columns to object:
pd.read_csv('input.csv', dtype=object).to_excel('output.xlsx')
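The dtype promotion described above is easy to observe. The sketch below uses an in-memory CSV with the question's values instead of a file on disk:

```python
import io

import pandas as pd

csv_text = "id1,id2\n233924749247492472,9284372492472497294749\n"

# Default parsing: id1 fits in int64; id2 exceeds 2**63 - 1,
# so pandas falls back to object dtype for that column.
df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes["id1"])  # int64
print(df.dtypes["id2"])  # object

# Forcing dtype=object keeps both columns as exact strings,
# which to_excel then writes as text rather than numbers.
df_obj = pd.read_csv(io.StringIO(csv_text), dtype=object)
print(df_obj.loc[0, "id1"])  # the exact string '233924749247492472'
```

This is why only the first column was damaged in the question's output: it was the only one small enough to be written to Excel as a number.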