How to prevent data loss in pandas to_excel when handling very long strings of digits

Question:

This is my input file (CSV):

id1,id2
233924749247492472,9284372492472497294749
298347230474308444,9472943274947429427477

I want to read this file into a DataFrame, remove the delimiter, and then write it back to an .xlsx file.

Here are a few code combinations that I have already tried:

Attempt 1:

df2 = pd.read_csv(path, sep=delimiter, float_precision=None)
pd.options.display.float_format = '{:.1f}'.format
df2.to_excel(filepath, index=False)

Attempt 2:

df2 = pd.read_csv(path, sep=delimiter)
writer = pd.ExcelWriter(path, engine=None)
df2.to_excel(writer, index=False)

Attempt 3:

df2 = pd.read_csv(path, sep=delimiter)
df2.to_excel(path, index=False)

Every time, I get the same output in the Excel file.

I am seeing data loss in the first column. The output looks like this:

id1 id2
233924749247493000 9284372492472497294749
298347230474309000 9472943274947429427477
Asked By: Lal Ansari

Answers:

You can specify the data type of the columns as string (instead of letting pandas infer a numeric type) when reading in the CSV file.

The code below specifies the data type of both columns as string. The values are then kept as text, which should prevent any rounding or automatic conversion to scientific notation and preserve the full values.

import pandas as pd

# read the CSV file into a pandas dataframe, specifying data types
df = pd.read_csv('input_file.csv', dtype={'id1': str, 'id2': str})

# remove the delimiter (assuming the delimiter is a comma)
df = df.replace(',', '', regex=True)

# write the modified dataframe to an Excel file
df.to_excel('output_file.xlsx', index=False)
Answered By: Mohit

By default, pandas reads integer columns as int64. This is enough for integers between -2⁶³ and 2⁶³-1 = 9223372036854775807. If a column contains values too large for a 64-bit integer, as id2 does here, pandas sets the column dtype to object and keeps the values as they appear in the file.
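
The inference described above can be checked directly on the sample rows from the question. This is a minimal sketch: the CSV text is inlined through io.StringIO so that no file is needed, and the variable names (csv_text, inferred, as_text) are made up for illustration.

```python
import io

import pandas as pd

# The sample rows from the question, inlined so the snippet is self-contained.
csv_text = (
    "id1,id2\n"
    "233924749247492472,9284372492472497294749\n"
    "298347230474308444,9472943274947429427477\n"
)

# Default inference: the id1 values fit in int64, the id2 values do not.
inferred = pd.read_csv(io.StringIO(csv_text))
print(inferred.dtypes)

# Forcing text for every column keeps each digit exactly as written in the file.
as_text = pd.read_csv(io.StringIO(csv_text), dtype=str)
print(as_text.iloc[0].tolist())
```

The first print should report id1 as int64, which is the column that later loses digits in Excel, while the second shows both values of the first row as unmodified strings.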

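Excel keeps only about 15 significant digits for numeric cells, so it truncates big integers (even those smaller than 2⁶³-1), but it leaves text untouched. So a solution would be to set the dtype of all your columns to object, which makes pandas keep the values as strings: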

pd.read_csv('input.csv', dtype=object).to_excel('output.xlsx')
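
A likely reason why id1 loses digits even though it fits in int64 is that Excel stores numeric cells as double-precision floats. The sketch below uses only plain Python, whose float is the same IEEE 754 double, and the names (INT64_MAX, round_trip, and so on) are chosen for illustration. It does not reproduce Excel's exact display rounding, only the fact that the value cannot pass through a double unchanged.

```python
# Upper bound of a signed 64-bit integer, as quoted in the answer above.
INT64_MAX = 2**63 - 1

# First id1 value from the question.
id1 = 233924749247492472

# The value is small enough for pandas to store it as int64...
fits_in_int64 = id1 <= INT64_MAX

# ...but converting it to a double and back does not return the same integer.
round_trip = int(float(id1))
survives_as_float = round_trip == id1

print(fits_in_int64)      # True
print(survives_as_float)  # False
```

A text cell never goes through this conversion, which is why writing the column as strings keeps every digit.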
Answered By: Tranbi