Pandas: cannot safely convert passed user dtype of int32 for float64

Question:

I am stumped by a problem loading my data into a Pandas DataFrame with read_table(). The call fails with two errors: TypeError: Cannot cast array from dtype('float64') to dtype('int32') according to the rule 'safe', followed by ValueError: cannot safely convert passed user dtype of int32 for float64 dtyped data in column 2

test.py:

import numpy as np
import os
import pandas as pd

# put test.csv in same folder as script
mydir = os.path.dirname(os.path.abspath(__file__))
csv_path = os.path.join(mydir, "test.csv")

df = pd.read_table(csv_path, sep=' ',
                   comment='#',
                   header=None,
                   skip_blank_lines=True,
                   names=["A", "B", "C", "D", "E", "F", "G"],
                   dtype={"A": np.int32,
                          "B": np.int32,
                          "C": np.float64,
                          "D": np.float64,
                          "E": np.float64,
                          "F": np.float64,
                          "G": np.int32})

test.csv:

2270433 3 21322.889 11924.667 5228.753 1.0 -1
2270432 3 21322.297 11924.667 5228.605 1.0 2270433

Asked By: crypdick

||

Answers:

The problem was that I was using a single space as the delimiter and the CSV had trailing spaces at the end of its lines. Removing the trailing spaces solved the issue.
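If editing the files is not an option, an alternative (not what I did originally, just a sketch) is to let pandas treat any run of whitespace as the delimiter with sep=r"\s+", which should also tolerate trailing spaces. The snippet below assumes the two rows from test.csv with a trailing space added to each line, and it uses read_csv with an explicit sep and an in-memory buffer in place of read_table and a file path:

```python
import io

import numpy as np
import pandas as pd

# The two rows from test.csv, each with a trailing space (assumed for illustration).
raw = (
    "2270433 3 21322.889 11924.667 5228.753 1.0 -1 \n"
    "2270432 3 21322.297 11924.667 5228.605 1.0 2270433 \n"
)

names = ["A", "B", "C", "D", "E", "F", "G"]
dtypes = {"A": np.int32, "B": np.int32,
          "C": np.float64, "D": np.float64,
          "E": np.float64, "F": np.float64,
          "G": np.int32}

# sep=r"\s+" splits on runs of whitespace, so the trailing space
# does not create an extra empty field at the end of each line.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", comment="#",
                 header=None, names=names, dtype=dtypes)

print(df.dtypes)
```

The trade-off is that sep=r"\s+" also merges consecutive spaces, so it is only appropriate when the file has no intentionally empty fields.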

To trim the trailing spaces and tabs from every line of every CSV file under a directory, I ran this command: find . -name "*.csv" | xargs sed -i 's/[ \t]*$//'
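For anyone without sed, roughly the same cleanup can be sketched in Python. The function names strip_trailing_whitespace and clean_csv_files are hypothetical helpers made up for this example, and the sample text is shortened for illustration:

```python
import re
from pathlib import Path


def strip_trailing_whitespace(text: str) -> str:
    # Remove spaces and tabs that sit directly before each line end.
    return re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)


def clean_csv_files(directory: str) -> None:
    # Rewrite every *.csv file under `directory` (recursively) in place.
    for path in Path(directory).rglob("*.csv"):
        path.write_text(strip_trailing_whitespace(path.read_text()))


# Shortened sample rows with a trailing space and a trailing tab plus space.
sample = "2270433 3 1.0 -1 \n2270432 3 1.0 2270433\t \n"
cleaned = strip_trailing_whitespace(sample)
print(repr(cleaned))
```

Like sed -i, clean_csv_files overwrites the files, so it is worth trying it on a copy of the data first.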

Answered By: crypdick

Column 2 contains values of a different type than the one requested, e.g. floats where the dtype says int.

I changed the dtype for that column from integer to float and that fixed it.
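A minimal sketch of that approach, assuming a made-up two-column sample where column "B" is meant to hold integers but is written as floats: read the column as float64 first, then cast to int32 only if every value is a whole number.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample: column "B" holds whole numbers written as floats.
raw = "1 3.0\n2 4.0\n"

# Read "B" as float64 so the parser never has to cast float data to int32.
df = pd.read_csv(io.StringIO(raw), sep=" ", header=None,
                 names=["A", "B"],
                 dtype={"A": np.int32, "B": np.float64})

# Cast afterwards, but only when no value has a fractional part.
# NaN fails this check too, so missing values block the cast.
if (df["B"] % 1 == 0).all():
    df["B"] = df["B"].astype(np.int32)

print(df.dtypes)
```

If the check fails, the column really does contain non-integer or missing data, and it is worth looking at the file before forcing a dtype.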

Answered By: jlwu