How to prevent pandas.read_csv from converting the index column to float when passing dtype=np.float32?
Question:
I have a CSV file that I read with pandas. It looks like this:
name, quart2c, p_rat, other_col
avg, 1, 2, 3
std, 1, 2, 3
I want pandas.read_csv() to guarantee that every cell is float32, except for the first column ('name'), which is the index column.
So I pass two arguments to it, like this:
pandas.read_csv(file_path, index_col=0, dtype=np.float32)
# or like this, both failed
pandas.read_csv(file_path, index_col='name', dtype=np.float32)
But pandas still tries to convert the first column to float and raises an exception:
ValueError: could not convert string to float: 'avg'
What I want:
- The CSV file is produced by another program that I wrote, so I can easily change its structure if needed.
- I want to always pass dtype=np.float32 so that invalid values are caught. I also don't want the values to be interpreted as integers.
- The "name" column should be kept as index_col, since it is used later. It must NOT be dropped.
How can I achieve this?
Answers:
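You can combine dtype with converters. When a converter is given for a column, pandas applies it instead of the dtype conversion, so 'name' stays a string while every other column is read as float32:
import pandas as pd
df = pd.read_csv('test.csv', dtype='float32', converters={'name': str}, index_col='name')
print(df)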
Output:
      quart2c  p_rat  other_col
name
avg       1.0    2.0        3.0
std       1.0    2.0        3.0
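If mixing dtype and converters feels fragile, another option is a per-column dtype mapping that leaves the index column out. This is only a sketch under a few assumptions and not part of the answer above: read_float32_table is a hypothetical helper, the sample CSV is inlined through io.StringIO with no spaces after the commas, and the header is parsed once with nrows=0 to discover the column names.

```python
import io

import numpy as np
import pandas as pd

# Sample data mirroring the question (assumed: no spaces after the commas).
CSV_TEXT = "name,quart2c,p_rat,other_col\navg,1,2,3\nstd,1,2,3\n"


def read_float32_table(csv_text, index_name="name"):
    """Hypothetical helper: float32 for every column except the index."""
    # First pass: parse only the header row to learn the column names.
    columns = pd.read_csv(io.StringIO(csv_text), nrows=0).columns
    # Request float32 for every column except the one used as the index.
    dtypes = {col: np.float32 for col in columns if col != index_name}
    # Second pass: the index column is absent from the mapping, so it stays text.
    return pd.read_csv(io.StringIO(csv_text), index_col=index_name, dtype=dtypes)


df = read_float32_table(CSV_TEXT)
print(df.dtypes)
print(df.index.tolist())
```

With a real file you would pass the path to both read_csv calls instead of wrapping text in io.StringIO. Since dtype is still enforced on the value columns, a non-numeric cell there should still raise an error at parse time, which is the check the question asks for.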
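The best approach is to first read the CSV with the default arguments, specifying only the index column, and then convert the entire DataFrame, which does not affect the index. Use np.float32 here, since astype(float) would give float64:
pd.read_csv(file_path, index_col='name').astype(np.float32)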
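To see what the read-then-convert approach does to the dtypes and the index, here is a small self-contained sketch. The sample data is inlined through io.StringIO (an assumption for illustration, with no spaces after the commas), and the variable names raw and converted are made up for this example.

```python
import io

import numpy as np
import pandas as pd

# Sample data mirroring the question (assumed: no spaces after the commas).
csv_text = "name,quart2c,p_rat,other_col\navg,1,2,3\nstd,1,2,3\n"

# Default inference: the value columns come back as integers here.
raw = pd.read_csv(io.StringIO(csv_text), index_col="name")

# astype converts the data columns only; the index labels are left alone.
converted = raw.astype(np.float32)

print(raw.dtypes)
print(converted.dtypes)
print(converted.index.tolist())
```

One trade-off to keep in mind: with this approach a malformed cell is not rejected while parsing. read_csv would likely fall back to an object column, and the problem only surfaces when astype tries to convert it.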