Pandas read_csv, reading a boolean with missing values specified as an int
Question:
I am trying to import a csv into a pandas dataframe. I have boolean variables denoted with 1’s and 0’s, where missing values are identified with a -9.
When I try to specify the dtype as boolean, I get a host of different errors, depending on what I try.
Sample data: test.csv
var1, var2
0, 0
0, 1
1, 3
-9, 0
0, 2
1, 7
I try to specify the dtype as I import:
dtype_dict = {'var1':'bool','var2':'int'}
nan_dict = {'var1':[-9]}
foo = pd.read_csv('test.csv',dtype=dtype_dict, na_values=nan_dict)
I get the following error:
ValueError: cannot safely convert passed user dtype of |b1 for int64
dtyped data in column 0
I have also tried specifying the true and false values,
foo = pd.read_csv('test.csv',dtype=dtype_dict,na_values=nan_dict,
true_values=[1],false_values=[0])
but then I get a different error:
Exception: Must be all encoded bytes
The source code for the error says something about catching the occasional none, but nones or nulls are exactly what I want.
Answers:
Can you do something like this?
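import pandas as pd
# header=0 replaces the file's own header row with the given names
df = pd.read_csv("test.csv", names=["var1", "var2"], header=0)
# .loc is used here because .ix was removed in pandas 1.0
df.loc[df.var1 == 0, 'var1Bool'] = False
df.loc[df.var1 == 1, 'var1Bool'] = True
This should create a new column for you; if you are satisfied with it, you can copy it over the old one.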
   var1  var2 var1Bool
0     0     0    False
1     0     1    False
2     1     3     True
3    -9     0      NaN
4     0     2    False
5     1     7     True
The error "Must be all encoded bytes" occurs because the parser expects the true and false values to be strings, not numbers.
Your true/false values should be specified like this:
foo = pd.read_csv('test.csv',dtype=dtype_dict,na_values=nan_dict,
true_values=['1'],false_values=['0'])
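Keep in mind that a plain bool column cannot hold missing values, so the -9 rows still need somewhere to go. As a rough sketch (assuming pandas 1.0 or later, which provides the nullable "boolean" dtype), you could read the column as ordinary integers and convert it afterwards. The helper name read_flags and the in-memory copy of test.csv are only for illustration and are not part of the original question:

```python
import io

import pandas as pd

# In-memory copy of the sample test.csv from the question.
CSV_TEXT = """var1, var2
0, 0
0, 1
1, 3
-9, 0
0, 2
1, 7
"""


def read_flags(source):
    """Read the CSV and turn var1 into a nullable boolean column.

    Hypothetical helper: 0 -> False, 1 -> True, anything else
    (including the -9 sentinel) -> missing.
    """
    # skipinitialspace strips the blank after each comma, so the second
    # column is named "var2" rather than " var2".
    df = pd.read_csv(source, skipinitialspace=True)
    # map() leaves unmapped values such as -9 as NaN; the nullable
    # "boolean" dtype then stores them as <NA> instead of raising.
    df["var1"] = df["var1"].map({0: False, 1: True}).astype("boolean")
    return df


df = read_flags(io.StringIO(CSV_TEXT))
print(df["var1"])
```

With a real file you would pass the path, e.g. read_flags('test.csv'). The trade-off is that any unexpected code in var1, not only -9, silently becomes missing, so you may want to check the distinct values before converting.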