Replace str value in pd df by sampling from a pandas array
Question:
I have a pandas df
df = pd.DataFrame({'A': [0.1, 0.1, 0.1, 0.1, 'X'], 'B': [0.1, 0.1, 'X', 0.1, 0.1], 'C': [0.1, 'X', 'X', 'X', 'X']})
A B C
0.1 0.1 0.1
0.1 0.1 X
0.1 X X
0.1 0.1 X
X 0.1 X
and an array
<PandasArray> [0.9999999999999304, 0.9999973764241584, 0.9999997377248664, 0.9615117313882438, 0.871479832883895, 0.9999999999998652, 0.9999999999999994, 0.9999029359407972, 0.999999984174712, 0.9944689702907784] Length: 10, dtype: float64
I would like to replace the X values by sampling from the array, so that the distribution of the values in the array is represented in the df at the locations that held X.
I have tried
df[df == 'X'] = np.random.choice(arr, replace=True)
which gives this output
A B C
0.1 0.1 0.1
0.1 0.1 1.0
0.1 1.0 1.0
0.1 0.1 1.0
1.0 0.1 1.0
Does this randomly sample from the array and why are the values rounded? I would like to replace with the exact values from the array.
Answers:
Does this randomly sample from the array?
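Yes, but np.random.choice(arr, replace=True) without a size argument draws a single value, so every X is replaced by the same number.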
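If the goal is a separate draw for each X, one possible approach is to count the X cells and request that many samples. The following is a minimal sketch, not the only way to do it: the names `values`, `mask`, `samples`, `out`, `result` and the fixed seed are assumptions introduced here for illustration, and the array is rebuilt with pd.array from the values shown in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [0.1, 0.1, 0.1, 0.1, 'X'],
                   'B': [0.1, 0.1, 'X', 0.1, 0.1],
                   'C': [0.1, 'X', 'X', 'X', 'X']})

# Stand-in for the array from the question.
arr = pd.array([0.9999999999999304, 0.9999973764241584, 0.9999997377248664,
                0.9615117313882438, 0.871479832883895, 0.9999999999998652,
                0.9999999999999994, 0.9999029359407972, 0.999999984174712,
                0.9944689702907784], dtype="float64")
values = np.asarray(arr, dtype=float)

# Boolean mask marking every cell that holds 'X'.
mask = (df == 'X').to_numpy()

# Draw one sample per masked cell (seed fixed only for reproducibility).
rng = np.random.default_rng(0)
samples = rng.choice(values, size=int(mask.sum()), replace=True)

# Fill the masked cells in a copy of the data, then convert to float.
out = df.to_numpy(dtype=object, copy=True)
out[mask] = samples
result = pd.DataFrame(out, index=df.index, columns=df.columns).astype(float)

print(result.to_dict('list'))
```

With only a handful of X cells the sampled values are a rough reflection of the array's distribution; the match improves as the number of replaced cells grows.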
Why are the values rounded?
It is only a display issue: pandas rounds floats when printing. Converting to a list shows the values that are actually stored:
df[df == 'X'] = np.random.choice(arr, replace=True)
print(df.to_dict('list'))
{'A': [0.1, 0.1, 0.1, 0.1, 0.9999997377248664],
'B': [0.1, 0.1, 0.9999997377248664, 0.1, 0.1],
'C': [0.1, 0.9999997377248664, 0.9999997377248664, 0.9999997377248664, 0.9999997377248664]}
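As an alternative to converting to a list, the printed precision can be raised temporarily. This is a small sketch using the `display.precision` option on a hypothetical two-element Series `s` (not the df from the question); the stored value is the same in both cases and only the printed text changes.

```python
import pandas as pd

# Hypothetical Series holding one value from the question's array.
s = pd.Series([0.1, 0.9999997377248664])

# Default printing rounds to 6 decimal places.
default_repr = repr(s)
print(default_repr)

# Temporarily print with 16 decimal places.
with pd.option_context('display.precision', 16):
    precise_repr = repr(s)
print(precise_repr)

# The underlying float is unchanged either way.
print(s.iloc[1] == 0.9999997377248664)
```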