np.unique after np.round unrounds the data
Question:
This code snippet describes a problem I have been having. For some reason rounded_data seems to be rounded, but once passed into np.unique and np.column_stack, the result_array seems to be unrounded, while rounded_data is still rounded.
rounded_data = data_with_target_label.round(decimals=2)
unique_values, counts = np.unique(rounded_data, return_counts=True)
result_array = np.column_stack((unique_values, counts))
print(rounded_data)
print(result_array)
Result:
443392 0.01
443393 0.00
443394 0.00
443395 0.00
443396 0.11
...
452237 0.04
452238 0.00
452239 0.00
452240 0.00
452241 0.00
Name: values, Length: 8850, dtype: float32
[[0.00000000e+00 4.80000000e+01]
[9.99999978e-03 2.10000000e+01]
[1.99999996e-02 1.10000000e+01]
...
[3.29000015e+01 1.00000000e+00]
[3.94099998e+01 1.00000000e+00]]
Answers:
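This happens because your data is float32, while result_array is float64: np.column_stack combines the float32 unique values with the integer counts, and NumPy promotes that mix to float64. A number rounded in float32 is stored as the closest float32 to the decimal you asked for (0.01 is stored as roughly 0.00999999978). Once that stored value is widened to float64, the extra printed digits expose the difference, so it no longer looks rounded. The data itself has not changed.
The solution is to convert either the input data to float64 or the result_array to float32.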
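To make the promotion visible, here is a minimal sketch. The sample values and counts are made up for illustration (they mirror the first rows of the output in the question) and are not the asker's real data.

```python
import numpy as np

# 0.01 has no exact binary representation; float32 stores the nearest value it can.
x32 = np.float32(0.01)

# Widening to float64 keeps that stored value exactly and only shows more digits.
x64 = np.float64(x32)

# Illustrative stand-ins for the outputs of np.unique(..., return_counts=True).
values = np.array([0.0, 0.01, 0.02], dtype=np.float32)
counts = np.array([48, 21, 11])  # default integer dtype

# Stacking float32 values with integer counts promotes the result to float64.
stacked = np.column_stack((values, counts))

print(values.dtype)         # float32
print(stacked.dtype)        # float64
print(float(x64) == 0.01)   # False: the float32 neighbour of 0.01 is not 0.01
print(stacked)
```

The printed array should show the same kind of "unrounded" digits as in the question, even though nothing was changed after rounding.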
Solution 1
Converting the resulting NumPy array back to float32:
rounded_data = data_with_target_label.round(decimals=2)
unique_values, counts = np.unique(rounded_data, return_counts=True)
result_array = np.column_stack((unique_values, counts))
result_array = result_array.astype(np.float32)  # cast back so the values display as rounded
Solution 2
Converting the input data to float64 before rounding. For example, if the input is a pd.DataFrame (or pd.Series):
import numpy as np
import pandas as pd

# Example float32 input
df = pd.DataFrame({'vals': np.array([0.013242, 3.94099998, 9.99999978, 0.03234,
                                     0.05532, 33.22, 33.44, 55.66])},
                  dtype='float32')
# Widen to float64 first, then round, so the rounded values are float64 from the start
rounded_data = df['vals'].astype('float64').round(decimals=2)
unique_values, counts = np.unique(rounded_data, return_counts=True)
result_array = np.column_stack((unique_values, counts))
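A further option, not part of the answer above, is to avoid mixing floats and integers in one array at all. The sketch below assumes a small made-up float32 Series in place of the asker's data and keeps values and counts in separate DataFrame columns, so each column keeps its own dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the rounded float32 Series from the question.
rounded = pd.Series([0.0, 0.01, 0.01, 0.11, 0.0], dtype="float32").round(2)

unique_values, counts = np.unique(rounded.to_numpy(), return_counts=True)

# Separate columns: the values stay float32 and the counts stay integers,
# so no promotion to float64 takes place.
result = pd.DataFrame({"value": unique_values, "count": counts})

print(result.dtypes)
print(result)
```

Whether this fits depends on what result_array is used for afterwards; if a single 2-D NumPy array is required, one of the two solutions above is the better choice.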