np.unique after np.round unrounds the data

Question:

This code snippet shows a problem I have been having. rounded_data is rounded as expected, but after it is passed through np.unique and np.column_stack, the values in result_array look unrounded, while rounded_data itself is still rounded.

rounded_data = data_with_target_label.round(decimals=2)

unique_values, counts = np.unique(rounded_data, return_counts=True)
result_array = np.column_stack((unique_values, counts))

print(rounded_data)
print(result_array)

Result:

443392    0.01
443393    0.00
443394    0.00
443395    0.00
443396    0.11
          ... 
452237    0.04
452238    0.00
452239    0.00
452240    0.00
452241    0.00
Name: values, Length: 8850, dtype: float32
[[0.00000000e+00 4.80000000e+01]
 [9.99999978e-03 2.10000000e+01]
 [1.99999996e-02 1.10000000e+01]
 ...
 [3.29000015e+01 1.00000000e+00]
 [3.94099998e+01 1.00000000e+00]

Asked By: Pinguiz


Answers:

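This is because your data is float32, while result_array ends up as float64: np.column_stack has to combine the float32 values with the integer counts, and NumPy promotes that mix to float64. A number that is rounded in float32 won't look rounded once it is displayed as float64, because the float32 closest to 0.01 is not the same number as the float64 closest to 0.01.
The solution is to convert either the input data to float64 before rounding, or result_array to float32.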
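To make the dtype promotion visible, here is a minimal sketch. The arrays `values` and `counts` are hypothetical stand-ins for the output of np.unique, not the asker's actual data:

```python
import numpy as np

# Hypothetical stand-ins for unique_values and counts.
values = np.array([0.0, 0.01, 0.02], dtype=np.float32)
counts = np.array([48, 21, 11], dtype=np.int64)

stacked = np.column_stack((values, counts))

# Mixing float32 with int64 promotes the result to float64.
print(values.dtype, counts.dtype, stacked.dtype)
print(np.result_type(np.float32, np.int64))

# The stored float32 value is unchanged; float64 only shows more of its digits.
print(values[1])      # shown with float32 precision
print(stacked[1, 0])  # same number, shown with float64 precision
```

The value itself is never "unrounded". It was only ever the nearest float32 to 0.01, and the float64 display exposes the digits that the float32 display hides.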

Solution 1

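Convert the resulting NumPy array to float32:

import numpy as np

rounded_data = data_with_target_label.round(decimals=2)

unique_values, counts = np.unique(rounded_data, return_counts=True)
# column_stack promotes the float32 values and integer counts to float64
result_array = np.column_stack((unique_values, counts))

# cast back to float32 so the values display as rounded again
result_array = result_array.astype(np.float32)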
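One caveat with this approach is that the counts column is cast to float32 as well. A small sketch, using a hypothetical `rounded` array in place of the real data, shows the cast and the point at which float32 stops representing integers exactly:

```python
import numpy as np

# Hypothetical float32 sample standing in for the rounded data.
rounded = np.array([0.0, 0.01, 0.01, 0.02], dtype=np.float32)

unique_values, counts = np.unique(rounded, return_counts=True)
result_array = np.column_stack((unique_values, counts)).astype(np.float32)

print(result_array.dtype)
print(result_array)

# float32 has a 24-bit significand, so integers above 2**24 can collide.
big = 2**24
collides = np.float32(big + 1) == np.float32(big)
print(collides)
```

For counts well below 2**24 (about 16.7 million) this does not matter, which covers the 8850-row data in the question.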

Solution 2

Convert the input data to float64 before rounding. For example, if the input is a pd.DataFrame (or pd.Series):

import numpy as np
import pandas as pd

df = pd.DataFrame({'vals': np.array([0.013242,
                                     3.94099998,
                                     9.99999978,
                                     0.03234,
                                     0.05532,
                                     33.22,
                                     33.44,
                                     55.66])}, dtype='float32')

# upcast to float64 first, then round, so the rounded values are float64
rounded_data = df['vals'].astype('float64').round(decimals=2)

unique_values, counts = np.unique(rounded_data, return_counts=True)

result_array = np.column_stack((unique_values, counts))
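A possible alternative, not part of the answer above, is to avoid mixing dtypes in one array at all and keep values and counts as separate columns of a DataFrame. This is a sketch under the assumption that a table is acceptable instead of a 2-D array; `data` and `table` are hypothetical names:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for data_with_target_label.
data = pd.Series([0.013242, 0.0, 0.0, 0.11, 0.013, 0.04],
                 dtype="float32", name="values")
rounded_data = data.round(decimals=2)

unique_values, counts = np.unique(rounded_data, return_counts=True)

# One column per dtype: values stay float32, counts stay integers.
table = pd.DataFrame({"value": unique_values, "count": counts})
print(table)
print(table.dtypes)
```

Because each DataFrame column keeps its own dtype, no promotion to float64 takes place and the counts remain integers.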
Answered By: Johnny Cheesecutter