Categorizing a pandas column by another column

Question:

My data looks like this:
screenshot of data

I’m using the following script to populate RP_8Recruise with either "Y" (NEAR_DIST <= 100 meters) or "N" (NEAR_DIST > 100 meters).

nrows = plots_dist_joined.shape[0]

for i in range(0, nrows):
    
    # for plots that are within wanted distance from disturbance harvest 
    if (plots_dist_joined.iloc[i,9] < 100) | (plots_dist_joined.iloc[i,9] == 100):
        plots_dist_joined["RP_"+reporting_period+"Recruise"] = "Y"
        plots_dist_joined["RP_"+reporting_period+"RecrType"] = "PD"
    
    # for plots that are NOT within wanted distance from disturbance harvest 
    else:
        plots_dist_joined["RP_"+reporting_period+"Recruise"] = "N"
        plots_dist_joined["RP_"+reporting_period+"RecrType"] = np.nan

This fills the entire RP_8Recruise column with "N", even though some distances are under 100 meters (IDs = 59197, 40, 84, 92, 132). I’m not sure what is wrong with the code.

Asked By: Gloria Desanker


Answers:

The problem with your code is that each iteration assigns a single value to the entire RP_8Recruise and RP_8RecrType columns: assigning a scalar to a column broadcasts it to every row. Each iteration therefore overwrites the previous one, and the final values of both columns are decided only by the NEAR_DIST value in the last row.
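To see the overwrite in isolation, here is a minimal sketch on a small hypothetical frame (the distances and the `recruise` column name are invented for illustration). The first loop mirrors the question's pattern; the second writes only the current row's cell with `.loc[row_label, column]`, which is what the loop was meant to do.

```python
import pandas as pd

# Hypothetical data: only the last plot is farther than 100 m.
df = pd.DataFrame({"NEAR_DIST": [35.0, 100.0, 80.0, 250.0]})

# Buggy pattern: each assignment writes a scalar to the WHOLE column,
# so every iteration overwrites all rows and the last row decides the result.
buggy = df.copy()
for i in range(len(buggy)):
    if buggy["NEAR_DIST"].iloc[i] <= 100:
        buggy["recruise"] = "Y"
    else:
        buggy["recruise"] = "N"

# Row-level fix: write only the cell belonging to the current row.
fixed = df.copy()
fixed["recruise"] = None  # placeholder object column, filled row by row below
for idx in fixed.index:
    fixed.loc[idx, "recruise"] = "Y" if fixed.loc[idx, "NEAR_DIST"] <= 100 else "N"

print(buggy["recruise"].tolist())  # ['N', 'N', 'N', 'N']
print(fixed["recruise"].tolist())  # ['Y', 'Y', 'Y', 'N']
```

The row-level loop assumes the index labels are unique; it works, but it is slower than the vectorized approach for large frames.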

Instead of a for-loop, use the vectorized numpy.where() function to fill in the values:

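import numpy as np

# a mask that checks whether each plot is within 100 m
is_near = plots_dist_joined["NEAR_DIST"] <= 100
# if near, Y, else N
plots_dist_joined["RP_8Recruise"] = np.where(is_near, "Y", "N")
# if near, PD, else missing; None is used instead of np.nan because mixing a string
# with np.nan makes np.where return a string array, so missing cells would hold the text "nan"
plots_dist_joined["RP_8RecrType"] = np.where(is_near, "PD", None)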
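For a quick end-to-end check, here is a sketch on a small hypothetical stand-in for plots_dist_joined (the IDs and distances are invented). It builds the column names from reporting_period the way the question does, assuming reporting_period is the string "8", and uses pandas' Series.where as an alternative way to get real NaN values in the RecrType column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for plots_dist_joined; only NEAR_DIST matters here.
plots_dist_joined = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "NEAR_DIST": [12.5, 100.0, 99.9, 340.0],
})
reporting_period = "8"  # assumed to be a string, as in "RP_" + reporting_period

recruise_col = "RP_" + reporting_period + "Recruise"
recr_type_col = "RP_" + reporting_period + "RecrType"

# Boolean mask: True where the plot is within 100 m (inclusive).
is_near = plots_dist_joined["NEAR_DIST"] <= 100

# Vectorized fill: the whole column at once, no Python loop.
plots_dist_joined[recruise_col] = np.where(is_near, "Y", "N")

# Series.where keeps "PD" where is_near is True and puts NaN everywhere else.
plots_dist_joined[recr_type_col] = (
    pd.Series("PD", index=plots_dist_joined.index).where(is_near)
)

print(plots_dist_joined)
```

Because the mask is computed once for the whole column, each row gets its own value, which is exactly what the original loop was trying to achieve.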
Answered By: not a robot