Insert a new column in pandas with random string values

Question:

I had a DataFrame:

     A B C
   0 1 2 3  
   1 2 3 3  
   2 3 2 1  

I needed to add a new column to this DataFrame, filled randomly with ‘yes’ or ‘no’:

     A B C  NEW
   0 1 2 3  yes
   1 2 3 3  no
   2 3 2 1  no

Using random.choice gave a column with the same value in every row:

     A B C  NEW
   0 1 2 3  no
   1 2 3 3  no
   2 3 2 1  no

I tried map, apply and applymap, but there’s an easier way to do it.

Asked By: A Neto


Answers:

One way is to build a pd.Series from random.choices and assign it to the new column:

import random

import pandas as pd

# Draw one value per row; index=df.index keeps the Series aligned with df
df['NEW'] = pd.Series(
    random.choices(['yes', 'no'], weights=[1, 1], k=len(df)),
    index=df.index
)

random.choices picks one of these values independently for each row.

weights sets the relative chances of picking ‘yes’ and ‘no’, respectively. For example, if you want a higher chance of ‘yes’, increase the first number.
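As a rough illustration of how weights behaves, the sketch below draws many values with weights=[3, 1] and counts them. The seed, the sample size and the names draws, counts and yes_share are assumptions chosen for the example, not part of the original answer.

```python
import random
from collections import Counter

random.seed(0)  # seeded only so the sketch is repeatable

# Weights are relative: [3, 1] means 'yes' is about three times as likely as 'no'
draws = random.choices(['yes', 'no'], weights=[3, 1], k=10_000)

counts = Counter(draws)
yes_share = counts['yes'] / len(draws)  # expected to be close to 0.75
print(counts, yes_share)
```

Because the weights are relative, [3, 1] and [75, 25] describe the same distribution.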

k sets the number of values drawn, and therefore the length of the Series. It must equal the number of rows in the DataFrame.

Setting index to df.index is important. Without it, the Series gets a default 0, 1, 2, ... index, and if df was sliced from a bigger DataFrame those labels may not match, so the new column can be filled with NaN.
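The index point can be seen in a small sketch. The frames big and df, the filter on column A and the column name MISALIGNED are made up for illustration; only the pd.Series(..., index=df.index) pattern comes from the answer.

```python
import random

import pandas as pd

# Hypothetical larger frame; the slice keeps index labels 3, 4, 5
big = pd.DataFrame({'A': range(6), 'B': range(6)})
df = big[big['A'] >= 3].copy()

values = random.choices(['yes', 'no'], k=len(df))

# Without index=, the Series is labelled 0, 1, 2; none of these match 3, 4, 5,
# so alignment on assignment leaves every row as NaN
df['MISALIGNED'] = pd.Series(values)

# With index=df.index, the labels match and every row gets a value
df['NEW'] = pd.Series(values, index=df.index)

print(df)
```

Assigning the plain list directly (df['NEW'] = values) is positional, so it should also sidestep the alignment issue when the list has exactly len(df) items.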

Answered By: A Neto