Dummies from a string variable

Question:

I am trying to generate dummy variables from a string variable using the code below:

import pandas as pd

data = {
'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
'crops': ['[maize]', '[maize, cassava]', '[beans, cassava, potato]', '[beans, potato]', '[beans, cassava, maize, potato]', '[beans]', '[cassava, maize, potato]', '[beans, maize]', '[cassava, maize, potato]', '[cassava]', '[beans, cassava, potato]', '[maize, potato]', '[beans, maize, potato]', '[beans, cassava, maize, potato]', '[potato]', '[cassava, potato]', '[beans]', '[maize]', '[potato]', '[cassava]'],
}

df = pd.DataFrame(data)

df['crops'] = df['crops'].str.replace('[', '', regex=False)
df['crops'] = df['crops'].str.replace(']', '', regex=False)

res = df.join(df.pop('crops').str.get_dummies(','))
res

However, some variables seem repeated and I don’t know why.

Asked By: Stephen Okiya


Answers:

Add a space after the comma in the separator you pass to get_dummies:

res = df.join(df.pop('crops').str.get_dummies(', '))

Without the space, every item after the first keeps the space that follows its comma, so ' maize' and 'maize' are treated as different values and each one gets its own column.
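To see the difference side by side, here is a minimal sketch on a small made-up sample in the same format as the question's data. The three rows are hypothetical, and str.strip('[]') is used here in place of the two replace calls as one possible way to drop the brackets.

```python
import pandas as pd

# Small hypothetical sample in the same format as the question's 'crops' column.
df = pd.DataFrame({
    'id': [1, 2, 3],
    'crops': ['[maize]', '[maize, cassava]', '[beans, cassava, maize]'],
})

# Strip the surrounding brackets from both ends of each string.
crops = df['crops'].str.strip('[]')

# Splitting on ',' alone keeps the space that follows each comma,
# so 'maize' and ' maize' end up as separate columns.
bad = crops.str.get_dummies(',')
print(list(bad.columns))

# Splitting on ', ' removes the space together with the comma.
good = crops.str.get_dummies(', ')
print(list(good.columns))

# Join the dummies back onto the remaining columns.
res = df.drop(columns='crops').join(good)
print(res)
```

Note that this sketch uses drop rather than pop, so df still contains the crops column afterwards and the lines can be re-run without a KeyError.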

Answered By: Eirini Kotzia