Pandas convert dummies to a new column

Question:

I have a dataframe that discretize the customers into different Q’s, which looks like:

    CustomerID_num  Q1  Q2  Q3  Q4  Q5  Country
0   12346           1   0   0   0   0   United Kingdom
2   12347           0   0   0   0   1   Iceland
9   12348           0   1   0   0   0   Finland
13  12349           0   0   0   0   1   Italy
14  12350           0   1   0   0   0   Norway

What I want to do is adding a new column, Q, to the dataframe which shows which sector this customer is in, so it looks like:

    CustomerID_num  Q1  Q2  Q3  Q4  Q5  Q    Country
0   12346           1   0   0   0   0   1    United Kingdom
2   12347           0   0   0   0   1   5    Iceland
9   12348           0   1   0   0   0   2    Finland
13  12349           0   0   0   0   1   5    Italy
14  12350           0   1   0   0   0   2    Norway

The only way I can think about is using for loop but it will give me a mess. Any other way to do this?

Asked By: WilliamL

||

Answers:

df.loc[df["Q1"] == 1, "Q"] = 1
df.loc[df["Q2"] == 1, "Q"] = 2
df.loc[df["Q3"] == 1, "Q"] = 3
df.loc[df["Q4"] == 1, "Q"] = 4
df.loc[df["Q5"] == 1, "Q"] = 5

This is a possible solution using loc from pandas here is the documentation https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html

When the condition is true loc set the value for the entire column named "Q"

Answered By: Diego Triviño

One option is to dump down into numpy:

Filter for just the Q columns:

cols = df.filter(like = 'Q')

Get the column positions that are equal to 1:

_, positions = cols.to_numpy().nonzero()
df.assign(Q = positions + 1)
    CustomerID_num  Q1  Q2  Q3  Q4  Q5         Country  Q
0            12346   1   0   0   0   0  United Kingdom  1
2            12347   0   0   0   0   1         Iceland  5
9            12348   0   1   0   0   0         Finland  2
13           12349   0   0   0   0   1           Italy  5
14           12350   0   1   0   0   0          Norway  2
Answered By: sammywemmy

Here are a few other options

d = df.loc[:,lambda x: x.columns.str.startswith('Q')]

Option 1:

np.where(d)[-1]+1

Option 2:

np.argmax(d.to_numpy(),axis=1)+1

Option 3:

d.set_axis(range(d.shape[1]),axis=1).idxmax(axis=1)+1
Answered By: rhug123