How to create a conditional column in Python that picks data from different column based on the year?
Question:
My dataset looks something like this:
Product ID
Sales Year
Sales_percentage_2016
Sales_percentage_2017
Sales_percentage_2018
1
2016
5
8
5
2
2017
7
9
6
Output should be something like this:
Product ID
Sales Year
Sales_Percentage
1
2016
5
2
2017
9
So that this was unwanted data can be removed and relevant can be kept.
Answers:
You can use indexing lookup:
# get Year to use for indexing
idx, cols = pd.factorize(df['Sales Year'])
# add prefix
cols = 'Sales_percentage_' + cols.astype(str)
# index
df['Sales'] = df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
# get rid of unnecessary columns
df = df.filter(regex=r'^(?!Sales_percentage_)')
output:
Product ID Sales Year Sales
0 1 2016 5
1 2 2017 9
My dataset looks something like this:
Product ID | Sales Year | Sales_percentage_2016 | Sales_percentage_2017 | Sales_percentage_2018 |
---|---|---|---|---|
1 | 2016 | 5 | 8 | 5 |
2 | 2017 | 7 | 9 | 6 |
Output should be something like this:
Product ID | Sales Year | Sales_Percentage |
---|---|---|
1 | 2016 | 5 |
2 | 2017 | 9 |
So that this was unwanted data can be removed and relevant can be kept.
You can use indexing lookup:
# get Year to use for indexing
idx, cols = pd.factorize(df['Sales Year'])
# add prefix
cols = 'Sales_percentage_' + cols.astype(str)
# index
df['Sales'] = df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
# get rid of unnecessary columns
df = df.filter(regex=r'^(?!Sales_percentage_)')
output:
Product ID Sales Year Sales
0 1 2016 5
1 2 2017 9