Multiply Columns Together Based on Condition

Question:

Is there a way for me to dynamically multiply columns together based on a value in another column in Python? I’m using Polars if that makes a difference. For example, if calendar_year is 2018, I’d want to multiply columns 2018, 2019, 2020, and 2021 together, but if calendar_year is 2019, I’d only want to multiply columns 2019, 2020, and 2021 together. I’d like to store the result in a new column called product. In the future, we’ll have additional columns such as 2022 and 2023, so I’d love for my formula to account for these new columns without my having to go into the code base each year and add them to the product manually.

id   calendar_year  2017   2018   2019   2020   2021   product
123  2018           0.998  0.997  0.996  0.995  0.994  0.9801
456  2019           0.993  0.992  0.991  0.990  0.989  0.9557

Thanks in advance for the help!

Asked By: Max Herring


Answers:

I think you could use NumPy's np.where(condition, then, else) function to do that (note that this is pandas-style syntax rather than Polars).
Do you want to create a new column with the result of that operation? Something like this could work:

df['2018_result'] = np.where(df['calendar_year'].isin(['2019', '2020', '2021']), df['2019'] * df['2020'] * df['2021'], 'add more calculations')

Would be great if you can provide more details 🙂

Answered By: Selknam_CL

It looks like you want to multiply CY factors for all years beyond calendar_year, and not have to update this logic for each year.

If that’s the case, one way to avoid hard-coding the CY selections is to use melt and filter the results.

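import polars as pl

# df is the DataFrame from the question.
# Newer Polars releases rename melt to unpivot and groupby to group_by.
(
    df
    .select([
        'id',
        'calendar_year',
        pl.col(r'^20\d\d$'),  # regex: every four-digit year column starting with 20
    ])
    # Reshape to long format: one row per (id, calendar_year, CY).
    .melt(
        id_vars=['id', 'calendar_year'],
        variable_name='CY',
        value_name='CY factor',
    )
    # Year column names are strings, so cast them before comparing.
    .with_columns(pl.col('CY').cast(pl.Int64))
    # Keep only the years at or after each row's calendar_year.
    .filter(pl.col('CY') >= pl.col('calendar_year'))
    .groupby('id')
    .agg(
        pl.col('CY factor').product().alias('product')
    )
)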
shape: (2, 2)
┌─────┬──────────┐
│ id  ┆ product  │
│ --- ┆ ---      │
│ i64 ┆ f64      │
╞═════╪══════════╡
│ 456 ┆ 0.970298 │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 123 ┆ 0.982119 │
└─────┴──────────┘
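
As a quick sanity check on the numbers above, the same filter-and-multiply logic can be sketched in plain Python for the two sample rows. The names rows and year_product are hypothetical and used only for illustration; the factor values are taken from the table in the question.

```python
import math

# Sample rows from the question; the year columns are keyed by integer year.
rows = [
    {"id": 123, "calendar_year": 2018,
     "factors": {2017: 0.998, 2018: 0.997, 2019: 0.996, 2020: 0.995, 2021: 0.994}},
    {"id": 456, "calendar_year": 2019,
     "factors": {2017: 0.993, 2018: 0.992, 2019: 0.991, 2020: 0.990, 2021: 0.989}},
]


def year_product(row):
    """Multiply the factors for every year at or after the row's calendar_year."""
    return math.prod(
        factor
        for year, factor in row["factors"].items()
        if year >= row["calendar_year"]
    )


products = {row["id"]: year_product(row) for row in rows}
for row_id, product in products.items():
    print(row_id, round(product, 6))
# 123 0.982119
# 456 0.970298
```

Both values agree with the product column in the output above.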

From there, you can join the result back to your original dataset (using the id column).
