Multiply a column depending on the value of another column
Question:
I have a DataFrame with a weather column (named "Climate" in the code below) and another column that holds the "eta".
What I want to do is multiply the eta by a random number whose range depends on the climate.
The pseudocode looks like this:
if (Climate == 'Sunny') then 'eta' = 'eta' * Random(0.8, 1.0)
else if (Climate == 'Rainy') then 'eta' = 'eta' * Random(1.0, 1.2)
else if (Climate == 'Cloudy') then 'eta' = 'eta' * Random(0.9, 1.1)
I don't know how to achieve this with a Pandas DataFrame. My best attempt was this, but it didn't work:
df.loc[df['Climate'] == 'Rain', 'eta' * random.uniform(1.0, 1.2)]
I expected it to multiply the eta column by a random value between 1.0 and 1.2 wherever the ‘Climate’ column was ‘Rain’.
Answers:
You might want to use:
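import numpy as np
# (low, high) multiplier range for each climate
min_max = {'Sunny': (0.8, 1), 'Rainy': (1, 1.2), 'Cloudy': (0.9, 1.1)}
# group_keys=False keeps the original index, so the result aligns with df
df['eta'] = (df.groupby('Climate', group_keys=False)['eta']
               .apply(lambda x: x*np.random.uniform(*min_max[x.name], size=len(x)))
            )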
Example (as new column for clarity):
  Climate       eta   new_eta
0   Sunny  3.258367  3.026513
1   Sunny  5.615873  4.962923
2   Sunny  4.046182  3.761648
3   Sunny  0.367640  0.296795
4   Sunny  2.875452  2.677827
5   Rainy  3.576453  3.856957
6   Rainy  5.674834  5.895780
7   Rainy  7.876974  8.576879
8   Rainy  8.098803  9.473710
9   Rainy  0.750729  0.841462
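As a side note on why the attempt in the question fails: the multiplication there sits inside the `.loc[...]` indexer, so nothing is ever assigned back. A minimal sketch of a repaired single-condition version is below. The sample data, the seed, and the use of `np.random.default_rng` are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    "Climate": ["Sunny", "Rainy", "Cloudy", "Rainy"],
    "eta": [10.0, 10.0, 10.0, 10.0],
})

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable

# Boolean mask selecting the rainy rows.
mask = df["Climate"] == "Rainy"

# The column label goes inside .loc, and the multiplication is an assignment.
# size=mask.sum() draws one factor per selected row instead of a single
# shared factor for all of them.
df.loc[mask, "eta"] *= rng.uniform(1.0, 1.2, size=mask.sum())

print(df)
```

This would have to be repeated once per climate, which is why the grouped version above is usually more convenient.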
For a vectorized approach, using NumPy:
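# (low, high) multiplier range for each climate
min_max = {'Sunny': (0.8, 1), 'Rainy': (1, 1.2), 'Cloudy': (0.9, 1.1)}
# 2 x len(df) array: first row is the lower bound per row of df, second row the upper bound
low, up = (pd.DataFrame(min_max, index=['min', 'max'])
             .reindex(columns=df['Climate']).to_numpy()
          )
# scale uniform [0, 1) samples into [low, up), one sample per row
a = np.random.random(size=len(df))
df['eta'] *= a*(up-low)+low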
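The same per-row bounds can also be looked up with `Series.map` and passed directly to the generator, since `Generator.uniform` accepts array-valued `low` and `high`. This is only a sketch of an alternative, not part of the original answer. The sample data, the seed, and the `original` helper variable are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    "Climate": ["Sunny", "Rainy", "Cloudy", "Sunny"],
    "eta": [10.0, 20.0, 30.0, 40.0],
})
original = df["eta"].copy()  # kept only so the factors can be inspected

# (low, high) multiplier range for each climate, as in the question.
min_max = {"Sunny": (0.8, 1.0), "Rainy": (1.0, 1.2), "Cloudy": (0.9, 1.1)}

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable

# Map each row's climate to its lower and upper bound.
low = df["Climate"].map({k: v[0] for k, v in min_max.items()}).to_numpy()
high = df["Climate"].map({k: v[1] for k, v in min_max.items()}).to_numpy()

# Array-valued bounds are broadcast, giving one independent draw per row.
df["eta"] *= rng.uniform(low, high)

print(df.assign(factor=df["eta"] / original))
```

A climate missing from `min_max` maps to NaN here, so the corresponding `eta` would become NaN rather than raise an error.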