python pandas – dividing column by another column
Question:
I’m trying to add a column to my DataFrame that is the result of dividing one column by another, like so:
df['$/hour'] = df['$']/df['hours']
This works fine, but if the value in ['hours'] is less than 1, then the ['$/hour'] value is greater than the value in ['$'], which is not what I want.
Is there a way of controlling the operation so that if ['hours'] < 1, then df['$/hour'] = df['$']?
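A minimal sketch of the problem, assuming a small made-up frame (the values are hypothetical, not the asker's data). An hours value below 1 inflates the result above $, and an hours value of exactly 0 silently produces inf instead of raising an error.

```python
import math

import pandas as pd

# Hypothetical sample data: one row under 1 hour, one with 0 hours, one normal row.
df = pd.DataFrame({'hours': [0.5, 0, 4], '$': [10, 8, 20]})

# Plain element-wise division:
#   10 / 0.5 -> 20.0 (larger than $), 8 / 0 -> inf, 20 / 4 -> 5.0
df['$/hour'] = df['$'] / df['hours']
print(df)
```

pandas does not raise ZeroDivisionError here; the inf shows up in the column, which is one more reason to handle small hours values explicitly.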
Answers:
df['$/hour'] = df.apply(lambda x: x['$'] if x['hours'] < 1 else x['$']/x['hours'], axis=1)
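A runnable sketch of the row-wise apply approach, testing the hours column in the condition as the question asks. The sample values mirror the frame printed in the numpy.where answer; the variable name row is just for readability.

```python
import pandas as pd

# Sample data (same values as the frame shown in the numpy.where answer).
df = pd.DataFrame({
    'hours': [0, 0, 0, 3, 6, 3, 5, 10, 9, 3, 5, 5],
    '$':     [8, 9, 9, 6, 4, 7, 5, 1, 3, 6, 4, 7],
})

# axis=1 passes each row to the lambda as a Series.
# Keep $ when hours < 1, otherwise divide $ by hours.
df['$/hour'] = df.apply(
    lambda row: row['$'] if row['hours'] < 1 else row['$'] / row['hours'],
    axis=1,
)
print(df)
```

apply with axis=1 calls a Python function once per row, so on large frames the vectorised numpy.where or mask versions are usually much faster.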
You can use numpy.where (with numpy imported as np):
print(df)
    hours  $
0       0  8
1       0  9
2       0  9
3       3  6
4       6  4
5       3  7
6       5  5
7      10  1
8       9  3
9       3  6
10      5  4
11      5  7
df['$/hour'] = np.where(df['hours'] < 1, df['$'], df['$']/df['hours'])
print(df)
    hours  $    $/hour
0       0  8  8.000000
1       0  9  9.000000
2       0  9  9.000000
3       3  6  2.000000
4       6  4  0.666667
5       3  7  2.333333
6       5  5  1.000000
7      10  1  0.100000
8       9  3  0.333333
9       3  6  2.000000
10      5  4  0.800000
11      5  7  1.400000
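A self-contained version of the numpy.where approach, using the sample frame printed above and df['$'] as the value for the hours < 1 branch, as the question asks. The only assumptions are the conventional np and pd import aliases.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'hours': [0, 0, 0, 3, 6, 3, 5, 10, 9, 3, 5, 5],
    '$':     [8, 9, 9, 6, 4, 7, 5, 1, 3, 6, 4, 7],
})

# np.where(condition, value_if_true, value_if_false), element by element.
# Both branches are evaluated up front, so df['$'] / df['hours'] still
# yields inf where hours == 0; np.where just discards those entries.
df['$/hour'] = np.where(df['hours'] < 1, df['$'], df['$'] / df['hours'])
print(df)
```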
You can also filter and select the indexes to set with DataFrame.loc, passing the row mask and the column name in a single indexer:
df.loc[df['hours'] >= 1, '$/hour'] = df['$']/df['hours']
df.loc[df['hours'] < 1, '$/hour'] = df['$']
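A runnable sketch of the .loc approach on a small hypothetical frame. Writing through one df.loc[rows, column] call modifies df directly. A chained form such as df['col'].loc[rows] = ... may instead write to a temporary copy (pandas warns about this, and under copy-on-write it has no effect on df).

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({'hours': [0, 0.5, 2, 4], '$': [8, 10, 6, 20]})

# Boolean mask for rows that should keep the raw $ value.
short = df['hours'] < 1

# The first assignment creates the '$/hour' column (NaN outside the mask).
# Right-hand Series are aligned on the index, so only the selected rows are used.
df.loc[~short, '$/hour'] = df['$'] / df['hours']
df.loc[short, '$/hour'] = df['$']
print(df)
```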
You can also use mask:
df['$/hour'] = (df['$'] / df['hours']).mask(df['hours'] < 1, df['$'])
If the condition df['hours'] < 1 is met, the values from column $ are taken; otherwise $ is divided by hours.
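A runnable sketch of the mask approach on a small hypothetical frame, alongside the equivalent Series.where call (where keeps values where its condition is True, mask replaces them).

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({'hours': [0, 0.5, 2, 4], '$': [8, 10, 6, 20]})

ratio = df['$'] / df['hours']  # inf for hours == 0, 20.0 for hours == 0.5

# Series.mask(cond, other): where cond is True, take the value from other.
df['$/hour'] = ratio.mask(df['hours'] < 1, df['$'])

# Series.where(cond, other) is the complement: keep values where cond is True.
same = ratio.where(df['hours'] >= 1, df['$'])
print(df)
```

The two forms differ only when hours is missing: NaN < 1 and NaN >= 1 are both False, so mask leaves the NaN ratio in place while this where call substitutes $.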