Creating a new column based on a condition with values from another column in Python
Question:
I have a DataFrame and would like to create a new column based on a condition: where the condition is met, the new column should take its value from another column; otherwise it should be zero.
The original DataFrame is:
import pandas as pd
df2 = pd.read_csv('C:UsersABC.csv')
df2['Date'] = pd.to_datetime(df2['Date'])
df2['Hour'] = df2.Date.dt.hour
df2['Occupied'] = ''
Date Value Hour Occupied
2016-02-02 21:00:00 0.6 21
2016-02-02 22:00:00 0.4 22
2016-02-02 23:00:00 0.4 23
2016-02-03 00:00:00 0.3 0
2016-02-03 01:00:00 0.2 1
2016-02-03 02:00:00 0.2 2
2016-02-03 03:00:00 0.1 3
2016-02-03 04:00:00 0.2 4
2016-02-03 05:00:00 0.1 5
2016-02-03 06:00:00 0.4 6
I would like the Occupied column to hold the same value as df2.Value where df2.Hour is greater than or equal to 9, and zero otherwise. I tried the following code, but it does not do what I want (it produces the same values as df2.Value and ignores the else branch):
for i in df2['Hour']:
    if i >= 9:
        df2['Occupied'] = df2.Value
    else:
        df2['Occupied'] = 0
Any idea what is wrong with this?
Answers:
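Your loop assigns to the entire Occupied column on every iteration, so only the result of the last iteration survives.
Use where with your boolean condition instead; it sets every row in one vectorized step rather than iterating row-wise: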
In [120]:
df2['Occupied'] = df2['Value'].where(df2['Hour'] >= 9, 0)
df2
Out[120]:
Date Value Hour Occupied
0 2016-02-02 21:00:00 0.6 21 0.6
1 2016-02-02 22:00:00 0.4 22 0.4
2 2016-02-02 23:00:00 0.4 23 0.4
3 2016-02-03 00:00:00 0.3 0 0.0
4 2016-02-03 01:00:00 0.2 1 0.0
5 2016-02-03 02:00:00 0.2 2 0.0
6 2016-02-03 03:00:00 0.1 3 0.0
7 2016-02-03 04:00:00 0.2 4 0.0
8 2016-02-03 05:00:00 0.1 5 0.0
9 2016-02-03 06:00:00 0.4 6 0.0
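For anyone who wants to try this without the CSV, here is a minimal self-contained sketch. It assumes a four-row stand-in DataFrame built from values in the question (not the asker's actual file), and it also shows two equivalents I would expect to give the same result: numpy.where and boolean-mask assignment with .loc. The names occupied_np, occupied_loc and mask are only illustrative.

```python
import numpy as np
import pandas as pd

# Small stand-in for the asker's CSV (a few rows copied from the question).
df2 = pd.DataFrame({
    "Date": pd.to_datetime([
        "2016-02-02 21:00:00",
        "2016-02-02 22:00:00",
        "2016-02-03 00:00:00",
        "2016-02-03 06:00:00",
    ]),
    "Value": [0.6, 0.4, 0.3, 0.4],
})
df2["Hour"] = df2["Date"].dt.hour

# Series.where keeps Value where the condition is True and uses 0 elsewhere.
df2["Occupied"] = df2["Value"].where(df2["Hour"] >= 9, 0)

# Equivalent: numpy.where(condition, value_if_true, value_if_false).
occupied_np = np.where(df2["Hour"] >= 9, df2["Value"], 0)

# Equivalent: start from a default of 0 and overwrite the matching rows.
mask = df2["Hour"] >= 9
occupied_loc = pd.Series(0.0, index=df2.index)
occupied_loc.loc[mask] = df2.loc[mask, "Value"]

print(df2)
```

Series.where reads closest to the original intent ("keep Value where Hour >= 9, else 0"); numpy.where is handy when the two branches come from different columns or constants.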