“Is there a pandas function for adding a new column based on certain values of another column of the data frame?”

Question

I am trying to create a new column in a data frame based on the time value in another column, i.e. if the time is between 06:00:00 and 12:00:00 then Morning, if the time is between 12:00:00 and 15:00:00 then Afternoon, and so on.

I have tried using a for loop and if/else statements, but my dataframe has 1549293 rows, so the loop is not feasible.

import datetime
import time
times= [datetime.time(6,0,0),datetime.time(12,0,0),datetime.time(15,0,0),datetime.time(20,0,0),datetime.time(23,0,0)]
times

df['time']=df['start_time'].dt.time
df['day_interval']=df['time']

for i in range(0,df.shape[0]):

    if df['time'][i] >= times[0] and df['time'][i] < times[1]:
        df['day_interval'][i]= "Morning"
    elif df['time'][i] >= times[1] and df['time'][i] < times[2]:
        df['day_interval'][i]= "Afternoon"
    elif df['time'][i] >= times[2] and df['time'][i] < times[3]:
        df['day_interval'][i]= "Evening"
    elif df['time'][i] >= times[3] and df['time'][i] < times[4]:
        df['day_interval'][i]= "Night"
    elif df['time'][i] >= times[4]:
        df['day_interval'][i]= "Late Night"
    if df['time'][i] < times[0]:
        df['day_interval'][i]= "Early Hours"

Is there some way to reduce the processing time?

Asked By: Aakash Patel

Answer 1

Row-wise loops should almost never be used in pandas. Pandas supports vectorized operations:

df.loc[(df['time'] >= times[0]) & (df['time'] < times[1]),
       'day_interval'] = "Morning"
df.loc[(df['time'] >= times[1]) & (df['time'] < times[2]),
       'day_interval'] = "Afternoon"

And so on for the remaining intervals. Using pd.cut is even more elegant, though; see W-B’s solution.
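
The same idea can be extended to all six intervals from the question. As a minimal sketch, assuming every boundary falls on a whole hour (as it does in the question), the conditions can be collected and handed to np.select in one step. The sample timestamps and the "Unknown" default are illustrative assumptions, not part of the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the asker's df['start_time'].
df = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2019-01-01 05:30:00",
        "2019-01-01 06:00:00",
        "2019-01-01 12:00:00",
        "2019-01-01 17:45:00",
        "2019-01-01 21:10:00",
        "2019-01-01 23:30:00",
    ])
})

# Integer hour of day (0-23); enough here because every boundary is on the hour.
hour = df["start_time"].dt.hour

# One boolean Series per interval, lower bound inclusive, upper bound exclusive.
conditions = [
    hour < 6,
    (hour >= 6) & (hour < 12),
    (hour >= 12) & (hour < 15),
    (hour >= 15) & (hour < 20),
    (hour >= 20) & (hour < 23),
    hour >= 23,
]
labels = ["Early Hours", "Morning", "Afternoon", "Evening", "Night", "Late Night"]

# For each row, np.select takes the label of the first condition that is True.
df["day_interval"] = np.select(conditions, labels, default="Unknown")
```

Because the conditions cover the whole day, the default should never be used; it is there only so that a gap in the conditions would be visible in the result.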

Answered By: DYZ

Answer 2

Using pd.cut. Note that I added two times to your times list: 00:00:00 and 23:59:59.

pd.cut(s1,bins=pd.to_datetime(pd.Series(times),format='%H:%M:%S').tolist(),labels=['Early','M','A','E','N','L'])
0    Early
1        M
Name: time, dtype: category
Categories (6, object): [Early < M < A < E < N < L]

Data setup

times= [datetime.time(0,0,0),datetime.time(6,0,0),datetime.time(12,0,0),datetime.time(15,0,0),datetime.time(20,0,0),datetime.time(23,0,0),datetime.time(23,59,59)]
s1=pd.to_datetime(df.time,format='%H:%M:%S')
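
One detail worth knowing: pd.cut uses right-closed bins by default, so a time of exactly 06:00:00 lands in the earlier bin, whereas the question's loop puts it in Morning. The following is a rough sketch of a variant that bins the integer hour with right=False to get the question's "lower bound inclusive" behaviour. It assumes all boundaries are whole hours; the sample timestamps are hypothetical, and the labels are taken from the question rather than from the abbreviated ones above.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's df['start_time'].
df = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2019-01-01 00:00:00",
        "2019-01-01 06:00:00",
        "2019-01-01 11:59:59",
        "2019-01-01 12:00:00",
        "2019-01-01 15:00:00",
        "2019-01-01 20:00:00",
        "2019-01-01 23:59:59",
    ])
})

# Bin edges in hours; with right=False each bin is [lower, upper).
bins = [0, 6, 12, 15, 20, 23, 24]
labels = ["Early Hours", "Morning", "Afternoon", "Evening", "Night", "Late Night"]

# The result is an ordered categorical column.
df["day_interval"] = pd.cut(
    df["start_time"].dt.hour, bins=bins, labels=labels, right=False
)
```

The hour is always between 0 and 23, so the final bin [23, 24) catches everything from 23:00:00 onward and no row is left unlabelled.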

Answered By: BENY

Answer 3

In pandas/numpy land, if you are reaching for a for loop, there is probably a better way.

I'm not sure whether this is faster, but I think it is at least a little cleaner (and hopefully also correct):

def time_of_day(hour):
    if hour < 6:
        return 'Early Hours'
    elif 6 <= hour < 12:
        return 'Morning'
    elif 12 <= hour < 15:
        return 'Afternoon'
    elif 15 <= hour < 20:
        return 'Evening'
    elif 20 <= hour < 23:
        return 'Night'
    else:
        return 'Late Night'


def main():
    # ... code that generates df ...
    df['day_interval'] = df['start_time'].dt.hour.map(time_of_day)


if __name__ == '__main__':
    main()

Answered By: lexual

Answer 4

I'll throw this out there as an option: df.between_time with loc. Note that between_time requires the data frame to have a DatetimeIndex.

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(25), index=pd.date_range('2017-08-20', '2017-08-21', freq='H'))

df.loc[df.between_time('06:00:00', '12:00:00').index, 'newCol'] = 'morning'
df.loc[df.between_time('12:00:00', '15:00:00').index, 'newCol'] = 'afternoon'

Update per comment:

If you want to use between_time on a column and not an index then try:

# sample data
df = pd.DataFrame(np.random.randn(25),
                  index=pd.date_range('2017-08-20', '2017-08-21', freq='H'))
df = df.reset_index().rename(columns={'index': 'date'})

# create a datetime index from the date column
idx = pd.DatetimeIndex(df['date'])

# get the integer positions of the rows in each interval
# (indexer_between_time includes both endpoints by default)
morning_mask = idx.indexer_between_time('06:00:00', '12:00:00')
afternoon_mask = idx.indexer_between_time('12:00:00', '15:00:00')

# use loc to assign value to a new column
df.loc[morning_mask, 'newCol'] = 'morning'
df.loc[afternoon_mask, 'newCol'] = 'afternoon'
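
Since both endpoints are included by default, a row at exactly 12:00:00 matches both intervals and simply keeps whichever label is assigned last. As a small sketch of one way to make the boundaries explicit, the include_start and include_end arguments of indexer_between_time can be set so that each interval is closed on the left and open on the right. The sample timestamps and the intervals list are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with timestamps sitting on the interval boundaries.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2017-08-20 05:59:59",
        "2017-08-20 06:00:00",
        "2017-08-20 11:59:59",
        "2017-08-20 12:00:00",
        "2017-08-20 15:00:00",
    ])
})

idx = pd.DatetimeIndex(df["date"])

# (start, end, label) triples; each interval is treated as [start, end).
intervals = [
    ("06:00:00", "12:00:00", "morning"),
    ("12:00:00", "15:00:00", "afternoon"),
]

# Start with no label anywhere, then fill in the matching positions.
labels = np.full(len(df), None, dtype=object)
for start, end, label in intervals:
    # Returns integer positions, not a boolean mask.
    positions = idx.indexer_between_time(
        start, end, include_start=True, include_end=False
    )
    labels[positions] = label

df["newCol"] = labels
```

Filling a plain array by position and assigning it once avoids relying on the data frame's index labels happening to match the positions, which is what makes the df.loc version work after reset_index.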

Answered By: It_is_Chris
