How to separate a dataframe into groups with a five-minute interval?
Question:
I have a dataframe like this:
timestamp | id | data1 |
---|---|---|
2022-12-12 10:03:02 | a1 | x1 |
2022-12-12 10:03:02 | a2 | c1 |
2022-12-12 10:04:12 | a1 | x2 |
2022-12-12 10:04:12 | a2 | c2 |
2022-12-12 10:05:02 | a1 | x3 |
2022-12-12 10:05:02 | a2 | c3 |
2022-12-12 10:09:15 | a1 | x4 |
2022-12-12 10:09:15 | a2 | c4 |
2022-12-12 10:12:15 | a1 | x5 |
2022-12-12 10:12:15 | a2 | c5 |
I only need the rows with the newest data within each five-minute interval, where the start time is one that I choose; for the table above it is 10:00:00. So, from the table above, I would like to get a final table like this:
timestamp | id | data1 |
---|---|---|
2022-12-12 10:04:12 | a1 | x2 |
2022-12-12 10:04:12 | a2 | c2 |
2022-12-12 10:09:15 | a1 | x4 |
2022-12-12 10:09:15 | a2 | c4 |
2022-12-12 10:12:15 | a1 | x5 |
2022-12-12 10:12:15 | a2 | c5 |
I tried:
df.groupby(['id', 'timestamp']).resample("5min").last()
But this is not what I want.
Answers:
Create a new column (e.g., interval) that represents the five-minute interval each row falls into, and then use .groupby() with .last().
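import pandas as pd
# Include the date so the bins are anchored to the data's own day
start_time = pd.to_datetime("2022-12-12 10:00:00")
# Index of the five-minute (300 s) bin each row falls into
df['interval'] = (df['timestamp'] - start_time).dt.total_seconds() // 300
# Last row per (interval, id); assumes df is already sorted by timestamp
grouped_df = df.groupby(['interval', 'id']).last()
final_df = grouped_df.reset_index().drop(columns=['interval'])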
output:
id timestamp data1
0 a1 2022-12-12 10:04:12 x2
1 a2 2022-12-12 10:04:12 c2
2 a1 2022-12-12 10:09:15 x4
3 a2 2022-12-12 10:09:15 c4
4 a1 2022-12-12 10:12:15 x5
5 a2 2022-12-12 10:12:15 c5
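For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the same idea. It rebuilds the sample table from the question and assumes the timestamp column is parsed as datetimes. Unlike the snippet above, it sorts by timestamp and takes the last row of each group with .tail(1), which is a swapped-in variation so that the result does not depend on the original row order; treat it as one possible way to do it rather than the only one.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2022-12-12 10:03:02", "2022-12-12 10:03:02",
        "2022-12-12 10:04:12", "2022-12-12 10:04:12",
        "2022-12-12 10:05:02", "2022-12-12 10:05:02",
        "2022-12-12 10:09:15", "2022-12-12 10:09:15",
        "2022-12-12 10:12:15", "2022-12-12 10:12:15",
    ]),
    "id": ["a1", "a2"] * 5,
    "data1": ["x1", "c1", "x2", "c2", "x3", "c3", "x4", "c4", "x5", "c5"],
})

# Start of the first five-minute bin (chosen by the user)
start_time = pd.Timestamp("2022-12-12 10:00:00")

# Number of whole five-minute (300 s) blocks elapsed since start_time
df["interval"] = (df["timestamp"] - start_time).dt.total_seconds() // 300

# Sort by time, then keep the newest row of every (interval, id) group
final_df = (
    df.sort_values("timestamp")
      .groupby(["interval", "id"])
      .tail(1)
      .sort_values(["timestamp", "id"])
      .drop(columns=["interval"])
      .reset_index(drop=True)
)

print(final_df)
```

With the sample data this keeps the six rows from the desired table (x2, c2, x4, c4, x5, c5). Changing start_time shifts where the five-minute bins begin.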