How can I reference the line above a particular line in a pandas dataframe?

Question:

I have the following pandas dataframe:

issue_stat     timestamp   state
    0          11:00:00     hi
    1          12:40:00     lo
    9          13:00:00     av
    3          15:00:00     hi 
    8          18:00:00     hi
    4          20:00:00     lo

I want to map the state of the line above timestamp=18:00:00 to jazz. I MUST use the timestamp=18:00:00 in my code. How would I do this?

I know how to map the state of timestamp=18:00:00:

dataframe.loc[dataframe['timestamp'] == '18:00:00', 'state'] = whatever

But I am having difficulty pointing to the line above it. Again I emphasise, I MUST reference the timestamp = 18:00:00 in my code.

So the output looks like this:

issue_stat     timestamp   state
     0          11:00:00     hi
     1          12:40:00     lo
     9          13:00:00     av
     3          15:00:00    jazz 
     8          18:00:00     hi
     4          20:00:00     lo
Asked By: Patrick Chong

||

Answers:

You can use the shift function to line each row up with the row next to it, then use boolean indexing to select and update the row you need.

shift(1) moves values down one row, so each row sees the value from the row above it:

df['prev_state'] = df['state'].shift(1)

This adds a new column called prev_state to the dataframe, which contains the value of the state column for the previous row (NaN for the first row). It only reads the previous row's state; it does not change it.

To update the state of the row above timestamp 18:00:00, shift the timestamp column up by one row instead, so each row is compared against the timestamp of the row below it:

df.loc[df['timestamp'].shift(-1) == '18:00:00', 'state'] = 'jazz'

This sets state to jazz for every row whose next row has timestamp 18:00:00.
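
To see how shift() lines rows up, here is a minimal sketch on the timestamp values from the question. It assumes the timestamps are stored as plain strings; the variable names are only illustrative.

```python
import pandas as pd

ts = pd.Series(['11:00:00', '12:40:00', '13:00:00',
                '15:00:00', '18:00:00', '20:00:00'])

# shift(1) moves values down: row i now holds the value from row i - 1.
previous_ts = ts.shift(1)

# shift(-1) moves values up: row i now holds the value from row i + 1.
next_ts = ts.shift(-1)

# True only for the row whose *next* row is 18:00:00, i.e. the row above it.
row_above = next_ts == '18:00:00'
print(row_above.tolist())  # [False, False, False, True, False, False]
```

A mask built this way can be passed straight to .loc to update the matching row.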

Answered By: Uncoke

The shift() method shifts a Series by a given number of rows in either direction. So to set the state of the row whose following row has timestamp 18:00:00:

df.loc[df["timestamp"].shift(-1) == '18:00:00', 'state'] = 'jazz'

Produces:

   issue_stat           timestamp    state
0           0            11:00:00       hi
1           1            12:40:00       lo
2           9            13:00:00       av
3           3            15:00:00     jazz
4           8            18:00:00       hi
5           4            20:00:00       lo
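
For reference, a self-contained sketch that rebuilds the dataframe from the question and applies the same shift(-1) mask. The helper set_state_above is a hypothetical name introduced here, and the timestamps are assumed to be strings.

```python
import pandas as pd

df = pd.DataFrame({
    'issue_stat': [0, 1, 9, 3, 8, 4],
    'timestamp': ['11:00:00', '12:40:00', '13:00:00',
                  '15:00:00', '18:00:00', '20:00:00'],
    'state': ['hi', 'lo', 'av', 'hi', 'hi', 'lo'],
})


def set_state_above(frame, timestamp, new_state):
    """Hypothetical helper: return a copy of `frame` with `state` replaced
    in the row directly above each row whose timestamp equals `timestamp`."""
    out = frame.copy()
    # Compare each row against the timestamp of the row below it.
    out.loc[out['timestamp'].shift(-1) == timestamp, 'state'] = new_state
    return out


result = set_state_above(df, '18:00:00', 'jazz')
print(result)
```

Working on a copy leaves the original df untouched; drop the copy() call if an in-place update is preferred.
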
Answered By: DobbyTheElf

Another option is to subtract 1 from the index of the matching row:

import pandas as pd

test_df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'B': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'],
    'state': ['NY', 'NY', 'NY', 'NY', 'NY', 'FL', 'FL', 'FL', 'FL', 'FL']})
    A   B   state
0   1   A   NY
1   2   B   NY
2   3   C   NY
3   4   D   NY
4   5   E   NY
5   6   F   FL
6   7   G   FL
7   8   H   FL
8   9   I   FL
9   10  J   FL

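I reference the value C, but the state is changed in the row for B, one line above it. This relies on the default integer index (0, 1, 2, ...), where subtracting 1 from a label gives the label of the previous row.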

test_df.loc[test_df.index[test_df['B'] == 'C'] - 1, 'state'] = 'new'
test_df
    A   B   state
0   1   A   NY
1   2   B   new
2   3   C   NY
3   4   D   NY
4   5   E   NY
5   6   F   FL
6   7   G   FL
7   8   H   FL
8   9   I   FL
9   10  J   FL

In your case, use the timestamp column instead of B and '18:00:00' instead of 'C'.
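
If the dataframe does not have the default integer index, the same idea can be expressed with positions rather than labels. The following is only a sketch under that assumption: the letter index is made up for illustration, and np.flatnonzero with iloc is swapped in for the index arithmetic above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'timestamp': ['11:00:00', '12:40:00', '13:00:00',
                      '15:00:00', '18:00:00', '20:00:00'],
        'state': ['hi', 'lo', 'av', 'hi', 'hi', 'lo'],
    },
    index=list('abcdef'),  # illustrative non-numeric labels
)

# Integer positions (not labels) of the rows that match the timestamp.
positions = np.flatnonzero((df['timestamp'] == '18:00:00').to_numpy())

# Step back one position; skip a match in the first row, which has no row above.
positions_above = positions[positions > 0] - 1

# iloc is purely positional, so the index labels do not matter here.
df.iloc[positions_above, df.columns.get_loc('state')] = 'jazz'
print(df)
```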

Answered By: Jonas Palačionis