Set value based on previous value in previous group if it exists

Question:

Say I have this:

import numpy
import pandas

df = pandas.DataFrame(
  [ dict(a=75, b=numpy.nan, d='2023-01-01 00:00')
  , dict(a=82, b=numpy.nan, d='2023-01-01 10:00')
  , dict(a=39, b=numpy.nan, d='2023-01-01 20:00')
  , dict(a=10, b=82       , d='2023-01-05 00:00')
  , dict(a=90, b=82       , d='2023-01-05 20:00')
  , dict(a=61, b=numpy.nan, d='2023-02-08 00:00')
  , dict(a=35, b=numpy.nan, d='2023-02-08 10:00')
  , dict(a=95, b=numpy.nan, d='2023-02-08 20:00')
  , dict(a=21, b=35       , d='2023-04-15 00:00')
  , dict(a=60, b=35       , d='2023-04-15 10:00')
  ])
df['d'] = pandas.to_datetime(df['d'])
df = df.set_index('d')
print(df)

which outputs:

                      a     b
d
2023-01-01 00:00:00  75   NaN
2023-01-01 10:00:00  82   NaN
2023-01-01 20:00:00  39   NaN
2023-01-05 00:00:00  10  82.0
2023-01-05 20:00:00  90  82.0
2023-02-08 00:00:00  61   NaN
2023-02-08 10:00:00  35   NaN
2023-02-08 20:00:00  95   NaN
2023-04-15 00:00:00  21  35.0
2023-04-15 10:00:00  60  35.0

In real life, I only have column a and my desired output is in column b.

Here, b equals the value of a at 10:00 on the previous available date. Dates are not necessarily consecutive. The previous available date may have no value at 10:00, in which case b should be NaN.

Logically, I’d solve this by grouping by date and extracting the value from the previous group.

Can that be done with pandas, without resorting to iterating over (previous group, group) pairs or something of the sort?

More generally, are there any pandas idioms to deal with these "look up value from the previous group" situations?


I’ll be adding edits here as answers come in, to show additional info that doesn’t fit nicely in a comment.

For https://stackoverflow.com/a/75599866/3821009

df['c'] = df.groupby(df.index.date)['a'].shift() 
print(df)                                        

produces:

                      a     b     c
d
2023-01-01 00:00:00  75   NaN   NaN
2023-01-01 10:00:00  82   NaN  75.0
2023-01-01 20:00:00  39   NaN  82.0
2023-01-05 00:00:00  10  82.0   NaN
2023-01-05 20:00:00  90  82.0  10.0
2023-02-08 00:00:00  61   NaN   NaN
2023-02-08 10:00:00  35   NaN  61.0
2023-02-08 20:00:00  95   NaN  35.0
2023-04-15 00:00:00  21  35.0   NaN
2023-04-15 10:00:00  60  35.0  21.0

so that’s not what I’m looking for.

Asked By: levant pied

||

Answers:

Yes, I believe you can use the groupby() method along with the shift() method to accomplish this.

You could do something like,

df['b'] = df.groupby(df.index.date)['a'].shift()

This code splits the table into groups based on the dates in the index. Within each group, it takes the values in the ‘a’ column and moves them down by one row.

Note that the shift happens inside each group, so each row gets the value of ‘a’ from the previous row on the same date, and the first row of every date is NaN. On its own it does not carry the 10:00 value over from the previous date, so an extra step that looks across groups is still needed.

Answered By: CRM000

The general idea is:

  1. Get the values of a where the time is 10:00
  2. Get the group id of each date
  3. If the index is sorted by time, the current group id is just 1 greater than the previous one
  4. Map the 10:00 value of the previous group to the current rows through the group id

import pandas as pd

time = df.loc[df.index.time == pd.to_datetime('10:00:00').time(), 'a']
gid = df.groupby(df.index.date).ngroup()
df['c'] = gid.map(dict(zip(time.index.map(gid)+1, time)))

$ print(time)

d
2023-01-01 10:00:00    82
2023-02-08 10:00:00    35
2023-04-15 10:00:00    60
Name: a, dtype: int64

$ print(gid)

d
2023-01-01 00:00:00    0
2023-01-01 10:00:00    0
2023-01-01 20:00:00    0
2023-01-05 00:00:00    1
2023-01-05 20:00:00    1
2023-02-08 00:00:00    2
2023-02-08 10:00:00    2
2023-02-08 20:00:00    2
2023-04-15 00:00:00    3
2023-04-15 10:00:00    3
dtype: int64
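
The dictionary built inside the last line of the code is the least obvious part, so here is a small self-contained sketch that spells out its intermediate pieces. The names next_gid, lookup and c are introduced only for illustration, and the frame is rebuilt from the question's column a.

```python
import datetime

import pandas as pd

index = pd.to_datetime([
    "2023-01-01 00:00", "2023-01-01 10:00", "2023-01-01 20:00",
    "2023-01-05 00:00", "2023-01-05 20:00",
    "2023-02-08 00:00", "2023-02-08 10:00", "2023-02-08 20:00",
    "2023-04-15 00:00", "2023-04-15 10:00",
])
df = pd.DataFrame({"a": [75, 82, 39, 10, 90, 61, 35, 95, 21, 60]}, index=index)

# Rows whose time of day is exactly 10:00.
time = df.loc[df.index.time == datetime.time(10, 0), "a"]

# One integer id per calendar date, numbered in sorted date order.
gid = df.groupby(df.index.date).ngroup()

# Group id of each 10:00 row, plus one: the id of the date group that follows it.
next_gid = time.index.map(gid) + 1

# Illustrative name: maps "id of the following group" to the 10:00 value.
lookup = dict(zip(next_gid, time))

# Group ids that are missing from the dictionary become NaN.
c = gid.map(lookup)
print(lookup)
print(c)
```

With this data the dictionary has the keys 1, 3 and 4. Key 4 points past the last group and is simply never used, and group 2 has no entry because 2023-01-05 has no 10:00 row.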

$ print(df)

                      a     b     c
d
2023-01-01 00:00:00  75   NaN   NaN
2023-01-01 10:00:00  82   NaN   NaN
2023-01-01 20:00:00  39   NaN   NaN
2023-01-05 00:00:00  10  82.0  82.0
2023-01-05 20:00:00  90  82.0  82.0
2023-02-08 00:00:00  61   NaN   NaN
2023-02-08 10:00:00  35   NaN   NaN
2023-02-08 20:00:00  95   NaN   NaN
2023-04-15 00:00:00  21  35.0  35.0
2023-04-15 10:00:00  60  35.0  35.0
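
A variation on the same idea, sketched here as one possible shape for the more general "look up a value from the previous group" question: reduce each group to a single value, shift that per-group series by one position, then broadcast it back to the rows. This is only a sketch under the assumption that each date has at most one 10:00 row; the names at_ten, per_date and prev are made up for the example.

```python
import datetime

import pandas as pd

index = pd.to_datetime([
    "2023-01-01 00:00", "2023-01-01 10:00", "2023-01-01 20:00",
    "2023-01-05 00:00", "2023-01-05 20:00",
    "2023-02-08 00:00", "2023-02-08 10:00", "2023-02-08 20:00",
    "2023-04-15 00:00", "2023-04-15 10:00",
])
df = pd.DataFrame({"a": [75, 82, 39, 10, 90, 61, 35, 95, 21, 60]}, index=index)

# Calendar date of every row, as midnight timestamps.
dates = df.index.normalize()

# Step 1: one value per date, the 'a' value at 10:00 (assumes at most one per date).
at_ten = df.loc[df.index.time == datetime.time(10, 0), "a"]
at_ten.index = at_ten.index.normalize()

# Dates without a 10:00 row get NaN here.
per_date = at_ten.reindex(dates.unique().sort_values())

# Step 2: move every per-date value to the next available date.
prev = per_date.shift()

# Step 3: broadcast the per-date values back onto the original rows.
df["c"] = prev.reindex(dates).to_numpy()
print(df)
```

Because the shift is applied to a series with one entry per date, this version does not depend on the group ids being consecutive integers.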
Answered By: Ynjxsjmh