Add missing rows per group to fill a series
Question:
I have the following df:
Name | Time Elapsed | Amount |
---|---|---|
A | 2 | $2 |
A | 5 | $1 |
A | 7 | $6 |
B | 3 | $3 |
B | 5 | $5 |
I would like to expand this per group so that each group contains every time between its min and max, and then fill Amount downwards (assuming the frame is sorted by Name, then by Time Elapsed) to produce the following output:
Name | Time Elapsed | Amount |
---|---|---|
A | 2 | $2 |
A | 3 | $2 |
A | 4 | $2 |
A | 5 | $1 |
A | 6 | $1 |
A | 7 | $6 |
B | 3 | $3 |
B | 4 | $3 |
B | 5 | $5 |
I tried the following:
df = df(df.set_index('Time Elapsed')
        .groupby('Name')['Amount']
        .apply(lambda x: x.reindex(range(x.index.min(), x.index.max()+1))
        .ffill().fillna(0)).reset_index)
Feel free to offer a solution that does not use any of my code.
Answers:
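Try this:
import numpy as np
import pandas as pd

def func(g: pd.DataFrame) -> pd.DataFrame:
    # Index the group by time so that reindex can add the missing times.
    tmp = g.set_index('Time Elapsed')
    # Cover every integer time from min to max, forward-filling the new rows.
    res = tmp.reindex(
        np.arange(tmp.index.min(), tmp.index.max() + 1),
        method='ffill')
    return res

grouped = df.groupby('Name', as_index=False)
result = grouped.apply(func).reset_index().reindex(columns=df.columns)
print(result)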
>>>
  Name  Time Elapsed Amount
0    A             2     $2
1    A             3     $2
2    A             4     $2
3    A             5     $1
4    A             6     $1
5    A             7     $6
6    B             3     $3
7    B             4     $3
8    B             5     $5
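The attempt in the question is close to working. As written it appears to fail for two reasons: df is called like a function (df(...)), and reset_index is referenced without being called. Below is a minimal sketch of a corrected version. It assumes the sample frame from the question, rebuilt here so the snippet runs on its own. The fillna(0) is dropped because every group starts at its own minimum, so ffill leaves nothing unfilled.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name": ["A", "A", "A", "B", "B"],
    "Time Elapsed": [2, 5, 7, 3, 5],
    "Amount": ["$2", "$1", "$6", "$3", "$5"],
})

result = (
    df.set_index("Time Elapsed")
      .groupby("Name")["Amount"]
      # Per group: cover every integer time from min to max, then forward-fill.
      .apply(lambda s: s.reindex(range(s.index.min(), s.index.max() + 1)).ffill())
      # Turn the (Name, Time Elapsed) index back into ordinary columns.
      .reset_index()
)
print(result)
```

Because only the Amount column is selected before apply, this variant carries a single value column. With several columns to fill, the function-based approach above is probably easier to extend.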
One option is the complete function from pyjanitor, which exposes the missing rows so that a forward fill can populate them:
# pip install pyjanitor
import pandas as pd
import janitor
# build a dictionary of all the possible times
# the key of the dictionary should be the column to be expanded
times = {"Time Elapsed" : lambda df: range(df.min(), df.max() + 1)}
df.complete(times, by = 'Name').ffill()
  Name  Time Elapsed Amount
0    A             2     $2
1    A             3     $2
2    A             4     $2
3    A             5     $1
4    A             6     $1
5    A             7     $6
6    B             3     $3
7    B             4     $3
8    B             5     $5
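If you would rather avoid both groupby.apply and an extra dependency, the same idea (expose the missing rows, then fill) can be sketched in plain pandas: build the full grid of (Name, Time Elapsed) pairs, left-merge the original data onto it, and forward-fill within each group. This is only a sketch under the question's assumptions (integer times, sample data rebuilt below); the names bounds and grid are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name": ["A", "A", "A", "B", "B"],
    "Time Elapsed": [2, 5, 7, 3, 5],
    "Amount": ["$2", "$1", "$6", "$3", "$5"],
})

# Smallest and largest time per group.
bounds = df.groupby("Name")["Time Elapsed"].agg(["min", "max"])

# One small frame per group holding every integer time in [min, max].
grid = pd.concat(
    [
        pd.DataFrame({"Name": name, "Time Elapsed": range(lo, hi + 1)})
        for name, lo, hi in bounds.itertuples()
    ],
    ignore_index=True,
)

# The left merge keeps every grid row; times absent from df get NaN in Amount.
result = grid.merge(df, on=["Name", "Time Elapsed"], how="left")

# Forward-fill inside each group so values never leak from one Name to the next.
result["Amount"] = result.groupby("Name")["Amount"].ffill()
print(result)
```

Filling per group is a small safety margin over a plain ffill(): it should give the same result on this data, but it would not carry a value across a group boundary if a group ever started with a missing Amount.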