How to add missing rows so a column repeats 0, 1, 2, 3
Question:
I have a pandas.DataFrame of the form:
index df df1
0 0 111
1 1 111
2 2 111
3 3 111
4 0 111
5 2 111
6 3 111
7 0 111
8 2 111
9 3 111
10 0 111
11 1 111
12 2 111
13 3 111
14 0 111
15 1 111
16 2 111
17 3 111
18 1 111
19 2 111
20 3 111
I want to create a DataFrame in which column df repeats 0, 1, 2, 3, but some values of that cycle are missing from the data. I’m trying to fill the gaps by inserting the missing rows with df1 set to 0.
Here is my expected result:
index df df1
0 0 111
1 1 111
2 2 111
3 3 111
4 0 111
5 1 0
6 2 111
7 3 111
8 0 111
9 1 0
10 2 111
11 3 111
12 0 111
13 1 111
14 2 111
15 3 111
16 0 111
17 1 111
18 2 111
19 3 111
20 0 0
21 1 111
22 2 111
23 3 111
How can I achieve this?
Edit:
What should I do if my input is as follows?
index df1 df2
0 0 111
1 1 111
2 2 111
3 3 111
4 0 111
5 3 111
6 1 111
7 2 111
Here is my expected result:
index df1 df2
0 0 111
1 1 111
2 2 111
3 3 111
4 0 111
5 1 0
6 2 0
7 3 111
8 0 0
9 1 111
10 2 111
11 3 0
Answers:
You can define a custom grouping that detects when the increasing numbers in "df" reset to a lower (or equal) value.
Then reindex with the product of the unique groups and the expected values of "df" (0 to 3).
Finally, rework the output with a combination of fillna, reset_index and rename_axis:
import pandas as pd
# uncomment below if "index" is a column rather than the index
# df = df.set_index('index')
# find positions where "df" resets and number the groups
groups = df['df'].diff().le(0).cumsum().rename('group')
(df.set_index([groups, 'df'], drop=True)  # set custom groups and "df" as index
   .reindex(pd.MultiIndex.from_product([groups.unique(),  # reindex with all
                                        range(4)],         # group x 0..3 combinations
                                       names=['group', 'df']))
   .fillna(0)                              # set missing values to zero
   .astype(int)                            # NaN made the column float; restore int
   .reset_index('df')                      # all below restores a range index
   .reset_index(drop=True)
   .rename_axis('index')
)
output:
df df1
index
0 0 111
1 1 111
2 2 111
3 3 111
4 0 111
5 1 0
6 2 111
7 3 111
8 0 111
9 1 0
10 2 111
11 3 111
12 0 111
13 1 111
14 2 111
15 3 111
16 0 111
17 1 111
18 2 111
19 3 111
20 0 0
21 1 111
22 2 111
23 3 111
output on the second example (with "df" replaced by "df1" in the code):
df1 df2
index
0 0 111
1 1 111
2 2 111
3 3 111
4 0 111
5 1 0
6 2 0
7 3 111
8 0 0
9 1 111
10 2 111
11 3 0
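Since the answer above hard-codes the column name "df", a reusable version may help for the edited input, whose key column is "df1". The following is a minimal sketch wrapping the same reindex approach in a hypothetical helper `fill_cycle` (the name and its `key`/`period` parameters are my own assumptions). It assumes the remaining columns hold integers, and it uses `.fillna(0).astype(int)` instead of `downcast='infer'`, since the `downcast` keyword is deprecated in recent pandas releases.

```python
import pandas as pd


def fill_cycle(df: pd.DataFrame, key: str, period: int = 4) -> pd.DataFrame:
    """Hypothetical helper: insert the rows missing from each 0..period-1
    cycle of `key`, filling the other (integer) columns with 0."""
    # a new group starts wherever `key` does not increase
    groups = df[key].diff().le(0).cumsum().rename('group')
    # every (group, expected value) pair
    full = pd.MultiIndex.from_product([groups.unique(), range(period)],
                                      names=['group', key])
    return (df.set_index([groups, key])
              .reindex(full)          # missing pairs appear as NaN rows
              .fillna(0)
              .astype(int)            # assumes the other columns are integers
              .reset_index(key)
              .reset_index(drop=True)
              .rename_axis('index'))


# the two sample inputs from the question
first = pd.DataFrame({'df': [0, 1, 2, 3, 0, 2, 3, 0, 2, 3, 0,
                             1, 2, 3, 0, 1, 2, 3, 1, 2, 3],
                      'df1': 111})
second = pd.DataFrame({'df1': [0, 1, 2, 3, 0, 3, 1, 2], 'df2': 111})

print(fill_cycle(first, 'df'))
print(fill_cycle(second, 'df1'))
```

The same call then covers both inputs; only the key column name changes.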
Using @Mozway’s idea, combined with some helper functions from pyjanitor, the missing values can be made explicit and then filled. Again, this is just another option:
# pip install pyjanitor
import pandas as pd
import janitor as jn
(df.assign(temp = df.df.diff().le(0).cumsum())
.complete('df', 'temp') # helper function
.fillna(0)
# relevant if you care about the order
.sort_values('temp', kind='mergesort')
# helper function
.select_columns('df*') # or .drop(columns='temp')
)
df df1
0 0 111.0
6 1 111.0
12 2 111.0
18 3 111.0
1 0 111.0
7 1 0.0
13 2 111.0
19 3 111.0
2 0 111.0
8 1 0.0
14 2 111.0
20 3 111.0
3 0 111.0
9 1 111.0
15 2 111.0
21 3 111.0
4 0 111.0
10 1 111.0
16 2 111.0
22 3 111.0
5 0 0.0
11 1 111.0
17 2 111.0
23 3 111.0
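If pyjanitor isn't available, the `complete` step can be approximated in plain pandas: build every combination of the observed "temp" groups and observed "df" values, then left-merge the data onto it. This is only a sketch of the same idea, not pyjanitor's implementation; the variable names `temp`, `combos` and `out` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'df': [0, 1, 2, 3, 0, 2, 3, 0, 2, 3, 0,
                          1, 2, 3, 0, 1, 2, 3, 1, 2, 3],
                   'df1': 111})

# same grouping as above: a new "temp" group whenever "df" does not increase
temp = df.assign(temp=df['df'].diff().le(0).cumsum())

# every combination of observed groups and observed "df" values
combos = pd.MultiIndex.from_product([temp['temp'].unique(), temp['df'].unique()],
                                    names=['temp', 'df']).to_frame(index=False)

out = (combos.merge(temp, on=['temp', 'df'], how='left')  # missing pairs get NaN
             .fillna(0)
             .sort_values(['temp', 'df'], kind='mergesort')
             .drop(columns='temp')
             .reset_index(drop=True))
print(out)
```

As with the pyjanitor version, "df1" comes out as float because of the intermediate NaN values; add `.astype({'df1': int})` if integers are needed.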
You can define groups on the increasing runs of column df, then use .unstack() and .stack(), as follows:
group = df['df'].le(df['df'].shift()).cumsum() # new group if column `df` <= `df` last entry
df_out = (df.set_index([group, 'df']) # set `group` and column `df` as index
.unstack(fill_value=0) # unstack `df` and fill missing entry of `df` in [0,1,2,3] as 0 for `df1`
.stack() # stack back to original shape
.droplevel(0) # drop `group` from index
.reset_index() # restore `df` from index back to data column
)
Result:
print(df_out)
df df1
0 0 111
1 1 111
2 2 111
3 3 111
4 0 111
5 1 0
6 2 111
7 3 111
8 0 111
9 1 0
10 2 111
11 3 111
12 0 111
13 1 111
14 2 111
15 3 111
16 0 111
17 1 111
18 2 111
19 3 111
20 0 0
21 1 111
22 2 111
23 3 111
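To see why `fill_value=0` produces the missing rows, it can help to look at the intermediate wide table that `.unstack()` builds before `.stack()` turns it back into rows. Below is a small sketch on the first sample input; the `.rename('group')` is my addition so that the two index levels have distinct names.

```python
import pandas as pd

df = pd.DataFrame({'df': [0, 1, 2, 3, 0, 2, 3, 0, 2, 3, 0,
                          1, 2, 3, 0, 1, 2, 3, 1, 2, 3],
                   'df1': 111})

group = df['df'].le(df['df'].shift()).cumsum().rename('group')

# intermediate step: one row per group, one column per "df" value;
# a value missing from a group becomes a 0 cell instead of a missing row
wide = df.set_index([group, 'df']).unstack(fill_value=0)
print(wide)

# .stack() turns every cell, including the filled zeros, back into a row
long = wide.stack()
print(len(long))
```

One caveat of this design: `.unstack()` only creates columns for "df" values that occur in at least one group, so a value absent from the whole input would not be restored.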
For the edited input, use similar code:
group = df['df1'].le(df['df1'].shift()).cumsum()
df_out2 = (df.set_index([group, 'df1'])
.unstack(fill_value=0)
.stack()
.droplevel(0)
.reset_index()
)
Result:
print(df_out2)
df1 df2
0 0 111
1 1 111
2 2 111
3 3 111
4 0 111
5 1 0
6 2 0
7 3 111
8 0 0
9 1 111
10 2 111
11 3 0
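First, group the rows (here df1 is the DataFrame; a new group starts wherever "df" does not increase):
import pandas as pd
df11 = df1.assign(group=(df1.df.diff() < 1).cumsum())
Second, build a new DataFrame with one full 0, 1, 2, 3 cycle per group (groups are numbered from 0, so there are max() + 1 of them):
df2 = pd.DataFrame({'df': [0, 1, 2, 3] * (df11.group.max() + 1)}).assign(col2=lambda dd: (dd.df == 0).cumsum() - 1)
Third, merge and fill (the parentheses let the chain span several lines):
(df2.merge(df11, how='left', left_on=['df', 'col2'], right_on=['df', 'group'])
    .loc[:, ['df', 'df1']]
    .fillna(0)
    .astype(int))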
df df1
0 0 111
1 1 111
2 2 111
3 3 111
4 0 111
5 1 0
6 2 111
7 3 111
8 0 111
9 1 0
10 2 111
11 3 111
12 0 111
13 1 111
14 2 111
15 3 111
16 0 111
17 1 111
18 2 111
19 3 111
20 0 0
21 1 111
22 2 111
23 3 111
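The merge approach's output does not say which rows were inserted. A variation, sketched here under the assumption of pandas 1.2 or later (needed for `how='cross'`), builds the full frame with a cross join and passes `indicator=True` to `merge`, so that the added `_merge` column marks the filled rows as `'left_only'`. The names `full`, `out` and `missing` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'df': [0, 1, 2, 3, 0, 2, 3, 0, 2, 3, 0,
                          1, 2, 3, 0, 1, 2, 3, 1, 2, 3],
                   'df1': 111})
df11 = df.assign(group=(df['df'].diff() < 1).cumsum())

# every (group, df) pair: cross join of the group ids with the cycle 0..3
full = (pd.DataFrame({'group': range(df11['group'].max() + 1)})
          .merge(pd.DataFrame({'df': range(4)}), how='cross'))

# indicator=True adds a "_merge" column; 'left_only' marks rows that were missing
out = full.merge(df11, how='left', on=['group', 'df'], indicator=True)
out['df1'] = out['df1'].fillna(0).astype(int)

missing = out.loc[out['_merge'] == 'left_only', ['group', 'df']]
print(missing)
```

For the first sample input this should flag three inserted rows: group 1 and group 2 are each missing df = 1, and the last group is missing df = 0.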