How to insert missing rows so that a column repeats 0, 1, 2, 3

Question:

I have a pandas.DataFrame of the form:

index     df      df1

0         0       111
1         1       111
2         2       111
3         3       111
4         0       111
5         2       111
6         3       111
7         0       111
8         2       111
9         3       111
10        0       111
11        1       111
12        2       111
13        3       111
14        0       111
15        1       111
16        2       111
17        3       111
18        1       111
19        2       111
20        3       111

I want column df to repeat the cycle 0, 1, 2, 3, but some values are missing from the data. I'd like to insert the missing rows, with 0 in df1.
Here is my expected result:

index     df      df1

0         0       111
1         1       111
2         2       111
3         3       111
4         0       111
5         1       0
6         2       111
7         3       111
8         0       111
9         1       0
10        2       111
11        3       111
12        0       111
13        1       111
14        2       111
15        3       111
16        0       111
17        1       111
18        2       111
19        3       111
20        0       0
21        1       111
22        2       111
23        3       111

How can I achieve this?

Edit:

What should I do if my input looks like this?

index     df1      df2

0          0       111
1          1       111
2          2       111
3          3       111
4          0       111
5          3       111
6          1       111
7          2       111

Here is my expected result:

index     df1      df2

0          0       111
1          1       111
2          2       111
3          3       111
4          0       111
5          1       0
6          2       0
7          3       111
8          0       0
9          1       111
10         2       111
11         3       0
Asked By: 김수환


Answers:

You can set a custom grouping to detect when the increasing numbers in "df" reset to a lower (or equal) value.

Then reindex using every combination of the unique groups and the values 0 to 3.

Finally, rework the output with a combination of fillna/reset_index/rename_axis:

import pandas as pd

# uncomment below if "index" is a column rather than the index
# df = df.set_index('index')

# find positions where "df" resets and make groups
# (renamed so this level does not clash with the "df" column)
groups = df['df'].diff().le(0).cumsum().rename('group')

(df.set_index([groups, 'df'], drop=True) # set custom groups and "df" as index
   .reindex(pd.MultiIndex.from_product([groups.unique(),   # reindex with all
                                        range(4),          # combinations
                                       ], names=['group', 'df']))
   .fillna(0)                   # set missing values as zero
   .astype(int)                 # reindexing made the values float; restore int
   .reset_index('df')           # all below to restore a range index
   .reset_index(drop=True)
   .rename_axis('index')
)

output:

       df  df1
index         
0       0  111
1       1  111
2       2  111
3       3  111
4       0  111
5       1    0
6       2  111
7       3  111
8       0  111
9       1    0
10      2  111
11      3  111
12      0  111
13      1  111
14      2  111
15      3  111
16      0  111
17      1  111
18      2  111
19      3  111
20      0    0
21      1  111
22      2  111
23      3  111
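The grouping key is the core of this approach. A minimal sketch that rebuilds the first input (values copied from the question, with "index" as the default RangeIndex) and prints the group label of each row; a new group starts whenever `df` fails to increase:

```python
import pandas as pd

# First input from the question ("index" is the default RangeIndex here)
df = pd.DataFrame({
    'df': [0, 1, 2, 3, 0, 2, 3, 0, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 1, 2, 3],
    'df1': 111,
})

# diff() is NaN on the first row, so le(0) is False there and the first group is 0;
# every later drop (or repeat) in "df" increments the running count
groups = df['df'].diff().le(0).cumsum()

print(groups.tolist())
# [0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5]
```

Six groups times `range(4)` gives the 24 rows of the expected result.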

Output on the second example (with 'df' replaced by 'df1' in the code):

       df1  df2
index          
0        0  111
1        1  111
2        2  111
3        3  111
4        0  111
5        1    0
6        2    0
7        3  111
8        0    0
9        1  111
10       2  111
11       3    0
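The same steps can be wrapped in a function parameterised by the column name, so both inputs are handled without editing the code. This is a sketch; `fill_missing_cycle` and its `period` parameter are hypothetical names, and it assumes the remaining columns hold integer values (so casting back to `int` after `fillna` is safe):

```python
import pandas as pd

def fill_missing_cycle(data, col, period=4):
    """Hypothetical helper: insert missing rows so `col` cycles 0..period-1,
    filling the other (assumed integer) columns with 0."""
    # a new group starts wherever `col` does not increase
    groups = data[col].diff().le(0).cumsum().rename('group')
    full = pd.MultiIndex.from_product([groups.unique(), range(period)],
                                      names=['group', col])
    return (data.set_index([groups, col])
                .reindex(full)          # missing (group, value) pairs become NaN
                .fillna(0)
                .astype(int)            # reindexing turned the columns into float
                .reset_index(col)
                .reset_index(drop=True)
                .rename_axis('index'))

# Second input from the question
df = pd.DataFrame({'df1': [0, 1, 2, 3, 0, 3, 1, 2], 'df2': 111})
out = fill_missing_cycle(df, 'df1')
print(out)
```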
Answered By: mozway

Using @mozway's idea, combined with some helper functions from pyjanitor, the missing values can be made explicit and then filled. This is just another option:

# pip install pyjanitor
import pandas as pd
import janitor  # importing registers .complete(), .select_columns(), ... on DataFrame

(df.assign(temp = df.df.diff().le(0).cumsum())
   .complete('df', 'temp')                 # helper function: expose missing combinations
   .fillna(0)
   .sort_values('temp', kind='mergesort')  # relevant if you care about the order
   .select_columns('df*')                  # helper function; or .drop(columns='temp')
)
 
    df    df1
0    0  111.0
6    1  111.0
12   2  111.0
18   3  111.0
1    0  111.0
7    1    0.0
13   2  111.0
19   3  111.0
2    0  111.0
8    1    0.0
14   2  111.0
20   3  111.0
3    0  111.0
9    1  111.0
15   2  111.0
21   3  111.0
4    0  111.0
10   1  111.0
16   2  111.0
22   3  111.0
5    0    0.0
11   1  111.0
17   2  111.0
23   3  111.0
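For readers without pyjanitor: `complete('df', 'temp')` exposes every combination of the two columns. A rough plain-pandas sketch of that step, assuming `df` and `temp` are the only key columns, is a left merge against their Cartesian product. Building the product group-first keeps the rows in order, so no extra sort is needed, and the result is cast back to `int`:

```python
import pandas as pd

# First input from the question
df = pd.DataFrame({
    'df': [0, 1, 2, 3, 0, 2, 3, 0, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 1, 2, 3],
    'df1': 111,
})
df = df.assign(temp=df['df'].diff().le(0).cumsum())

# every (temp, df) pair, listed group by group
keys = pd.MultiIndex.from_product(
    [df['temp'].unique(), sorted(df['df'].unique())], names=['temp', 'df']
).to_frame(index=False)

out = (keys.merge(df, on=['temp', 'df'], how='left')  # absent pairs get NaN in df1
           .fillna({'df1': 0})
           .astype({'df1': int})
           .drop(columns='temp'))
print(out)
```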
Answered By: sammywemmy

You can define a group for each increasing run of column df, then use .unstack() and .stack(), as follows:

group = df['df'].le(df['df'].shift()).cumsum()   # new group if column `df` <= `df` last entry

df_out = (df.set_index([group, 'df'])    # set `group` and column `df` as index
            .unstack(fill_value=0)       # unstack `df` and fill missing entry of `df` in [0,1,2,3] as 0 for `df1`
            .stack()                     # stack back to original shape
            .droplevel(0)                # drop `group` from index
            .reset_index()               # restore `df` from index back to data column
         )

Result:

print(df_out)


    df  df1
0    0  111
1    1  111
2    2  111
3    3  111
4    0  111
5    1    0
6    2  111
7    3  111
8    0  111
9    1    0
10   2  111
11   3  111
12   0  111
13   1  111
14   2  111
15   3  111
16   0  111
17   1  111
18   2  111
19   3  111
20   0    0
21   1  111
22   2  111
23   3  111
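To see why `unstack(fill_value=0)` fills the gaps, it can help to stop before `.stack()`. In this sketch on the first input, each group becomes one row and each value of `df` becomes a column, so absent combinations are filled with 0. Note the assumption this relies on: every value 0, 1, 2, 3 must occur at least once somewhere in `df`, since a value that never appears would not get a column. The group Series is renamed here only so its level name does not duplicate the `df` column:

```python
import pandas as pd

# First input from the question
df = pd.DataFrame({
    'df': [0, 1, 2, 3, 0, 2, 3, 0, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 1, 2, 3],
    'df1': 111,
})

# new group whenever "df" does not increase over the previous row
group = df['df'].le(df['df'].shift()).cumsum().rename('group')

# one row per group, one column per value of "df"; missing combinations become 0
wide = df.set_index([group, 'df']).unstack(fill_value=0)
print(wide['df1'])
```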

For the edited input, the same code works with the column names changed:

group = df['df1'].le(df['df1'].shift()).cumsum()

df_out2 = (df.set_index([group, 'df1'])
             .unstack(fill_value=0)
             .stack()
             .droplevel(0)
             .reset_index()
         )

Result:

print(df_out2)


    df1  df2
0     0  111
1     1  111
2     2  111
3     3  111
4     0  111
5     1    0
6     2    0
7     3  111
8     0    0
9     1  111
10    2  111
11    3    0
Answered By: SeaBean

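First, group the rows (a new group starts whenever df does not increase):

import pandas as pd

df11 = df1.assign(group=(df1.df.diff() < 1).cumsum())

Second, build a new frame with one complete 0, 1, 2, 3 cycle per group (groups are numbered from 0, so there are group.max() + 1 of them):

df2 = (pd.DataFrame({'df': [0, 1, 2, 3] * (int(df11.group.max()) + 1)})
         .assign(col2=lambda dd: (dd.df == 0).cumsum() - 1))

Third, merge and fill:

(df2.merge(df11, how='left', left_on=['df', 'col2'], right_on=['df', 'group'])
    .loc[:, ['df', 'df1']]
    .fillna(0)
    .astype(int))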

    df  df1
0    0  111
1    1  111
2    2  111
3    3  111
4    0  111
5    1    0
6    2  111
7    3  111
8    0  111
9    1    0
10   2  111
11   3  111
12   0  111
13   1  111
14   2  111
15   3  111
16   0  111
17   1  111
18   2  111
19   3  111
20   0    0
21   1  111
22   2  111
23   3  111
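The same three steps, collected into one self-contained sketch and run on the edited (second) input. Column names are adapted to `df1`/`df2`, and the frame names `data`, `grouped` and `full` are illustrative. Because the grid and the observed rows share the key names here, a plain `on=` merge is used instead of `left_on`/`right_on`:

```python
import pandas as pd

# Second input from the question
data = pd.DataFrame({'df1': [0, 1, 2, 3, 0, 3, 1, 2], 'df2': 111})

# 1. group: a new group starts whenever df1 does not increase
grouped = data.assign(group=(data['df1'].diff() < 1).cumsum())

# 2. one complete 0..3 cycle per group (groups are numbered from 0)
n_groups = int(grouped['group'].max()) + 1
full = pd.DataFrame({'df1': [0, 1, 2, 3] * n_groups})
full['group'] = (full['df1'] == 0).cumsum() - 1

# 3. left-merge the observed rows onto the full grid and fill the gaps with 0
out = (full.merge(grouped, how='left', on=['df1', 'group'])
           .loc[:, ['df1', 'df2']]
           .fillna(0)
           .astype(int))
print(out)
```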
Answered By: G.G