Pandas group by column and add numbered rows to each group

Question

I have the following data stored in a pandas.DataFrame object named df. The column id is a unique identifier and the remaining columns are irrelevant for the purpose of this question.

	id	x1	x2	x3
0	01001	523.41	639673	1222.13
1	01002	54.832	33746	615.443
2	01003	48.3824	45196	934.142

I want to know if there’s a way to group by id and use assign to add multiple numbered rows to each group.

In other words, I want to use range to add an arbitrary number of rows to each id. The desired result looks as follows:

	id	x1	x2	x3	new_col
0	01001	523.41	639673	1222.13	2020
0	01001	523.41	639673	1222.13	2021
1	01002	54.832	33746	615.443	2020
1	01002	54.832	33746	615.443	2021
2	01003	48.3824	45196	934.142	2020
2	01003	48.3824	45196	934.142	2021

I was hoping something along these lines would work.

df = df.groupby('id').assign(new_col=range(2020, 2022))

Asked By: Arturo Sbr

||

Source

Answer 1

You can repeat the index and use numpy.tile to tile the new data:

import numpy as np

new = ['2020', '2021']

df2 = df.loc[df.index.repeat(len(new))]
df2['new_col'] = np.tile(new, len(df))

Output:

     id        x1      x2        x3 new_col
0  1001  523.4100  639673  1222.130    2020
0  1001  523.4100  639673  1222.130    2021
1  1002   54.8320   33746   615.443    2020
1  1002   54.8320   33746   615.443    2021
2  1003   48.3824   45196   934.142    2020
2  1003   48.3824   45196   934.142    2021

Alternatively:

new = range(2020, 2022)
df2 = df.loc[df.index.repeat(len(new))]
df2['new_col'] = new.start + df2.groupby('id').cumcount()

Answered By: mozway

Answer 2

We can also create a MultiIndex.from_product and reindex relative to level=0, to scale up the DataFrame, and reset_index to convert the inner level of the MultiIndex into a column:

df = df.reindex(
    index=pd.MultiIndex.from_product(
        [df.index, range(2020, 2022)],
        names=[None, 'new_col']
    ),
    level=0
).reset_index(level='new_col')

df:

   new_col    id        x1      x2        x3
0     2020  1001  523.4100  639673  1222.130
0     2021  1001  523.4100  639673  1222.130
1     2020  1002   54.8320   33746   615.443
1     2021  1002   54.8320   33746   615.443
2     2020  1003   48.3824   45196   934.142
2     2021  1003   48.3824   45196   934.142

We can reorder new_col to the end if needed:

df = df.reindex(
    index=pd.MultiIndex.from_product(
        [df.index, range(2020, 2022)],
        names=[None, 'new_col']
    ),
    level=0
).reset_index(level='new_col')

# Re-order columns
df = df.reindex(
    columns=df.columns[df.columns != 'new_col'].union(['new_col'], sort=False)
)

df:

     id        x1      x2        x3  new_col
0  1001  523.4100  639673  1222.130     2020
0  1001  523.4100  639673  1222.130     2021
1  1002   54.8320   33746   615.443     2020
1  1002   54.8320   33746   615.443     2021
2  1003   48.3824   45196   934.142     2020
2  1003   48.3824   45196   934.142     2021

Setup:

import pandas as pd

df = pd.DataFrame({
    'id': [1001, 1002, 1003],
    'x1': [523.41, 54.832, 48.3824],
    'x2': [639673, 33746, 45196],
    'x3': [1222.13, 615.443, 934.142]
})

Answered By: Henry Ecker

Answer 3

df1.merge(pd.Series(range(2020, 2022),name='col1'),how='cross')

    id        x1      x2        x3  col1
0  1001  523.4100  639673  1222.130  2020
1  1001  523.4100  639673  1222.130  2021
2  1002   54.8320   33746   615.443  2020
3  1002   54.8320   33746   615.443  2021
4  1003   48.3824   45196   934.142  2020
5  1003   48.3824   45196   934.142  2021

Answered By: G.G

Pandas group by column and add numbered rows to each group

Question:

Answers: