Pandas: Create new column with repeating values based on non-repeating values in another column

Question:

I have a dataframe whose columns follow this format:

import pandas as pd

df = pd.DataFrame(data={
  'value': [123, 456, 789, 111, 121, 34523, 4352, 45343, 623],
  'repeatVal': ['NaN', 2, 'NaN', 'NaN', 3, 'NaN', 'NaN', 'NaN', 'NaN'],
})

I want to create a new column that takes each value from ‘value’ and repeats it downward the number of times given in ‘repeatVal’, so the output looks like ‘result’:

df = pd.DataFrame(data={
  'value': [123, 456, 789, 111, 121, 34523, 4352, 45343, 623],
  'repeatVal': ['NaN', 2, 'NaN', 'NaN', 3, 'NaN', 'NaN', 'NaN', 'NaN'],
  'result': ['NaN', 456, 456, 'NaN', 121, 121, 121, 'NaN', 'NaN']
})

To be clear, I do not want to duplicate the rows; I only want to create a new column where values are repeated n times, where n is specified in a different column. The ‘repeatVal’ column is formatted so that runs never overlap: there will always be enough NaN values between the repeat indicators.

I have read the docs on np.repeat and np.tile but those don’t appear to solve this issue.

Asked By: Whitewater


Answers:

One option using groupby.cumcount as masks:

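# the sample data stores missing values as the string 'NaN'; convert them to real NaN
df = df.replace('NaN', float('nan'))

# m1: rows that start a run; m2: rows that fall within the first repeatVal rows of their run
m1 = df['repeatVal'].notna()
m2 = df.groupby(m1.cumsum()).cumcount().lt(df['repeatVal'].ffill())
df['result'] = df['value'].where(m1).ffill().where(m2)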

Output:

   value  repeatVal  result
0    123        NaN     NaN
1    456        2.0   456.0
2    789        NaN   456.0
3    111        NaN     NaN
4    121        3.0   121.0
5  34523        NaN   121.0
6   4352        NaN   121.0
7  45343        NaN     NaN
8    623        NaN     NaN

Intermediates:

   value  repeatVal  result     m1  m1.cumsum()  cumcount  cumcount < repeatVal.ffill()  value/masked/ffill
0    123        NaN     NaN  False            0         0                         False                 NaN
1    456        2.0   456.0   True            1         0                          True               456.0
2    789        NaN   456.0  False            1         1                          True               456.0
3    111        NaN     NaN  False            1         2                         False               456.0
4    121        3.0   121.0   True            2         0                          True               121.0
5  34523        NaN   121.0  False            2         1                          True               121.0
6   4352        NaN   121.0  False            2         2                          True               121.0
7  45343        NaN     NaN  False            2         3                         False               121.0
8    623        NaN     NaN  False            2         4                         False               121.0
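
The intermediate columns above can be reproduced with something like the following. This is only a sketch: it assumes repeatVal already holds real NaN values rather than the string 'NaN', and the names group, pos, filled and steps are chosen here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'value': [123, 456, 789, 111, 121, 34523, 4352, 45343, 623],
    'repeatVal': [np.nan, 2, np.nan, np.nan, 3, np.nan, np.nan, np.nan, np.nan],
})

m1 = df['repeatVal'].notna()            # True on rows that start a run
group = m1.cumsum()                     # run id, incremented at every start row
pos = df.groupby(group).cumcount()      # 0-based position of each row inside its run
m2 = pos.lt(df['repeatVal'].ffill())    # True only for the first n rows of a run
filled = df['value'].where(m1).ffill()  # start value carried down the whole run

# Collect every step side by side, then apply the final mask.
steps = pd.DataFrame({
    'm1': m1,
    'm1.cumsum()': group,
    'cumcount': pos,
    'cumcount < repeatVal.ffill()': m2,
    'value/masked/ffill': filled,
})
steps['result'] = filled.where(m2)
print(steps)
```

Row 0 has no preceding start row, so repeatVal.ffill() is still NaN there and the comparison is False, which is why the first row stays NaN.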
Answered By: mozway

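Here is a way using index.repeat:

# assumes repeatVal holds real NaN (not the string 'NaN'); repeat counts must be integers
((v := df.loc[df.index.repeat(df['repeatVal'].fillna(0).astype(int)), 'value'])
.set_axis(v.groupby(v).cumcount() + v.index))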

Output:

1    456
2    456
4    121
5    121
6    121
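
To get from this Series to the ‘result’ column the question asks for, one option is to assign it back and let pandas align on the index. The sketch below makes two assumptions that are not in the answer: repeatVal holds real NaN values, and the position inside each run is counted per repeated index label (groupby(level=0)) instead of per value, so that the same value appearing in two separate runs does not get mixed up. The names counts and new_index are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'value': [123, 456, 789, 111, 121, 34523, 4352, 45343, 623],
    'repeatVal': [np.nan, 2, np.nan, np.nan, 3, np.nan, np.nan, np.nan, np.nan],
})

# Repeat each index label n times (0 times where repeatVal is missing).
counts = df['repeatVal'].fillna(0).astype(int)
v = df.loc[df.index.repeat(counts), 'value']   # index is now 1, 1, 4, 4, 4

# Shift every repeated label down by its position in the run: 1, 1 -> 1, 2.
new_index = v.index + v.groupby(level=0).cumcount().to_numpy()
v = v.set_axis(new_index)

# Assignment aligns on the index, so rows without a match become NaN.
df['result'] = v
print(df)
```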
Answered By: rhug123