Pandas groupby apply a random day to each group of years

Question

I am trying to generate a different random day for each row within each year group of a DataFrame, so I need to sample with replace=False; otherwise the same day can repeat within a group.

I can't just fill the whole column with a single draw of random numbers, because my list of years will have more than 365 entries, and once you pass 365 it is impossible to draw any more samples without replacement.
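
As a minimal sketch of that limit (the sample size of 400 is only an illustrative assumption), NumPy refuses to draw more unique values than the population contains:

```python
import numpy as np

# Only 366 distinct day numbers (0..365) exist, so asking for
# 400 unique ones cannot succeed and raises a ValueError.
try:
    np.random.choice(366, size=400, replace=False)
    failed = False
except ValueError as exc:
    failed = True
    print(exc)
```

This is why the sampling has to happen per year group rather than once for the whole column.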

I have explored agg, aggregate, apply and transform. The closest I have got is this:

    import numpy as np
    import pandas as pd

    years = pd.DataFrame({"year": [1,1,2,2,2,3,3,4,4,4,4]})
    years["day"] = 0
    grouped = years.groupby("year")["day"]
    grouped.transform(lambda x: np.random.choice(366, replace=False))

Which gives this:

0       8
1       8
2     319
3     319
4     319
5     149
6     149
7     130
8     130
9     130
10    130
Name: day, dtype: int64

But I want this:

0       8
1      16
2     119
3     321
4     333
5       4
6      99
7      30
8     129
9     224
10    355
Name: day, dtype: int64

Asked By: Bowser

Answer 1

With NumPy, draw unique days once for the whole frame, then shuffle them within each group:

    years["day"] = np.random.choice(366, years.shape[0], replace=False)
    years["day"] = years.groupby("year")["day"].transform(lambda x: np.random.permutation(x))

Output:

    print(years)

    year  day
0      1  233
1      1  147
2      2    1
3      2  340
4      2  267
5      3  204
6      3  256
7      4  354
8      4   94
9      4  196
10     4  164
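
Because the days here come from a single draw without replacement, they are unique across the whole frame, which means this approach only works while the frame has at most 366 rows. A minimal sketch to check the result (the seed is an assumption added only for repeatability):

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: seeded only so the run is repeatable

years = pd.DataFrame({"year": [1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4]})

# One draw of unique days for every row, then a shuffle inside each year.
years["day"] = np.random.choice(366, years.shape[0], replace=False)
years["day"] = years.groupby("year")["day"].transform(
    lambda x: np.random.permutation(x)
)

# Days are unique overall, and therefore also within each year.
unique_overall = years["day"].is_unique
unique_per_year = years.groupby("year")["day"].apply(lambda s: s.is_unique).all()
print(unique_overall, unique_per_year)
```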

Answered By: Timeless

Answer 2

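You can use your code with a minor modification: pass the number of samples (the group size) to np.random.choice. Without a size it returns a single scalar, which transform repeats across the whole group.

    random_days = lambda x: np.random.choice(range(1, 366), len(x), replace=False)
    years['day'] = years.groupby('year')['day'].transform(random_days)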

Output:

>>> years
    year  day
0      1   18
1      1  300
2      2  154
3      2  355
4      2  311
5      3   18
6      3   14
7      4  160
8      4  304
9      4   67
10     4    6
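
Since the draw happens separately for each year, this should also scale past 366 rows in total. A minimal sketch of that, assuming a hypothetical frame of 400 years with 3 rows each and using NumPy's Generator API with an arbitrary seed:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # assumption: seed chosen only for repeatability

# Hypothetical larger frame: 400 years with 3 rows each (1200 rows in total).
years = pd.DataFrame({"year": np.repeat(np.arange(1, 401), 3)})
years["day"] = 0

def random_days(group):
    # One value per row of the group, with no repeats inside the group.
    return rng.choice(np.arange(1, 366), size=len(group), replace=False)

years["day"] = years.groupby("year")["day"].transform(random_days)

# Every year holds 3 distinct days; days may still repeat across years.
unique_per_year = years.groupby("year")["day"].nunique().eq(3).all()
print(unique_per_year)
```

The only hard limit left is that a single year cannot have more rows than there are days to choose from.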

Answered By: Corralien
