Pandas groupby apply a random day to each group of years

Question:

I am trying to generate a different random day for each row within each year group of a dataframe. The days must not repeat within a year, so I need to sample without replacement (replace=False).

I can't just add a single column of random numbers, because my list of years will have more than 365 entries, and once 365 values have been drawn no further samples can be created without replacement.
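To make that limit concrete, here is a minimal sketch. It assumes 366 possible days (0 to 365), matching the np.random.choice(366) call in the attempt below; the variable names are made up for illustration.

```python
import numpy as np

# 366 possible days (0-365); an assumption taken from np.random.choice(366).
n_days = 366

# Drawing exactly as many unique values as the population holds works.
full_draw = np.random.choice(n_days, size=n_days, replace=False)

# Asking for one more unique value than the population holds raises ValueError.
try:
    np.random.choice(n_days, size=n_days + 1, replace=False)
    too_many_failed = False
except ValueError:
    too_many_failed = True

print(len(set(full_draw)), too_many_failed)
```

So any approach that draws unique days across the whole dataframe stops working once there are more rows than possible days.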

I have explored agg, aggregate, apply and transform. The closest I have got is with this:

    import numpy as np
    import pandas as pd

    years = pd.DataFrame({"year": [1,1,2,2,2,3,3,4,4,4,4]})
    years["day"] = 0
    grouped = years.groupby("year")["day"]
    grouped.transform(lambda x: np.random.choice(366, replace=False))

Which gives this:

0       8
1       8
2     319
3     319
4     319
5     149
6     149
7     130
8     130
9     130
10    130
Name: day, dtype: int64

But I want this:

0       8
1      16
2     119
3     321
4     333
5       4
6      99
7      30
8     129
9     224
10    355
Name: day, dtype: int64
Asked By: Bowser


Answers:

With broadcasting:

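# Draw one distinct day (0-365) per row; this fails once len(years) exceeds 366.
years["day"] = np.random.choice(366, size=len(years), replace=False)
# Shuffle the drawn days within each year group.
years["day"] = years.groupby("year")["day"].transform(lambda x: np.random.permutation(x))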


Output:

print(years)

    year  day
0      1  233
1      1  147
2      2    1
3      2  340
4      2  267
5      3  204
6      3  256
7      4  354
8      4   94
9      4  196
10     4  164
Answered By: Timeless
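One way to confirm that no day repeats within a year is to look for duplicated (year, day) pairs. This is only a sketch: the frame is rebuilt from the question's example, and `has_repeats` is a name made up here.

```python
import numpy as np
import pandas as pd

years = pd.DataFrame({"year": [1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4]})
# One distinct day per row, drawn across the whole frame.
years["day"] = np.random.choice(366, size=len(years), replace=False)

# A repeated (year, day) pair would mean the same day was used twice in a year.
has_repeats = bool(years.duplicated(subset=["year", "day"]).any())
print(has_repeats)
```

The same check can be run on the result of any other approach, as long as the columns are named year and day.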

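You can use your code with a minor modification: pass the group size as the number of samples. Without a size, np.random.choice returns a single value, which transform then repeats for every row of the group.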

random_days = lambda x: np.random.choice(range(1, 366), len(x), replace=False)
years['day'] = years.groupby('year').transform(random_days)

Output:

>>> years
    year  day
0      1   18
1      1  300
2      2  154
3      2  355
4      2  311
5      3   18
6      3   14
7      4  160
8      4  304
9      4   67
10     4    6
Answered By: Corralien
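If reproducible results matter, the same per-group sampling could also be written with NumPy's Generator API. This is a sketch under assumptions rather than a definitive version: the seed value is arbitrary, days are taken as 1 to 365, and `random_days` is a helper name chosen for illustration.

```python
import numpy as np
import pandas as pd

# Seeded generator so repeated runs give the same days; the seed is arbitrary.
rng = np.random.default_rng(0)

years = pd.DataFrame({"year": [1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4]})


def random_days(group):
    # One distinct day in 1..365 for each row of this year group.
    return rng.choice(np.arange(1, 366), size=len(group), replace=False)


# transform calls random_days once per year and aligns the result to the rows.
years["day"] = years.groupby("year")["year"].transform(random_days)
print(years)
```

Because the sampling happens per group, the total number of rows can exceed 365; only a single year with more than 365 rows would raise a ValueError.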