Randomly select 30% of the sum value in Python Pandas

Question:

I’m using Python pandas and have a data frame that is pulled from my CSV file:

ID          Value
123         10
432         14
213         12

...

214         2
999         43

I want to randomly select some rows such that the sum of the selected values equals 30% of the total value.

Please advise how I should write this condition.

Asked By: Mary
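To see what the target actually is, here is a minimal sketch that loads only the rows visible in the question (the elided rows are not reproduced) from an in-memory, whitespace-separated CSV and computes 30% of the total. The `csv_text` string and the whitespace separator are assumptions standing in for the real file. For these rows the target is about 24.3, which no sum of integer values can equal exactly, so in practice the condition has to be "at most 30%" or a rounded target.

```python
import io

import pandas as pd

# Assumption: only the rows visible in the question, whitespace-separated.
csv_text = """ID Value
123 10
432 14
213 12
214 2
999 43
"""
df = pd.read_csv(io.StringIO(csv_text), sep=r"\s+")

total = df["Value"].sum()  # 81 for these rows
target = total * 0.3       # about 24.3, not a whole number
print(total, target)
```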


Answers:

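You can first shuffle the rows with sample(frac=1), then use loc to keep only the rows whose running total (cumsum) is less than or equal to (le) 30% of the overall sum. This guarantees that the selected values add up to at most 30% of the total; an exact match is often impossible, for example when 30% of the total is not a whole number: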

# shuffle all rows, then keep the leading rows whose running total stays within 30% of the total
out = df.sample(frac=1).loc[lambda d: d['Value'].cumsum().le(d['Value'].sum()*0.3)]

Example output:

   ID  Value
0  123     10
3  214      2
2  213     12
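A self-contained version of the one-liner, as a sketch: the DataFrame is rebuilt from the rows shown in the question, and `random_state=42` is an arbitrary seed added only to make the shuffle reproducible (drop it to get a different selection on each run). The last line checks the property the filter guarantees, namely that the selected values never sum to more than 30% of the total.

```python
import pandas as pd

# Assumption: the five rows visible in the question.
df = pd.DataFrame({"ID": [123, 432, 213, 214, 999], "Value": [10, 14, 12, 2, 43]})

limit = df["Value"].sum() * 0.3  # 30% of the total (shuffling does not change the sum)

# Shuffle reproducibly, then keep rows while the running total stays within the limit.
out = df.sample(frac=1, random_state=42).loc[lambda d: d["Value"].cumsum().le(limit)]

print(out)
print(out["Value"].sum(), "<=", limit)
```

Note that if the first shuffled row already exceeds the limit (here, the row with value 43), the result is empty, so you may want to reshuffle when `out` has no rows.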

Intermediates:

    ID  Value  cumsum   ≤30%
0  123     10      10   True
3  214      2      12   True
2  213     12      24   True
1  432     14      38  False
4  999     43      81  False
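The intermediate table can be reproduced deterministically. This sketch replaces the random `sample` call with the exact shuffled order shown above (index 0, 3, 2, 1, 4) and adds the running total and the comparison as extra columns; the column names simply mirror the table.

```python
import pandas as pd

# Assumption: the five rows visible in the question, default index 0..4.
df = pd.DataFrame({"ID": [123, 432, 213, 214, 999], "Value": [10, 14, 12, 2, 43]})

# Use the shuffled order from the example instead of calling sample().
shuffled = df.loc[[0, 3, 2, 1, 4]]

limit = shuffled["Value"].sum() * 0.3  # 30% of 81, about 24.3

steps = shuffled.assign(cumsum=shuffled["Value"].cumsum())
steps["≤30%"] = steps["cumsum"].le(limit)  # True while the running total is within 30%
print(steps)
```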
Answered By: mozway
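If the selection must hit the target exactly rather than stay below it, one option (not part of the answer above) is to reshuffle repeatedly and accept a shuffle only when some prefix sums exactly to the target. This is a sketch under two assumptions: the 30% target is rounded to a whole number (24 for the rows shown), and `sample_exact` and `max_tries` are hypothetical names chosen for illustration. It gives up and returns `None` if no attempt succeeds, which will always happen when the rounded target is not reachable by any subset.

```python
import pandas as pd


def sample_exact(df, col="Value", frac=0.3, max_tries=1000):
    """Hypothetical helper: reshuffle until a prefix of the shuffled rows sums
    exactly to round(frac * total); return None if every attempt fails."""
    # Assumption: round the target to an integer so integer values can reach it.
    target = int(round(float(df[col].sum()) * frac))
    for seed in range(max_tries):
        shuffled = df.sample(frac=1, random_state=seed)
        hits = shuffled[col].cumsum().eq(target)
        if hits.any():
            # Keep rows up to and including the first one where the running total equals the target.
            stop = int(hits.to_numpy().argmax())
            return shuffled.iloc[: stop + 1]
    return None


df = pd.DataFrame({"ID": [123, 432, 213, 214, 999], "Value": [10, 14, 12, 2, 43]})
out = sample_exact(df)
print(out)
```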