How to randomly sample a value within a given segment?

Question

I want to create a new column "sample_group_B" which randomly samples a purchase price value from group B within the same segment of group A. How do I do this in pandas?

segment | purchase price | group
High    | 100            | A
High    | 105            | A
High    | 103            | B
High    | 104            | B
Low     | 10             | A
Low     | 9              | B
Low     | 50             | B
Low     | 55             | B

I want to create a new column that randomly samples the purchase price of group B within the respective segment such as:

segment | purchase price | group | sample_group_B
High    | 100            | A     | sample a value from (103 or 104)
High    | 105            | A     | sample a value from (103 or 104)
Low     | 10             | A     | sample a value from (9 or 50 or 55)

I tried np.random(), but it returned a bunch of NaNs.

Asked By: titutubs

Answer 1

Annotated code

from random import choice

# Filter the A and B groups; copy A so that adding a column later
# does not trigger a SettingWithCopyWarning
A = df.query("group == 'A'").copy()
B = df.query("group == 'B'")

# Build a Series that maps each segment to the list of
# all group B purchase prices in that segment
d = B.groupby('segment')['purchase price'].agg(list)

# Map each segment in A to a random choice from its list of B prices
A['sample_B'] = A['segment'].map(lambda s: choice(d[s]))

Result

  segment  purchase price group  sample_B
0    High             100     A       103
1    High             105     A       104
4     Low              10     A         9
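
If the sampled values need to be reproducible, a variation on the code above is to draw from a seeded NumPy generator instead of random.choice. This is only a minimal sketch built on the question's sample data; the seed value 0 and the names rng and pools are illustrative assumptions, and the NaN fallback for segments with no group B rows is an added guard that the original code does not have.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "segment": ["High", "High", "High", "High", "Low", "Low", "Low", "Low"],
    "purchase price": [100, 105, 103, 104, 10, 9, 50, 55],
    "group": ["A", "A", "B", "B", "A", "B", "B", "B"],
})

# Seeded generator so repeated runs give the same draws (seed is arbitrary)
rng = np.random.default_rng(0)

A = df[df["group"] == "A"].copy()
B = df[df["group"] == "B"]

# Series: segment -> list of group B purchase prices
pools = B.groupby("segment")["purchase price"].agg(list)

# Draw one B price per A row; fall back to NaN if the segment has no B rows
A["sample_group_B"] = A["segment"].map(
    lambda s: rng.choice(pools[s]) if s in pools.index else np.nan
)

print(A)
```

Each row of A gets its own independent draw, so two A rows in the same segment may or may not receive the same B price.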

Answered By: Shubham Sharma

Answer 2

Steps:

1. Split the DataFrame into two: group A and group B.
2. Join A to B on segment.
3. Sample one row from each (segment, price_a) group.

code:

import pandas as pd

# Prepare the sample data
d = [["High", 100, "A"],
     ["High", 105, "A"],
     ["High", 103, "B"],
     ["High", 104, "B"],
     ["Low", 10, "A"],
     ["Low", 9, "B"],
     ["Low", 50, "B"],
     ["Low", 55, "B"]]
df = pd.DataFrame(d, columns=['segment', 'price', 'group'])

# Split into the two groups
a = df.query("group == 'A'")
b = df.query("group == 'B'")

# Join each A row to every B row in the same segment
ab = a.join(b.set_index('segment'), on='segment', lsuffix='_a', rsuffix='_b')

# Sample one joined row from each (segment, price_a) group
ab.groupby(['segment', 'price_a']).sample(n=1)

result:

  segment  price_a group_a  price_b group_b
0    High      100       A      104       B
1    High      105       A      103       B
4     Low       10       A        9       B
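
One caveat with grouping on segment and price_a: two group A rows that share both values land in the same group, so only one of them survives the sampling. A possible way around this, sketched below under the assumption that the joined frame keeps the original index of a (which DataFrame.join with on= does), is to group on that index instead. The random_state value and the final column names are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Same sample data as in the answer above
df = pd.DataFrame(
    [["High", 100, "A"], ["High", 105, "A"],
     ["High", 103, "B"], ["High", 104, "B"],
     ["Low", 10, "A"], ["Low", 9, "B"],
     ["Low", 50, "B"], ["Low", 55, "B"]],
    columns=["segment", "price", "group"],
)

a = df.query("group == 'A'")
b = df.query("group == 'B'")

# Each A row is repeated once per B row in its segment and keeps its index
ab = a.join(b.set_index("segment"), on="segment", lsuffix="_a", rsuffix="_b")

# Group on the original A row index so every A row gets exactly one sample;
# random_state (arbitrary value) makes the draw repeatable
sampled = ab.groupby(level=0).sample(n=1, random_state=0)

# Keep only the columns the question asked for
result = sampled.rename(
    columns={"price_a": "purchase price",
             "group_a": "group",
             "price_b": "sample_group_B"}
)[["segment", "purchase price", "group", "sample_group_B"]]

print(result)
```

The result has one row per original group A row, with the sampled group B price in sample_group_B.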

Answered By: tianzhipeng