Initial value of multiple variables dataframe for time dilation

Question:

Dataframe:

product1  product2  product3  product4  product5
straws    orange    melon     chair     bread
melon     milk      book      coffee    cake
bread     melon     coffe     chair     book

CountProduct1  CountProduct2  CountProduct3  Countproduct4  Countproduct5
1              1              1              1              1
2              1              1              1              1
2              3              2              2              2

RatioProduct1  RatioProduct2  RatioProduct3  Ratioproduct4  Ratioproduct5
0.28           0.54           0.33           0.35           0.11
0.67           0.25           0.13           0.11           0.59
2.5            1.69           1.9            2.5            1.52

I want to create five other columns that keep the initial ratio of each item throughout the dataframe, i.e. the ratio the item had the first time it appeared.

Output:

InitialRatio1  InitialRatio2  InitialRatio3  InitialRatio4  InitialRatio5
0.28           0.54           0.33           0.35           0.11
0.33           0.25           0.13           0.31           0.59
0.11           0.33           0.31           0.35           0.13

Asked By: Bry Sab


Answers:

If you’re after code to create the init_rateX columns, the following will work:

import numpy as np
import pandas as pd

# Divide each ratio column by its matching count column, element-wise.
pd.DataFrame(
    np.divide(
        df[["ratio1", "ratio2", "ratio3", "ratio4", "ratio5"]].to_numpy(),
        df[["Count1", "Count2", "Count3", "Count4", "Count5"]].to_numpy(),
    ),
    columns=["init_rate1", "init_rate2", "init_rate3", "init_rate4", "init_rate5"],
)

which gives

   init_rate1  init_rate2  init_rate3  init_rate4  init_rate5
0        0.28        0.25        0.33        0.57       0.835
1        0.33        0.13        0.97        0.65       0.760
2        0.54        0.11        0.45        0.95       1.160
3        0.35        0.59        0.34        1.25       1.650

However, it does not agree with your calculations for init_rate4 or init_rate5, so some clarification might be needed.
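To make the element-wise division concrete, here is a minimal sketch that applies the same np.divide call to the three-row sample from the question. The short column names (Count1..Count5, ratio1..ratio5) follow the names used in the code above, not the headers in the question, and the four-row output shown above evidently came from different data, so the numbers will not match it.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (three rows, five ratio/count pairs).
# Column names follow the answer's code, not the question's headers.
df = pd.DataFrame({
    "Count1": [1, 2, 2], "Count2": [1, 1, 3], "Count3": [1, 1, 2],
    "Count4": [1, 1, 2], "Count5": [1, 1, 2],
    "ratio1": [0.28, 0.67, 2.5], "ratio2": [0.54, 0.25, 1.69],
    "ratio3": [0.33, 0.13, 1.9], "ratio4": [0.35, 0.11, 2.5],
    "ratio5": [0.11, 0.59, 1.52],
})

ratio_cols = [f"ratio{i}" for i in range(1, 6)]
count_cols = [f"Count{i}" for i in range(1, 6)]

# Cell (r, c) of the result is ratio[r, c] / count[r, c].
init_rate = pd.DataFrame(
    np.divide(df[ratio_cols].to_numpy(), df[count_cols].to_numpy()),
    columns=[f"init_rate{i}" for i in range(1, 6)],
    index=df.index,
)
print(init_rate)
```

Rows where every count is 1 come back unchanged, which is why the first row of the result equals the first row of ratios.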

Answered By: Riley

Check your data again. Is there a typo in product3 ('coffe') versus product4 ('coffee')? I changed 'coffe' to 'coffee' below. As a result, the value 0.31 in your expected output should not appear.

import pandas as pd
pd.set_option('display.max_rows', None)  # show all rows when printing
pd.set_option('display.max_columns', None)  # show all columns when printing

df = pd.DataFrame(
{
    'product1':['straws', 'melon', 'bread'],
    'product2':['orange', 'milk', 'melon'],
    'product3':['melon', 'book', 'coffee'],
    'product4':['chair', 'coffee', 'chair'],
    'product5':['bread', 'cake', 'book'],
    'time':[1,2,3],
    'Count1':[1,2,2],
    'Count2':[1,1,3],
    'Count3':[1,1,2],
    'Count4':[1,1,2],
    'Count5':[1,1,2],
    'ratio1':[0.28, 0.67, 2.5],
    'ratio2':[0.54, 0.25, 1.69],
    'ratio3':[0.33, 0.13, 1.9],
    'ratio4':[0.35, 0.11, 2.5],
    'ratio5':[0.11, 0.59, 1.52],

})

print(df)

product = df[['product1', 'product2', 'product3', 'product4', 'product5']].stack().reset_index()
count = df[['Count1',  'Count2',  'Count3', 'Count4',  'Count5']].stack().reset_index()
ratio = df[['ratio1',  'ratio2',  'ratio3',  'ratio4',  'ratio5']].stack().reset_index()
print(ratio)


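# Unique product names, in order of first appearance.
arr = pd.unique(product[0])
# Positions in arr of the products that occur more than once.
aaa = [i for i in range(len(arr)) if product[product[0] == arr[i]].count()[0] > 1]
for i in aaa:
    # Row labels (in the stacked frame) of every occurrence of this product.
    prod_ind = product[product[0] == arr[i]].index
    # Ratio at the first occurrence, copied to all occurrences.
    val_ratio = ratio.loc[prod_ind[0], 0]
    ratio.loc[prod_ind, 0] = val_ratio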

print(ratio.pivot_table(index='level_0', columns='level_1', values=[0]))

Output:

level_1 ratio1 ratio2 ratio3 ratio4 ratio5
level_0                                   
0         0.28   0.54   0.33   0.35   0.11
1         0.33   0.25   0.13   0.11   0.59
2         0.11   0.33   0.11   0.35   0.13

To work with the data, each block of columns has to be turned into a single column using stack().reset_index(). arr is the list of unique products. The list aaa then holds the positions in arr of the products that occur more than once.
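As a small illustration of what stack().reset_index() produces, here is a sketch on a two-row, two-column frame (the name wide is hypothetical). With an unnamed index and unnamed columns, the result has the columns level_0 (original row), level_1 (original column name) and 0 (the value), which is why the code in this answer refers to product[0] and ratio.loc[..., 0].

```python
import pandas as pd

# Hypothetical miniature of the product block from the question.
wide = pd.DataFrame({
    "product1": ["straws", "melon"],
    "product2": ["orange", "milk"],
})

# stack() moves the column labels into an inner index level, giving one
# value per (row, column) pair; reset_index() turns both index levels
# into ordinary columns.
long = wide.stack().reset_index()
print(long)
```

The values come out row by row, so the product, count and ratio frames built this way line up by position.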

prod_ind = product[product[0] == arr[i]].index

In the loop, I get the row indexes of every occurrence of a product that appears more than once.

val_ratio = ratio.loc[prod_ind[0], 0]

Get the ratio at the first occurrence of the product.

ratio.loc[prod_ind, 0] = val_ratio

Set this value for all occurrences of the product.
The values are accessed with explicit loc indexing: the row labels come first inside the square brackets and the column names second. See the pandas documentation on loc for details.

pivot_table then rebuilds the wide table.
To insert the processed data into the original dataframe, use the following:

table = ratio.pivot_table(index='level_0', columns='level_1', values=[0])
df[['ratio1',  'ratio2',  'ratio3',  'ratio4',  'ratio5']] = table
print(df)
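The same "first occurrence" idea can also be sketched without the explicit loop, using groupby(...).transform("first") on the flattened values. This is an alternative under stated assumptions, not the method used above: the column names follow this answer's sample frame, 'coffe' is already corrected to 'coffee', and the InitialRatio1..InitialRatio5 names are taken from the question's expected output. Because the values are reshaped rather than pivoted, the original column order is kept (pivot_table sorts column labels and averages duplicates by default).

```python
import pandas as pd

# Sample data from this answer ('coffe' already corrected to 'coffee').
df = pd.DataFrame({
    "product1": ["straws", "melon", "bread"],
    "product2": ["orange", "milk", "melon"],
    "product3": ["melon", "book", "coffee"],
    "product4": ["chair", "coffee", "chair"],
    "product5": ["bread", "cake", "book"],
    "ratio1": [0.28, 0.67, 2.5],
    "ratio2": [0.54, 0.25, 1.69],
    "ratio3": [0.33, 0.13, 1.9],
    "ratio4": [0.35, 0.11, 2.5],
    "ratio5": [0.11, 0.59, 1.52],
})

product_cols = [f"product{i}" for i in range(1, 6)]
ratio_cols = [f"ratio{i}" for i in range(1, 6)]

# Flatten both blocks row by row, so position k in one matches position k
# in the other.
products = df[product_cols].to_numpy().ravel()
ratios = pd.Series(df[ratio_cols].to_numpy().ravel())

# For every cell, take the ratio found at the first occurrence of the
# same product (reading row by row, left to right).
initial = ratios.groupby(products).transform("first")

# Reshape back to the wide layout and name the new columns.
result = pd.DataFrame(
    initial.to_numpy().reshape(len(df), len(ratio_cols)),
    columns=[f"InitialRatio{i}" for i in range(1, 6)],
    index=df.index,
)
print(result)
```

On this sample the result equals the table printed under "Output:" above. The new columns could then be attached with pd.concat([df, result], axis=1).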
Answered By: inquirer

index  0                    1                     2                    3                     4
0      0.0625               0.034482758620689655  0.03125              0.027777777777777776  0.024390243902439025
1      0.2857142857142857   0.15384615384615385   0.05128205128205128  0.0425531914893617    0.04
2      0.21428571428571427  0.16666666666666666   0.15789473684210525  0.0967741935483871    0.08108108108108109

This is a sample of the ratio columns from my original df, taken where the count columns == 1, which means these are the initial ratios.

And this is what happened when I used your code:

variable k1 k2 k3 k4 k5
1 0.062500 8.827586 8.093750 0.166667 7.439024
2 0.285714 16.461538 8.615385 1.829787 0.040000
3 0.214286 16.888889 12.631579 16.129032 3.567568

It completely changes the values in every column except the first one.
Well, thanks for your enormous help, and thanks to @Riley as well. I’ll try to find another way; maybe pandas is just not good enough for such tasks.
Thanks a lot for your help and the time you put into this.

Answered By: Bry Sab