Initial value of multiple variables dataframe for time dilation
Question:
Dataframe:
product1 | product2 | product3 | product4 | product5 |
---|---|---|---|---|
straws | orange | melon | chair | bread |
melon | milk | book | coffee | cake |
bread | melon | coffe | chair | book |
CountProduct1 | CountProduct2 | CountProduct3 | Countproduct4 | Countproduct5 |
---|---|---|---|---|
1 | 1 | 1 | 1 | 1 |
2 | 1 | 1 | 1 | 1 |
2 | 3 | 2 | 2 | 2 |
RatioProduct1 | RatioProduct2 | RatioProduct3 | Ratioproduct4 | Ratioproduct5 |
---|---|---|---|---|
0.28 | 0.54 | 0.33 | 0.35 | 0.11 |
0.67 | 0.25 | 0.13 | 0.11 | 0.59 |
2.5 | 1.69 | 1.9 | 2.5 | 1.52 |
I want to create five other columns that carry each item's initial ratio (the ratio from the row where the item first appears) through the rest of the dataframe.
Output:
InitialRatio1 | InitialRatio2 | InitialRatio3 | InitialRatio4 | InitialRatio5 |
---|---|---|---|---|
0.28 | 0.54 | 0.33 | 0.35 | 0.11 |
0.33 | 0.25 | 0.13 | 0.31 | 0.59 |
0.11 | 0.33 | 0.31 | 0.35 | 0.13 |
Answers:
If you’re after code to create the init_rateX columns, then the following will work (assuming df already holds the ratioN and CountN columns):

    import numpy as np
    import pandas as pd

    # Divide each ratio column element-wise by the matching count column.
    pd.DataFrame(
        np.divide(
            df[["ratio1", "ratio2", "ratio3", "ratio4", "ratio5"]].to_numpy(),
            df[["Count1", "Count2", "Count3", "Count4", "Count5"]].to_numpy(),
        ),
        columns=["init_rate1", "init_rate2", "init_rate3", "init_rate4", "init_rate5"],
    )

which gives

       init_rate1  init_rate2  init_rate3  init_rate4  init_rate5
    0        0.28        0.25        0.33        0.57       0.835
    1        0.33        0.13        0.97        0.65       0.760
    2        0.54        0.11        0.45        0.95       1.160
    3        0.35        0.59        0.34        1.25       1.650

However, it does not agree with your calculations for init_rate4 or init_rate5, so some clarification might be needed.
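To make the behaviour of that division concrete, here is a minimal sketch that applies it to the three rows from the question. It assumes the short column names Count1..Count5 and ratio1..ratio5 used in the snippet above, and the helper names count_cols, ratio_cols and init_rate are only illustrative. Rows where every count is 1 come back unchanged; the other rows are divided by their counts, which is only one possible reading of "initial ratio".

```python
import numpy as np
import pandas as pd

# Count and ratio values copied from the question's three rows.
df = pd.DataFrame({
    "Count1": [1, 2, 2], "Count2": [1, 1, 3], "Count3": [1, 1, 2],
    "Count4": [1, 1, 2], "Count5": [1, 1, 2],
    "ratio1": [0.28, 0.67, 2.5], "ratio2": [0.54, 0.25, 1.69],
    "ratio3": [0.33, 0.13, 1.9], "ratio4": [0.35, 0.11, 2.5],
    "ratio5": [0.11, 0.59, 1.52],
})

# Illustrative helper lists (not part of the original answer).
count_cols = [f"Count{i}" for i in range(1, 6)]
ratio_cols = [f"ratio{i}" for i in range(1, 6)]

# Element-wise ratio / count, as in the snippet above.
init_rate = pd.DataFrame(
    np.divide(df[ratio_cols].to_numpy(), df[count_cols].to_numpy()),
    columns=[f"init_rate{i}" for i in range(1, 6)],
)
print(init_rate)
```

The first row matches the expected output in the question, but the later rows do not, which is consistent with the remark that the intended calculation needs clarifying.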
Check your data again. Is there a typo in product3 = coffe versus product4 = coffee? I changed coffe to coffee. As a result, the 0.31 in your expected output should not be there: coffee first appears with a ratio of 0.11.

    import pandas as pd

    pd.set_option('display.max_rows', None)     # show all rows
    pd.set_option('display.max_columns', None)  # show all columns

    df = pd.DataFrame(
        {
            'product1': ['straws', 'melon', 'bread'],
            'product2': ['orange', 'milk', 'melon'],
            'product3': ['melon', 'book', 'coffee'],
            'product4': ['chair', 'coffee', 'chair'],
            'product5': ['bread', 'cake', 'book'],
            'time': [1, 2, 3],
            'Count1': [1, 2, 2],
            'Count2': [1, 1, 3],
            'Count3': [1, 1, 2],
            'Count4': [1, 1, 2],
            'Count5': [1, 1, 2],
            'ratio1': [0.28, 0.67, 2.5],
            'ratio2': [0.54, 0.25, 1.69],
            'ratio3': [0.33, 0.13, 1.9],
            'ratio4': [0.35, 0.11, 2.5],
            'ratio5': [0.11, 0.59, 1.52],
        })
    print(df)

    # Flatten each group of columns into one long column (row by row).
    product = df[['product1', 'product2', 'product3', 'product4', 'product5']].stack().reset_index()
    count = df[['Count1', 'Count2', 'Count3', 'Count4', 'Count5']].stack().reset_index()
    ratio = df[['ratio1', 'ratio2', 'ratio3', 'ratio4', 'ratio5']].stack().reset_index()
    print(ratio)

    arr = pd.unique(product[0])  # unique product names
    # Positions in arr of the products that occur more than once.
    aaa = [i for i in range(len(arr)) if product[product[0] == arr[i]].count()[0] > 1]
    for i in aaa:
        prod_ind = product[product[0] == arr[i]].index  # rows holding this product
        val_ratio = ratio.loc[prod_ind[0], 0]           # ratio at its first occurrence
        ratio.loc[prod_ind, 0] = val_ratio              # copy it to every occurrence

    print(ratio.pivot_table(index='level_0', columns='level_1', values=[0]))

Output:

    level_1  ratio1  ratio2  ratio3  ratio4  ratio5
    level_0
    0          0.28    0.54    0.33    0.35    0.11
    1          0.33    0.25    0.13    0.11    0.59
    2          0.11    0.33    0.11    0.35    0.13

To work with the data, each group of columns is first flattened into a single column using stack().reset_index(). arr is the list of unique products. The list aaa then holds the positions in arr of the products that occur more than once.

    prod_ind = product[product[0] == arr[i]].index

In the loop, I get the row indexes of each product that occurs more than once.

    val_ratio = ratio.loc[prod_ind[0], 0]

Get the ratio at the first occurrence of the product.

    ratio.loc[prod_ind, 0] = val_ratio

Set this value for every occurrence of the product.

To access the values, explicit loc indexing is used: inside the square brackets, the row indices come first (on the left) and the column names second (on the right). Read more here.

Finally, pivot_table turns the long column back into the original table shape.
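The round trip described above can be seen on a tiny frame. This is only an illustrative sketch: the two-column frame wide and the replacement value 9.99 are made up for the demonstration, and values=0 (a scalar rather than the list [0] used in the answer) is chosen here so that the resulting column labels stay flat.

```python
import pandas as pd

# A made-up two-row, two-column frame, just to show the reshaping.
wide = pd.DataFrame({"ratio1": [0.28, 0.67], "ratio2": [0.54, 0.25]})

# stack() walks the frame row by row; reset_index() exposes the old row
# label as 'level_0', the old column name as 'level_1', and the value as 0.
long = wide.stack().reset_index()
print(long)

# loc takes row labels first and the column label second.
long.loc[[0, 3], 0] = 9.99

# pivot_table rebuilds the wide layout from the long one.
back = long.pivot_table(index="level_0", columns="level_1", values=0)
print(back)
```

Each (level_0, level_1) pair occurs exactly once, so the default averaging done by pivot_table leaves the values as they are.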
To insert the processed data into the original dataframe, simply use the following:

    table = ratio.pivot_table(index='level_0', columns='level_1', values=[0])
    df[['ratio1', 'ratio2', 'ratio3', 'ratio4', 'ratio5']] = table
    print(df)

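As a possible alternative to the explicit loop, the same "first ratio per product" rule can be sketched with groupby(...).transform("first"). This is a different technique from the loop in the answer, not a restatement of it. The InitialRatioN column names follow the question, while the helper names prod_cols, ratio_cols, long and initial are assumptions for illustration. The sample data is used with coffe corrected to coffee.

```python
import pandas as pd

# Sample data from the question (with 'coffe' corrected to 'coffee').
df = pd.DataFrame({
    "product1": ["straws", "melon", "bread"],
    "product2": ["orange", "milk", "melon"],
    "product3": ["melon", "book", "coffee"],
    "product4": ["chair", "coffee", "chair"],
    "product5": ["bread", "cake", "book"],
    "ratio1": [0.28, 0.67, 2.5],
    "ratio2": [0.54, 0.25, 1.69],
    "ratio3": [0.33, 0.13, 1.9],
    "ratio4": [0.35, 0.11, 2.5],
    "ratio5": [0.11, 0.59, 1.52],
})

# Illustrative helper lists (hypothetical names).
prod_cols = [f"product{i}" for i in range(1, 6)]
ratio_cols = [f"ratio{i}" for i in range(1, 6)]

# Long format: one (product, ratio) pair per line, read row by row.
long = pd.DataFrame({
    "product": df[prod_cols].to_numpy().ravel(),
    "ratio": df[ratio_cols].to_numpy().ravel(),
})

# For every occurrence of a product, take the ratio of its first occurrence.
long["initial"] = long.groupby("product")["ratio"].transform("first")

# Reshape back to one row per original row and attach as new columns.
initial = pd.DataFrame(
    long["initial"].to_numpy().reshape(len(df), len(ratio_cols)),
    index=df.index,
    columns=[f"InitialRatio{i}" for i in range(1, 6)],
)
df = df.join(initial)
print(df[initial.columns])
```

Because the result is written to new InitialRatioN columns, the original ratioN columns are left untouched. The sketch treats "first" as first in row-by-row reading order, which matches the loop above only as long as the rows are already sorted by time.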
index | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
0 | 0.0625 | 0.034482758620689655 | 0.03125 | 0.027777777777777776 | 0.024390243902439025 |
1 | 0.2857142857142857 | 0.15384615384615385 | 0.05128205128205128 | 0.0425531914893617 | 0.04 |
2 | 0.21428571428571427 | 0.16666666666666666 | 0.15789473684210525 | 0.0967741935483871 | 0.08108108108108109 |
This is a sample of the ratio columns from my original df, taken from rows where the count columns equal 1, which means these values are the initial ratios.
And this is what I got when I used your code:
variable | k1 | k2 | k3 | k4 | k5 |
---|---|---|---|---|---|
1 | 0.062500 | 8.827586 | 8.093750 | 0.166667 | 7.439024 |
2 | 0.285714 | 16.461538 | 8.615385 | 1.829787 | 0.040000 |
3 | 0.214286 | 16.888889 | 12.631579 | 16.129032 | 3.567568 |
It completely changes the values in every column except the first one.
Well, thank you for your enormous help, and thanks to @Riley as well. I’ll try to find another way; maybe pandas is just not well suited to such tasks.
Thanks a lot for your help and the time you put into this.