How to find a row content from a Pandas df into another df and then add the count of the findings into a new column of the first df?
Question:
I just started using Python and I am trying to improve myself by engaging in different projects.
At the moment I am trying to create a work allocation tool. I loaded 2 xlsx files as df.
In df "dfau" there is a list of the employees with their capacity left and the languages that they speak.
Employees
Capacity Left
Languages
E1
0
EN, GER
E2
-11
EN, IT, ES
E3
6
EN, CH
E4
2
EN, ES, FR
E5
-1
EN
E6
1
EN
E7
-9
EN, GER
E8
8
EN, GER
In df "dfln" there is a list of the languages.
Languages
EN
IT
ES
GER
CH
FR
NL
PT
I would like to look for the languages listed in "dfln" into "dfau" and add the count of these languages as a second column in "dfln" (and this is where I am stuck).
Afterwards, in "dfln", I would also like to add a column which would show the capacity per each language by summing the capacities left of each employee which can speak the language.
The employees which have a "Capacity Left" below 1 should not be counted nor summed.
This is how it should appear:
Languages
Count
Capacity
EN
6
85
IT
0
0
ES
3
15
GER
1
8
CH
1
6
FR
1
2
NL
1
45
PT
2
13
I tried to re-write the code in different ways but still this is where I got stuck (the result remains the basic dfln.
import pandas as pd
dfau = pd.read_excel (r'C:UsersProjectspywpSampleAu.xlsx', sheet_name='Employees')
dfln = pd.read_excel (r'C:UsersProjectspywpSampleAu.xlsx', sheet_name='Languages')
LCount=0
dfln.insert(1,"Count",LCount)
for language in dfln["Languages"]:
if dfau.loc[dfau["Languages"].str.contains(language, case=False)] is True:
LCount+=1
print(dfln)
What can I try next?
Answers:
I cannot make sense of the final output. However, you can achieve what you are asking by running the following code:
dfIn['Count'] = 0
dfIn['Capacity'] = 0
for idx,row in dfIn.iterrows():
dfIn.iloc[idx,1] = dfau['Languages'][dfau['Capacity Left'] >= 1].str.contains(row.Languages).sum()
dfIn.iloc[idx, 2] = dfau['Capacity Left'][dfau['Capacity Left'] >= 1][dfau['Languages'].str.contains(row.Languages)].sum()
Notice that the final output looks like this, and not what you posted above:
I just started using Python and I am trying to improve myself by engaging in different projects.
At the moment I am trying to create a work allocation tool. I loaded 2 xlsx files as df.
In df "dfau" there is a list of the employees with their capacity left and the languages that they speak.
Employees | Capacity Left | Languages |
---|---|---|
E1 | 0 | EN, GER |
E2 | -11 | EN, IT, ES |
E3 | 6 | EN, CH |
E4 | 2 | EN, ES, FR |
E5 | -1 | EN |
E6 | 1 | EN |
E7 | -9 | EN, GER |
E8 | 8 | EN, GER |
In df "dfln" there is a list of the languages.
Languages |
---|
EN |
IT |
ES |
GER |
CH |
FR |
NL |
PT |
I would like to look for the languages listed in "dfln" into "dfau" and add the count of these languages as a second column in "dfln" (and this is where I am stuck).
Afterwards, in "dfln", I would also like to add a column which would show the capacity per each language by summing the capacities left of each employee which can speak the language.
The employees which have a "Capacity Left" below 1 should not be counted nor summed.
This is how it should appear:
Languages | Count | Capacity |
---|---|---|
EN | 6 | 85 |
IT | 0 | 0 |
ES | 3 | 15 |
GER | 1 | 8 |
CH | 1 | 6 |
FR | 1 | 2 |
NL | 1 | 45 |
PT | 2 | 13 |
I tried to re-write the code in different ways but still this is where I got stuck (the result remains the basic dfln.
import pandas as pd
dfau = pd.read_excel (r'C:UsersProjectspywpSampleAu.xlsx', sheet_name='Employees')
dfln = pd.read_excel (r'C:UsersProjectspywpSampleAu.xlsx', sheet_name='Languages')
LCount=0
dfln.insert(1,"Count",LCount)
for language in dfln["Languages"]:
if dfau.loc[dfau["Languages"].str.contains(language, case=False)] is True:
LCount+=1
print(dfln)
What can I try next?
I cannot make sense of the final output. However, you can achieve what you are asking by running the following code:
dfIn['Count'] = 0
dfIn['Capacity'] = 0
for idx,row in dfIn.iterrows():
dfIn.iloc[idx,1] = dfau['Languages'][dfau['Capacity Left'] >= 1].str.contains(row.Languages).sum()
dfIn.iloc[idx, 2] = dfau['Capacity Left'][dfau['Capacity Left'] >= 1][dfau['Languages'].str.contains(row.Languages)].sum()
Notice that the final output looks like this, and not what you posted above: