How do I convert a Pandas string column into specific numbers without using a loop?
Question:
I have a column of strings that I’d like to convert into specific numbers. My current approach involves using a for loop, but I feel that’s not how Pandas was designed to be used. Could someone suggest a more elegant solution that is applicable to more than one column?
Here is my code –
import pandas as pd

# Create the pandas DataFrame
data = [['mechanical@engineer', 'field engineer'], ['field engineer', 'lab_scientist'],
        ['lab_scientist', 'mechanical@engineer'], ['field engineer', 'mechanical@engineer'],
        ['lab_scientist', 'mechanical@engineer']]
df = pd.DataFrame(data, columns=['Job1', 'Job2'])

for index, row in df.iterrows():
    if row['Job1'] == "mechanical@engineer":
        row['Job1'] = 0
    elif row['Job1'] == "field engineer":
        row['Job1'] = 1
    elif row['Job1'] == "lab_scientist":
        row['Job1'] = 2
print(df.head())
Answers:
Looks like you just need a map:
role_to_code = {"mechanical@engineer": 0, "field engineer": 1, "lab_scientist": 2}
df.Job1.map(role_to_code)
# 0    0
# 1    1
# 2    2
# 3    1
# 4    2
# Name: Job1, dtype: int64
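Since the question asks for something that works on more than one column, here is a minimal sketch of the same map idea applied to both columns. It assumes every listed column uses the same role_to_code dictionary; the names job_columns and encoded are made up for illustration.

```python
import pandas as pd

data = [['mechanical@engineer', 'field engineer'], ['field engineer', 'lab_scientist'],
        ['lab_scientist', 'mechanical@engineer'], ['field engineer', 'mechanical@engineer'],
        ['lab_scientist', 'mechanical@engineer']]
df = pd.DataFrame(data, columns=['Job1', 'Job2'])

role_to_code = {"mechanical@engineer": 0, "field engineer": 1, "lab_scientist": 2}

# Hypothetical list of the columns to encode.
job_columns = ['Job1', 'Job2']

# DataFrame.apply passes each column (a Series) to the lambda,
# and Series.map looks every value up in the dictionary.
encoded = df[job_columns].apply(lambda col: col.map(role_to_code))
print(encoded)
```

The result is a new DataFrame; df itself is left unchanged unless you assign the result back.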
Why don't you use the replace function instead of your for loop?
mapping = {'mechanical@engineer': 0, 'field engineer': 1, 'lab_scientist': 2}
df = df.replace(mapping)
print(df.head())
The output would be:
   Job1  Job2
0     0     1
1     1     2
2     2     0
3     1     0
4     2     0
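One difference between the two approaches that may matter when choosing: they treat values missing from the dictionary differently. A small sketch, assuming a made-up value "astronaut" that is not in the mapping:

```python
import pandas as pd

role_to_code = {"mechanical@engineer": 0, "field engineer": 1, "lab_scientist": 2}

# "astronaut" is a hypothetical value with no entry in the dictionary.
jobs = pd.Series(["field engineer", "astronaut"])

# map: values without a dictionary entry become NaN.
mapped = jobs.map(role_to_code)

# replace: values without a dictionary entry are left as they are.
replaced = jobs.replace(role_to_code)

print(mapped)
print(replaced)
```

So map makes unexpected strings easy to spot (they turn into NaN), while replace silently keeps them.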