Apply str title to df columns values from dictionary values

Question:

I have a dictionary that maps column names to functions. I have written a function that should capitalize the values in a df column with str.title()

import pandas as pd
 
data= [["English","john","smith","ohio","united states","","","manufacturing","National","Residental","","",""]]
df= pd.DataFrame(data,columns=['Communication_Language__c','firstName', 'lastName', 'state', 'country', 'company', 'email', 'industry', 'System_Type__c', 'AccountType', 'customerSegment', 'Existing_Customer__c', 'GDPR_Email_Permission__c'])

  Communication_Language__c firstName lastName state        country company email       industry System_Type__c AccountType customerSegment Existing_Customer__c GDPR_Email_Permission__c
0                   English      john    smith  ohio  united states                manufacturing       National  Residental
def capitalize (column,df_temp):
    if df_temp[column].notna():
        df_temp[column]=df[column].str.title()
    return df_temp

def required():
    # something
    pass

parsing_map={
"firstName":[capitalize,required],
"lastName":capitalize,
"state":capitalize,
"country": [capitalize,required],
"industry":capitalize,
"System_Type__c":capitalize,
"AccountType":capitalize,
"customerSegment":capitalize,
}

I wrote the function below to apply str.title(), but is there a way to apply it to the df columns without naming them all?

def capitalize (column,df_temp):
    if df_temp[column].notna():
        df_temp[column]=df[column].str.title()
    return df_temp

What would be the best way to reference the dictionary function mapping to apply str.title() to all of the contents in the columns with a function "capitalize"?

desired output

data= [["English","John","Smith","Ohio","United States","","","Manufacturing","National","Residental","","",""]]
df= pd.DataFrame(data,columns=['Communication_Language__c','firstName', 'lastName', 'state', 'country', 'company', 'email', 'industry', 'System_Type__c', 'AccountType', 'customerSegment', 'Existing_Customer__c', 'GDPR_Email_Permission__c'])

  Communication_Language__c firstName lastName state        country company email       industry System_Type__c AccountType customerSegment Existing_Customer__c GDPR_Email_Permission__c
0                   English      John    Smith  Ohio  United States                Manufacturing       National  Residental
Asked By: Test Code


Answers:

Suggestion: Create a list of columns you want to include and then use apply

import numpy as np

cols = ['firstName', 'lastName', 'state', 'country', 'industry', 'System_Type__c', 'AccountType', 'customerSegment']
# apply returns a new DataFrame, so assign the result back
df = df.apply(lambda col: col.replace(np.nan, "").str.title() if col.name in cols else col)

EDIT: Yes, but put a string instead of a reference to your function in your parsing_map

parsing_map = {
    "firstName": "capitalize",
    "lastName": "capitalize",
    "state": "capitalize",
    "country": "capitalize",
    "industry": "capitalize",
    "System_Type__c": "capitalize",
    "AccountType": "capitalize",
    "customerSegment": "capitalize",
}

df = df.apply(lambda col: col.replace(np.nan, "").str.title() if parsing_map.get(col.name) == "capitalize" else col)

If you use a dict with lists as values, give .get() an empty-list default so that columns missing from the map do not raise a TypeError:

df = df.apply(lambda col: col.replace(np.nan, "").str.title() if "capitalize" in parsing_map.get(col.name, []) else col)
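
If you would rather keep the function references from the question instead of switching to strings, the same selection can be done by checking whether capitalize appears in each entry. The following is only a sketch: as_list is a hypothetical helper that is not part of the original code, and the map and DataFrame are shortened for illustration.

```python
import pandas as pd

def capitalize(col):
    # Title-case a string column; missing values stay missing
    return col.str.title()

def required(col):
    # Placeholder, as in the question
    return col

# Shortened version of the question's map, mixing bare functions and lists
parsing_map = {
    "firstName": [capitalize, required],
    "lastName": capitalize,
    "country": [capitalize, required],
}

def as_list(entry):
    # Hypothetical helper: normalize a bare function or a list to a list
    return list(entry) if isinstance(entry, (list, tuple)) else [entry]

df = pd.DataFrame({
    "firstName": ["john"],
    "lastName": ["smith"],
    "country": ["united states"],
    "email": [""],
})

# Columns whose map entry includes capitalize and that exist in df
cols = [
    name for name, entry in parsing_map.items()
    if capitalize in as_list(entry) and name in df.columns
]
df[cols] = df[cols].apply(lambda col: col.str.title())
```

Columns that are not in the map (email here) are left untouched.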
Answered By: bitflip

def capitalize(df):
    # Title-cases every column; assumes all columns hold strings
    for col in df.columns:
        df[col] = df[col].str.title()
    return df
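
Note that the .str accessor raises an AttributeError on non-string columns such as integers, so the loop above only works when every column holds text. A possible variant, sketched here under the assumption that skipping non-string columns is acceptable, checks the dtype first. The name capitalize_text_columns is made up for this example.

```python
import pandas as pd
from pandas.api.types import is_string_dtype

def capitalize_text_columns(df):
    # Work on a copy so the caller's DataFrame is not modified
    out = df.copy()
    for col in out.columns:
        # Only touch columns that pandas reports as holding strings
        if is_string_dtype(out[col]):
            out[col] = out[col].str.title()
    return out

example = pd.DataFrame({"name": ["john smith"], "age": [30]})
result = capitalize_text_columns(example)
```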
Answered By: bjornelvar

Normally you would use apply for this, e.g.

cols_to_capitalize = list(parsing_map.keys())
df[cols_to_capitalize] = df[cols_to_capitalize].apply(lambda x: x.str.title())
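
One detail worth knowing for the snippet above: the .str methods pass missing values through unchanged, so a NaN in a column does not need special handling before calling .str.title(). A small sketch with made-up data (the column list stands in for the keys of parsing_map):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "firstName": ["john", np.nan],
    "state": ["ohio", "new york"],
    "email": ["a@b.c", "d@e.f"],
})

# Stand-in for list(parsing_map.keys())
cols_to_capitalize = ["firstName", "state"]

# Title-case only the selected columns; NaN entries remain NaN
df[cols_to_capitalize] = df[cols_to_capitalize].apply(lambda x: x.str.title())
```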

If you want to keep your method dictionary, I would suggest that you write the methods to act on a column, not on the dataframe. Something like this:

data= [["English","john","smith","ohio","united states","","","manufacturing","National","Residental","","",""]]
df= pd.DataFrame(data,columns=['Communication_Language__c','firstName', 'lastName', 'state', 'country', 'company', 'email', 'industry', 'System_Type__c', 'AccountType', 'customerSegment', 'Existing_Customer__c', 'GDPR_Email_Permission__c'])

def capitalize(col):
    # TODO handle nan values
    # Maybe use any() instead of all()?
    # This code ignores any column that has even a single NaN value
    if col.notna().all():
        return col.str.title()
    return col

def required(col):
    # TODO do stuff
    return col

parsing_map={
    "firstName":[capitalize,required],
    "lastName":[capitalize],
    "state":[capitalize],
    "country": [capitalize,required],
    "industry":[capitalize],
    "System_Type__c":[capitalize],
    "AccountType":[capitalize],
    "customerSegment":[capitalize],
}


for col_name, fns in parsing_map.items():
    for fn in fns:
        df[col_name] = fn(df[col_name])

You could also pass the full df into these functions if they need to access other columns. Even then, returning only the single column keeps the design clearer.
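
A rough sketch of that variant follows. Each function receives the whole DataFrame plus the column name and still returns a single column. The rule default_company_to_last_name is invented purely to show access to another column and is not something from the question.

```python
import pandas as pd

def capitalize(df, col_name):
    # Returns only the transformed column, not the whole DataFrame
    return df[col_name].str.title()

def default_company_to_last_name(df, col_name):
    # Hypothetical rule: where the column is blank, fall back to lastName
    return df[col_name].where(df[col_name] != "", df["lastName"])

parsing_map = {
    "lastName": [capitalize],
    "company": [default_company_to_last_name, capitalize],
}

df = pd.DataFrame({"lastName": ["smith"], "company": [""]})

# Same driver loop as before, but each function also sees the full df
for col_name, fns in parsing_map.items():
    for fn in fns:
        df[col_name] = fn(df, col_name)
```

Because each function returns one column, the driver loop stays the only place that writes to the DataFrame.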

But you should think carefully whether you really need to reinvent the .apply functionality.

Answered By: w-m