Applying Functions in Python

Question:

I am an R user who is trying to learn more about Python.

I found this Python library that I would like to use for address parsing: https://github.com/zehengl/ez-address-parser

I was able to try an example over here:

from ez_address_parser import AddressParser

ap = AddressParser()

result = ap.parse("290 Bremner Blvd, Toronto, ON M5V 3L9")
print(result)
[('290', 'StreetNumber'), ('Bremner', 'StreetName'), ('Blvd', 'StreetType'), ('Toronto', 'Municipality'), ('ON', 'Province'), ('M5V', 'PostalCode'), ('3L9', 'PostalCode')]

I have the following file that I imported:

import pandas as pd

df = pd.read_csv(r'C:/Users/me/OneDrive/Documents/my_file.csv', encoding='latin-1')

   name                               address
1 name1 290 Bremner Blvd, Toronto, ON M5V 3L9
2 name2 291 Bremner Blvd, Toronto, ON M5V 3L9
3 name3 292 Bremner Blvd, Toronto, ON M5V 3L9

I tried to apply the above function and export the file:

df['Address_Parse'] = df['ADDRESS'].apply(ap.parse)

df = pd.DataFrame(df)
df.to_csv(r'C:/Users/me/OneDrive/Documents/python_file.csv', index=False, header=True)

This seems to have worked, but the whole parse result for each address ends up in a single cell:

[('290', 'StreetNumber'), ('Bremner', 'StreetName'), ('Blvd', 'StreetType'), ('Toronto', 'Municipality'), ('ON', 'Province'), ('M5V', 'PostalCode'), ('3L9', 'PostalCode')]

Is there a way in Python to make each of these "elements" (e.g. StreetNumber, StreetName, etc.) into a separate column?

Thank you!

Asked By: antonoyaro8


Answers:

Define a custom function that returns a Series and join the output:

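def parse(x):
    # ap.parse returns (value, label) pairs; flip them into {label: value}
    return pd.Series({k: v for v, k in ap.parse(x)})

# apply returns one row of labelled columns per address; join adds them to df
out = df.join(df['ADDRESS'].apply(parse))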

print(out)
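
One caveat: a dict holds a single value per key, so repeated labels, such as the two PostalCode tokens in the question's sample output, collapse to one value. A minimal sketch of one way around this, assuming it is acceptable to join repeated tokens with a space. The helper name tokens_to_dict is hypothetical, and the sample tokens are copied from the question instead of calling the parser:

```python
from collections import defaultdict

# Sample tokens copied from the question's output; in real use these
# would come from ap.parse(address).
sample_tokens = [
    ('290', 'StreetNumber'), ('Bremner', 'StreetName'), ('Blvd', 'StreetType'),
    ('Toronto', 'Municipality'), ('ON', 'Province'),
    ('M5V', 'PostalCode'), ('3L9', 'PostalCode'),
]

def tokens_to_dict(tokens, sep=' '):
    """Group (value, label) pairs by label, joining repeated labels in order."""
    grouped = defaultdict(list)
    for value, label in tokens:
        grouped[label].append(value)
    return {label: sep.join(values) for label, values in grouped.items()}

row = tokens_to_dict(sample_tokens)
print(row['PostalCode'])  # M5V 3L9
```

If this fits your data, the function above could return pd.Series(tokens_to_dict(ap.parse(x))) instead of the plain dict comprehension.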
Answered By: mozway

If you use pd.DataFrame.apply, you don't have to remember to convert the result into a Series. Instead, you can pass axis=1 and result_type='expand'.

Given:

# df
   name                                address
0 name1  290 Bremner Blvd, Toronto, ON M5V 3L9

Doing:

def parse_address(row):
    return {k:v for v,k in ap.parse(row.address)}

df = df.join(df.apply(parse_address, axis=1, result_type='expand'))

# Or, if every address parses into the same seven tokens in the same order, this also works:

def parse_address(row):
    return [x[0] for x in ap.parse(row.address)]

new_cols = [
    'StreetNumber', 
    'StreetName',
    'StreetType',
    'Municipality',
    'Province',
    'PostalCode',
    'PostalCode'
]

df[new_cols] = df.apply(parse_address, axis=1, result_type='expand')

Outputs:

# Method 1
    name                                address Municipality PostalCode Province StreetName StreetNumber StreetType
0  name1  290 Bremner Blvd, Toronto, ON M5V 3L9      Toronto        3L9       ON    Bremner          290       Blvd


# Method 2
    name                                address StreetNumber StreetName StreetType Municipality Province PostalCode
0  name1  290 Bremner Blvd, Toronto, ON M5V 3L9          290    Bremner       Blvd      Toronto       ON        3L9

As for dictionary comprehension:

# This:
out = {k:v for v,k in [('a', 'b')]}

# Is like writing this:

out = {}
for v, k in [('a', 'b')]:
    out[k] = v

# Both result in:
{'b': 'a'}
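
A consequence of the loop form, shown as a small sketch with tokens taken from the question's sample output: each key holds only one value, so a label that appears twice keeps the last value assigned to it. That is why PostalCode shows 3L9 and not M5V in the outputs above.

```python
# Two tokens share the label 'PostalCode', as in the question's sample output.
pairs = [('ON', 'Province'), ('M5V', 'PostalCode'), ('3L9', 'PostalCode')]

# The second 'PostalCode' assignment overwrites the first.
out = {k: v for v, k in pairs}
print(out)  # {'Province': 'ON', 'PostalCode': '3L9'}
```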
Answered By: BeRT2me