Formatting a specific column of integers to the SSN style

Question:

I want to format a specific column of integers to SSN format (xxx-xx-xxxx). I saw that openpyxl has built-in styles, but I have been using pandas and wasn't sure whether it could produce this specific format.

I did see this:

df.iloc[:,:].str.replace(',', '')

but I want to replace the ‘,’ with ‘-’.

import pandas as pd

df = pd.read_excel('C:/Python/Python37/Files/Original.xls')

df.drop(['StartDate', 'EndDate', 'EmployeeID'], axis=1, inplace=True)

df.rename(columns={'CheckNumber': 'W/E Date', 'CheckBranch': 'Branch', 'DeductionAmount': 'Amount'}, inplace=True)

df = df[['Branch', 'Deduction', 'CheckDate', 'W/E Date', 'SSN', 'LastName', 'FirstName', 'Amount', 'Agency', 'CaseNumber']]

ssn = (df['SSN']        # the integer column
       .astype(str)     # cast integers to string
       .str.zfill(8)    # zero-padding
       .pipe(lambda s: s.str[:2] + '-' + s.str[2:4] + '-' + s.str[4:]))

# ExcelWriter.save() was removed in pandas 2.0; the context manager closes the file
with pd.ExcelWriter('C:/Python/Python37/Files/Deductions Report.xlsx') as writer:
    df.to_excel(writer, sheet_name='Sheet1')
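Since the question mentions openpyxl styles: one alternative to converting the column to strings is to leave SSN numeric and have Excel display it through the custom number format 000-00-0000 (the same pattern Excel's own "Social Security Number" special format uses). A minimal sketch, assuming the openpyxl engine is installed; the sample DataFrame and the in-memory buffer are hypothetical stand-ins for the real data and report file:

```python
import io

import pandas as pd

# Hypothetical data: SSNs kept as plain integers (12345678 lost its leading zero)
df = pd.DataFrame({'SSN': [111223333, 12345678], 'LastName': ['Doe', 'Roe']})

buf = io.BytesIO()  # stands in for 'Deductions Report.xlsx'
with pd.ExcelWriter(buf, engine='openpyxl') as writer:
    df.to_excel(writer, sheet_name='Sheet1', index=False)
    ws = writer.sheets['Sheet1']             # the underlying openpyxl Worksheet
    col = df.columns.get_loc('SSN') + 1      # openpyxl columns are 1-based
    # skip the header row and give every SSN cell an Excel number format
    for (cell,) in ws.iter_rows(min_row=2, min_col=col, max_col=col):
        cell.number_format = '000-00-0000'
```

The cells stay numeric, so Excel shows 12345678 as 012-34-5678 while the stored value is unchanged. The trade-off is that reading the file back with pandas returns integers, not formatted strings.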
Asked By: Roberto Gonzalez


Answers:

Your question is a bit confusing; see if this helps:

If you have a column of integers and want to convert it to strings in SSN (Social Security Number) format, you can try something like:

df['SSN'] = (df['SSN']      # the "integer" column
             .astype(int)   # convert floats (e.g. 111223333.0) to int first
             .astype(str)   # cast integers to string
             .str.zfill(9)  # zero-pad to nine digits
             .pipe(lambda s: s.str[:3] + '-' + s.str[3:5] + '-' + s.str[5:]))
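To see why the zfill(9) step matters, here is a small self-contained run of the same chain on hypothetical data in which one SSN begins with 0. Stored as an integer, 012345678 becomes 12345678, and zero-padding restores the missing digit before slicing:

```python
import pandas as pd

# Hypothetical sample data: the second SSN (012345678) lost its leading zero
df = pd.DataFrame({'SSN': [111223333, 12345678]})

df['SSN'] = (df['SSN']
             .astype(int)   # make sure the values are integers, not floats
             .astype(str)   # '12345678'
             .str.zfill(9)  # '012345678', leading zero restored
             .pipe(lambda s: s.str[:3] + '-' + s.str[3:5] + '-' + s.str[5:]))

print(df['SSN'].tolist())  # ['111-22-3333', '012-34-5678']
```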
Answered By: Daniel Severo

Setup

Social Security numbers are nine-digit numbers of the form AAA-GG-SSSS.

import numpy as np
import pandas as pd

s = pd.Series([111223333, 222334444])

0    111223333
1    222334444
dtype: int64

Option 1
Using zip and numpy.unravel_index:

pd.Series([
    '{:03d}-{:02d}-{:04d}'.format(*el)  # zero-pad each part
    for el in zip(*np.unravel_index(s, (1000, 100, 10000)))
])
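What np.unravel_index does here is split each number into mixed-radix parts: treating n as a flat index into a 1000 x 100 x 10000 array yields the 3-digit area number, the 2-digit group number and the 4-digit serial number. The sketch below reproduces that split with plain divmod (split_ssn is a hypothetical helper name) and checks that both agree. The parts come back as integers, so a zero-padding format spec such as {:03d} is what preserves a leading zero:

```python
import numpy as np


def split_ssn(n):
    """Split a 9-digit integer into (area, group, serial) with divmod."""
    area, rest = divmod(n, 100 * 10000)  # first 3 digits, remaining 6
    group, serial = divmod(rest, 10000)  # next 2 digits, last 4
    return area, group, serial


values = [111223333, 222334444, 12345678]
area, group, serial = np.unravel_index(values, (1000, 100, 10000))

# np.unravel_index and divmod give the same parts for every value
for n, a, g, sr in zip(values, area, group, serial):
    assert split_ssn(n) == (a, g, sr)

print(split_ssn(111223333))                                 # (111, 22, 3333)
print('{:03d}-{:02d}-{:04d}'.format(*split_ssn(12345678)))  # 012-34-5678
```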

Option 2
Using f-strings:

pd.Series([f'{i[:3]}-{i[3:5]}-{i[5:]}' for i in s.astype(str).str.zfill(9)])

Both produce:

0    111-22-3333
1    222-33-4444
dtype: object
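Both options build a new Series from a Python list, which replaces the original index with a default 0..n-1 one. A variant that stays within pandas string methods, and therefore keeps the original index, uses a regex with three capture groups; the pattern is an illustrative choice, and strings that do not have exactly nine digits are left unchanged:

```python
import pandas as pd

s = pd.Series([111223333, 222334444, 12345678])

formatted = (s.astype(str)
              .str.zfill(9)  # restore leading zeros dropped by integer storage
              .str.replace(r'^(\d{3})(\d{2})(\d{4})$', r'\1-\2-\3', regex=True))

print(formatted.tolist())  # ['111-22-3333', '222-33-4444', '012-34-5678']
```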
Answered By: user3483203

I prefer:

df["ssn"] = df["ssn"].astype(str)
df["ssn"] = df["ssn"].str.strip()
df["ssn"] = (
    df.ssn.str.replace("(", "")
    .str.replace(")", "")
    .str.replace("-", "")
    .str.replace(" ", "")
    .apply(lambda x: f"{x[:3]}-{x[3:5]}-{x[5:]}")
)

This takes into account rows that are partially formatted, fully formatted, or not formatted at all, and formats them all correctly.

For example:

import pandas as pd

data = [111111111, 123456789, "222-11-3333", "433-3131234"]

df = pd.DataFrame(data, columns=['ssn'])

Gives you:

[Image: the "ssn" column before formatting]

After the code you then get:

[Image: the "ssn" column after formatting]
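Two caveats about the chained-replace approach: it only removes the characters it lists, and the final slice assumes exactly nine digits, so the ten-digit sample value "433-3131234" comes out as 433-31-31234. The sketch below strips every non-digit with a single regex (\D), zero-pads values whose leading zero was lost, and leaves anything that does not have 1 to 9 digits untouched for manual review. That validity rule is an assumption for illustration, not part of the answer above:

```python
import pandas as pd

# The sample data from the answer: integers plus fully/partially formatted strings
df = pd.DataFrame({'ssn': [111111111, 123456789, "222-11-3333", "433-3131234"]})

raw = df['ssn'].astype(str).str.replace(r'\D', '', regex=True)  # keep digits only
valid = raw.str.len().between(1, 9)  # empty or 10+ digits cannot be an SSN
digits = raw.str.zfill(9)            # restore leading zeros lost by int storage

formatted = digits.str[:3] + '-' + digits.str[3:5] + '-' + digits.str[5:]

# keep the original value wherever the digit count is wrong, so it can be reviewed
df['ssn'] = formatted.where(valid, df['ssn'])
print(df['ssn'].tolist())
# ['111-11-1111', '123-45-6789', '222-11-3333', '433-3131234']
```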

Answered By: Tyler Houssian