Pandas – How to convert a string column into integer… then convert into a string with 10 characters

Question:

I’m performing a data analysis where one of the steps is to create a key by combining several fields.

Unfortunately, the values in one of these fields (my_field) do not always have the same number of digits.

Some information:
  • The dtype of my_field is object.
  • NaN values have been replaced with the '-' character.
  • Apart from that, my_field holds integers stored as text.
Code
import pandas as pd
import numpy as np

data ={'product': ['PA1', 'PA2', 'PA3', 'PA4', 'PA5', 'PA6', 'PA7', 'PA8'],
       'my_field': ['001', '0000000000002', '3', '04', '-', '5', '-', '6']}
df = pd.DataFrame(data)   
df
Raw Data
  product       my_field
0     PA1            001
1     PA2  0000000000002
2     PA3              3
3     PA4             04
4     PA5              -
5     PA6              5
6     PA7              -
7     PA8              6
My Approach:
df['my_field'] = np.where(df['my_field'] == '-', '-' , df['my_field'].str.zfill(10) )
df
My Output:
  product       my_field
0     PA1     0000000001
1     PA2  0000000000002
2     PA3     0000000003
3     PA4     0000000004
4     PA5              -
5     PA6     0000000005
6     PA7              -
7     PA8     0000000006
Desired Output:
  product    my_field
0     PA1  0000000001
1     PA2  0000000002
2     PA3  0000000003
3     PA4  0000000004
4     PA5           -
5     PA6  0000000005
6     PA7           -
7     PA8  0000000006

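The problem: some outputs end up longer than 10 characters (PA2 becomes 0000000000002), because zfill only pads; it never truncates a string that is already longer than the requested width.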
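A quick check with Python's built-in str.zfill illustrates the rule (pandas' Series.str.zfill is documented the same way: strings already at least `width` characters long are left unchanged). The sample values are taken from the question. It also shows that the built-in zfill treats a lone '-' as a sign and pads after it, which is one reason to exclude the '-' rows before padding:

```python
# Sample values from the question, including the '-' placeholder
samples = ['3', '001', '0000000000002', '-']

# str.zfill(10) left-pads with '0' up to 10 characters; it never shortens a string
padded = {s: s.zfill(10) for s in samples}

for original, result in padded.items():
    print(f'{original!r:>16} -> {result!r} (length {len(result)})')
```

The 13-character PA2 value comes back unchanged, and '-' turns into '-000000000'.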

Asked By: Andre Nevares


Answers:

An alternative solution using len():

def myfield_format(x):
    # Keep the '-' placeholder as is
    if x == '-':
        return '-'
    # Longer than 10 characters: keep only the last 10
    if len(x) > 10:
        return x[-10:]
    # Shorter: left-pad with zeros up to 10 characters
    return (10 - len(x)) * '0' + x

df['my_field'] = df['my_field'].map(myfield_format)
Output:

  product    my_field
0     PA1  0000000001
1     PA2  0000000002
2     PA3  0000000003
3     PA4  0000000004
4     PA5           -
5     PA6  0000000005
6     PA7           -
7     PA8  0000000006
Answered By: maracuja

What about slicing after zfill? That way you keep only the last 10 characters:

df['my_field'] = np.where(df['my_field'] == '-', '-', df['my_field'].str.zfill(10).str[-10:])

Alternative with boolean indexing:

df.loc[df['my_field'] != '-',
       'my_field'] = df['my_field'].str.zfill(10).str[-10:]

Output:

  product    my_field
0     PA1  0000000001
1     PA2  0000000002
2     PA3  0000000003
3     PA4  0000000004
4     PA5           -
5     PA6  0000000005
6     PA7           -
7     PA8  0000000006
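One caveat with slicing: str[-10:] is only safe when the characters it cuts off are zeros. A value with more than 10 significant digits would silently lose its leftmost digits. A minimal sketch, not from the original answers, that strips the existing zeros first and raises instead of truncating (assuming, as in the question, that every value other than '-' is a string of digits; `WIDTH` is a hypothetical name for the target length):

```python
import pandas as pd

data = {'product': ['PA1', 'PA2', 'PA3', 'PA4', 'PA5', 'PA6', 'PA7', 'PA8'],
        'my_field': ['001', '0000000000002', '3', '04', '-', '5', '-', '6']}
df = pd.DataFrame(data)

WIDTH = 10  # hypothetical constant: target string length
mask = df['my_field'] != '-'

# Drop existing leading zeros so only the significant digits remain ('0' becomes '')
digits = df.loc[mask, 'my_field'].str.lstrip('0')

# Slicing would silently cut digits off anything longer than WIDTH, so fail loudly instead
too_long = digits.str.len() > WIDTH
if too_long.any():
    raise ValueError(f'more than {WIDTH} significant digits: {digits[too_long].tolist()}')

# Re-pad to exactly WIDTH characters ('' pads to all zeros); '-' rows are untouched
df.loc[mask, 'my_field'] = digits.str.zfill(WIDTH)
print(df)
```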
Answered By: mozway
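# int() strips the leading zeros, zfill(10) re-pads; '-' is left as is. Note that assign() returns a new DataFrame.
df.assign(my_field=df.my_field.map(lambda x: str(int(x)).zfill(10) if x.isdigit() else x))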

 product    my_field
0     PA1  0000000001
1     PA2  0000000002
2     PA3  0000000003
3     PA4  0000000004
4     PA5           -
5     PA6  0000000005
6     PA7           -
7     PA8  0000000006
Answered By: G.G
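The question title asks for the integer route explicitly. G.G's answer converts row by row with int(); a vectorized sketch of the same idea uses pd.to_numeric with errors='coerce', so that '-' becomes NaN. It assumes every other value parses as an integer and stays well below 2**53, because the intermediate column is float64:

```python
import pandas as pd

data = {'product': ['PA1', 'PA2', 'PA3', 'PA4', 'PA5', 'PA6', 'PA7', 'PA8'],
        'my_field': ['001', '0000000000002', '3', '04', '-', '5', '-', '6']}
df = pd.DataFrame(data)

# Text -> numbers; '-' cannot be parsed, so errors='coerce' turns it into NaN.
# Because of the NaN, the result is float64 (exact for integers below 2**53).
nums = pd.to_numeric(df['my_field'], errors='coerce')

# Numbers -> 10-character zero-padded strings; NaN goes back to the '-' placeholder
df['my_field'] = nums.map(lambda n: '-' if pd.isna(n) else f'{int(n):010d}')
print(df)
```

Unlike slicing, this never drops digits: a number with more than 10 digits would come out longer than 10 characters instead of being truncated.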