Pandas – How to convert a string column into integer, then back into a string with 10 characters

Question

I’m performing a data analysis where one of the steps is to create a key by combining several fields.

Unfortunately, the number of digits in a given field is not always the same.
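Here is a minimal pure-Python sketch of why the varying width matters for the key. The helper `make_key` and the `'_'` separator are hypothetical and do not come from the question. It shows that '001' and '1' represent the same integer but produce different keys unless the field is normalized first.

```python
def make_key(product: str, field: str, width: int = 10) -> str:
    """Hypothetical helper: join product and my_field into one key.

    Numeric fields are normalized to a fixed-width, zero-padded string so
    that '001', '1' and '0000000000001' all yield the same key part.
    Non-numeric placeholders such as '-' are kept as they are.
    """
    if field.isdigit():
        field = f"{int(field):0{width}d}"
    return f"{product}_{field}"


# Raw concatenation gives three different keys for the same number...
raw_keys = {"PA1" + "_" + v for v in ["001", "1", "0000000000001"]}
# ...while the normalized keys collapse to a single value.
norm_keys = {make_key("PA1", v) for v in ["001", "1", "0000000000001"]}
print(len(raw_keys), len(norm_keys))  # 3 1
```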

Some information

The dtype of my_field is object.
NaN values have been replaced by the '-' character.
Apart from that, my_field contains integers stored as text.
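The approach named in the title (parse the text as an integer, then format it back as a 10-character string) can be sketched in a vectorized way with `pd.to_numeric(..., errors='coerce')`, which turns the '-' placeholder into NaN. Because the coerced NaN makes the intermediate result float64, this sketch assumes the numbers stay well below 2**53 so that no precision is lost.

```python
import pandas as pd

data = {'product': ['PA1', 'PA2', 'PA3', 'PA4', 'PA5', 'PA6', 'PA7', 'PA8'],
        'my_field': ['001', '0000000000002', '3', '04', '-', '5', '-', '6']}
df = pd.DataFrame(data)

# Step 1: text -> integer. '-' cannot be parsed, so errors='coerce' makes it NaN.
# The nullable Int64 dtype then holds whole numbers and <NA> side by side.
as_int = pd.to_numeric(df['my_field'], errors='coerce').astype('Int64')

# Step 2: integer -> zero-padded 10-character string; missing values go back to '-'.
df['my_field'] = ['-' if pd.isna(n) else f'{int(n):010d}' for n in as_int]
print(df)
```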

Code

import pandas as pd
import numpy as np

data ={'product': ['PA1', 'PA2', 'PA3', 'PA4', 'PA5', 'PA6', 'PA7', 'PA8'],
       'my_field': ['001', '0000000000002', '3', '04', '-', '5', '-', '6']}
df = pd.DataFrame(data)   
df

Raw Data

	product	my_field
0	PA1	001
1	PA2	0000000000002
2	PA3	3
3	PA4	04
4	PA5	-
5	PA6	5
6	PA7	-
7	PA8	6

My Approach:

df['my_field'] = np.where(df['my_field'] == '-', '-' , df['my_field'].str.zfill(10) )
df

My Output:

	product	my_field
0	PA1	0000000001
1	PA2	0000000000002
2	PA3	0000000003
3	PA4	0000000004
4	PA5	-
5	PA6	0000000005
6	PA7	-
7	PA8	0000000006

Desired Output:

	product	my_field
0	PA1	0000000001
1	PA2	0000000002
2	PA3	0000000003
3	PA4	0000000004
4	PA5	-
5	PA6	0000000005
6	PA7	-
7	PA8	0000000006

The problem: some outputs end up longer than 10 characters (row 1 keeps all 13 characters of '0000000000002').
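`zfill(width)` only adds zeros when a string is shorter than `width`; longer strings are returned unchanged. pandas' `.str.zfill` follows the same rule as Python's built-in `str.zfill`, so a plain-Python check on the question's values shows which one slips through:

```python
values = ['001', '0000000000002', '3', '04', '-', '5', '-', '6']

# str.zfill pads up to the requested width but never shortens a string.
padded = [v if v == '-' else v.zfill(10) for v in values]

# The 13-character input comes back unchanged, so it is the one that overflows.
too_long = [v for v in padded if len(v) > 10]
print(too_long)  # ['0000000000002']
```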

Asked By: Andre Nevares

Answer 1

An alternative solution using len():

def myfield_format(x):
    x = str(x)
    # Leave the '-' placeholder untouched
    if x == '-':
        return x
    # Too long: keep only the last 10 characters
    if len(x) > 10:
        return x[-10:]
    # Too short (or exactly 10): left-pad with zeros up to 10 characters
    return (10 - len(x)) * '0' + x

df['my_field'] = df['my_field'].map(myfield_format)

product	my_field
PA1	0000000001
PA2	0000000002
PA3	0000000003
PA4	0000000004
PA5	-
PA6	0000000005
PA7	-
PA8	0000000006

Answered By: maracuja
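The two branches of `myfield_format` amount to "pad on the left to 10 characters, then keep the last 10". A more compact sketch of the same logic uses `str.rjust`. The name `myfield_format_compact` is hypothetical:

```python
def myfield_format_compact(x, width=10):
    # Keep the '-' placeholder; otherwise left-pad with zeros,
    # then keep only the last `width` characters.
    if x == '-':
        return x
    return str(x).rjust(width, '0')[-width:]


values = ['001', '0000000000002', '3', '04', '-', '5', '-', '6']
print([myfield_format_compact(v) for v in values])
```

On the question's DataFrame it would be applied the same way, with `df['my_field'].map(myfield_format_compact)`. Unlike `zfill`, `rjust` does not treat a leading sign specially, which is harmless here because the values are unsigned.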

Answer 2

What about slicing after zfill? This way you keep only the last 10 characters:

df['my_field'] = np.where(df['my_field'] == '-', '-', df['my_field'].str.zfill(10).str[-10:])

Alternative with boolean indexing:

df.loc[df['my_field'] != '-',
       'my_field'] = df['my_field'].str.zfill(10).str[-10:]

Output:

  product    my_field
0     PA1  0000000001
1     PA2  0000000002
2     PA3  0000000003
3     PA4  0000000004
4     PA5           -
5     PA6  0000000005
6     PA7           -
7     PA8  0000000006

Answered By: mozway
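If the column could contain non-numeric placeholders other than '-', a mask built with `.str.isdigit()` may be a safer filter than `!= '-'` in the boolean-indexing version. A sketch follows; the row 'PA9' with the value 'n/a' is hypothetical and only shows that such values are left alone:

```python
import pandas as pd

df = pd.DataFrame({'product': ['PA1', 'PA2', 'PA3', 'PA9'],
                   'my_field': ['001', '0000000000002', '-', 'n/a']})

# True only for values made entirely of digits; '-' and 'n/a' are False.
is_num = df['my_field'].str.isdigit()

# Pad to 10 characters, then keep the last 10, only on the numeric rows.
df.loc[is_num, 'my_field'] = df.loc[is_num, 'my_field'].str.zfill(10).str[-10:]
print(df)
```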

Answer 3

# assign returns a new DataFrame; df itself is not modified
df.assign(my_field=df['my_field'].map(lambda x: str(int(x)).zfill(10) if x.isdigit() else x))

  product    my_field
0     PA1  0000000001
1     PA2  0000000002
2     PA3  0000000003
3     PA4  0000000004
4     PA5           -
5     PA6  0000000005
6     PA7           -
7     PA8  0000000006

Answered By: G.G
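The answers behave differently when a value has more than 10 significant digits. Slicing (Answers 1 and 2) silently drops leading digits. The integer round trip in Answer 3 keeps them and returns a longer string, which a length check can flag. The value below is hypothetical; the question's data does not contain such a value.

```python
value = '012345678901'  # hypothetical: 11 significant digits after the leading zero

# Pad-and-slice: keeps only the last 10 characters, losing the leading '1'.
sliced = value.zfill(10)[-10:]

# Integer round trip: strips leading zeros, then pads; never truncates.
via_int = str(int(value)).zfill(10)

print(sliced)             # 2345678901
print(via_int)            # 12345678901
print(len(via_int) > 10)  # True: this key would need a wider format
```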