Remove double space and replace with a single one in pandas
Question:
I have 2m lines of UK postcode data, but some muppet has used double spaces in some cases and single spaces in others. I need to merge data based on the postcode, so it needs to be consistent.
I can’t find a simple way to do this in pandas, but it feels like there should be. Any advice?
Answers:
You might be looking for pd.Series.str.replace:
df.postcode = df.postcode.str.replace('  ', ' ')
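This should replace every run of multiple spaces with a single space (regex=True is needed in pandas 2.0 and later, where the pattern is otherwise matched literally):
df.postcode = df.postcode.str.replace(' +', ' ', regex=True)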
To remove all whitespace from the start and end:
df.postcode = df.postcode.str.strip()
This should replace any run of whitespace (spaces, tabs, etc.) with a single space:
df.postcode = df.postcode.str.replace(r'\s+', ' ', regex=True)
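Putting the suggestions above together, here is a minimal sketch of cleaning the postcode column in two frames before merging. The sample data, the frame names left and right, and the helper normalise_postcode are made up for illustration; only the postcode column name comes from the answers above.

```python
import pandas as pd

# Hypothetical sample data with inconsistent spacing in the merge key.
left = pd.DataFrame(
    {"postcode": ["SW1A  1AA", " M1 1AE ", "B33\t8TH"], "value": [1, 2, 3]}
)
right = pd.DataFrame(
    {
        "postcode": ["SW1A 1AA", "M1 1AE", "B33 8TH"],
        "region": ["London", "Manchester", "Birmingham"],
    }
)


def normalise_postcode(s: pd.Series) -> pd.Series:
    """Collapse each run of whitespace to one space, then trim both ends."""
    return s.str.replace(r"\s+", " ", regex=True).str.strip()


# Apply the same cleaning to both sides so the keys compare equal.
left["postcode"] = normalise_postcode(left["postcode"])
right["postcode"] = normalise_postcode(right["postcode"])

merged = left.merge(right, on="postcode", how="left")
print(merged)
```

Cleaning both frames with the same function is probably the safest route, since a merge only matches keys that are exactly equal.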