How to insert a character ("-") every time my string changes from text to number and vice versa?
Question:
This is an example of a bigger dataframe. Imagine I have a dataframe like this:
import pandas as pd
df = pd.DataFrame({"ID":["4SSS50FX","2TT1897FA"],
"VALUE":[13, 56]})
df
Out[2]:
ID VALUE
0 4SSS50FX 13
1 2TT1897FA 56
I would like to insert "-" into the strings in df["ID"] every time they change from number to text or from text to number. So the output should look like:
ID VALUE
0 4-SSS-50-FX 13
1 2-TT-1897-FA 56
I could write specific conditions for each case, but I would like to automate it for all the samples. Could anyone help me?
Answers:
You can use a regular expression with lookarounds.
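df['ID'] = df['ID'].str.replace(r'(?<=\d)(?=[A-Z])|(?<=[A-Z])(?=\d)', '-', regex=True)
The regexp matches an empty string that’s either preceded by a digit and followed by an uppercase letter, or vice versa. This empty string is then replaced with "-". Passing regex=True matters because pandas 2.0 and later treat the pattern as a literal string by default.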
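To check the lookaround approach outside pandas, here is a minimal sketch using only the standard re module. The names BOUNDARY and add_dashes are hypothetical helpers introduced for illustration, not part of the answer above.

```python
import re

# Zero-width match at every digit->letter or letter->digit boundary.
# [A-Z] means only uppercase ASCII letters count as "text" here.
BOUNDARY = re.compile(r'(?<=\d)(?=[A-Z])|(?<=[A-Z])(?=\d)')

def add_dashes(s):
    """Insert '-' at each boundary between digits and uppercase letters."""
    return BOUNDARY.sub('-', s)

ids = ["4SSS50FX", "2TT1897FA"]
print([add_dashes(s) for s in ids])
# ['4-SSS-50-FX', '2-TT-1897-FA']
```

Because the pattern only matches positions, not characters, nothing in the original string is consumed or needs to be written back.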
Use a regex.
>>> df['ID'].str.replace(r'(\d+(?=\D)|\D+(?=\d))', r'\1-', regex=True)
0 4-SSS-50-FX
1 2-TT-1897-FA
Name: ID, dtype: object
\d+(?=\D) means one or more digits followed by a non-digit.
\D+(?=\d) means one or more non-digits followed by a digit.
Either match is captured and replaced with itself plus a "-" character (\1 refers to the captured group).
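The two answers differ in what they treat as "text": the lookaround pattern uses [A-Z], while this one uses \D, which matches any non-digit. A small sketch with the standard re module, assuming lowercase input that is not in the question's sample data, shows the difference. The function names are hypothetical.

```python
import re

def dash_lookaround(s):
    # Boundary-only pattern; letters are limited to uppercase A-Z.
    return re.sub(r'(?<=\d)(?=[A-Z])|(?<=[A-Z])(?=\d)', '-', s)

def dash_capture(s):
    # Capture a run of digits (or non-digits) that is followed by the
    # other kind, then write it back with a trailing '-'.
    return re.sub(r'(\d+(?=\D)|\D+(?=\d))', r'\1-', s)

print(dash_lookaround("4SSS50FX"), dash_capture("4SSS50FX"))
# 4-SSS-50-FX 4-SSS-50-FX
print(dash_lookaround("4sss50fx"), dash_capture("4sss50fx"))
# 4sss50fx 4-sss-50-fx
```

If the IDs can contain lowercase letters, either widen the character class in the lookaround version (for example [A-Za-z]) or use the \D-based version.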