How to insert a character ("-") every time my string changes from text to number and vice versa?

Question:

This is an example of a bigger dataframe. Imagine I have a dataframe like this:

import pandas as pd

df = pd.DataFrame({"ID":["4SSS50FX","2TT1897FA"],
                   "VALUE":[13, 56]})

df
Out[2]: 
          ID  VALUE
0   4SSS50FX     13
1  2TT1897FA     56

I would like to insert "-" into the strings in df["ID"] every time they change from number to text or from text to number. So the output should look like:

             ID  VALUE
0   4-SSS-50-FX     13
1  2-TT-1897-FA     56

I could create specific conditions for each case, but I would like to automate it for all the samples. Could anyone help me?

Asked By: user026


Answers:

You can use a regular expression with lookarounds.

df['ID'] = df['ID'].str.replace(r'(?<=\d)(?=[A-Z])|(?<=[A-Z])(?=\d)', '-', regex=True)

The regexp matches an empty string that's either preceded by a digit and followed by an uppercase letter, or vice versa. This empty string is then replaced with -.
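To see the lookaround idea in isolation, here is a minimal sketch using only the standard-library re module on plain strings. The helper name insert_hyphens and the constant BOUNDARY are hypothetical, introduced just for illustration; the pattern is the one from the answer above.

```python
import re

# Zero-width boundary: a digit before and an uppercase letter after, or the reverse.
BOUNDARY = re.compile(r'(?<=\d)(?=[A-Z])|(?<=[A-Z])(?=\d)')

def insert_hyphens(text: str) -> str:
    """Insert '-' at every digit/uppercase-letter boundary (hypothetical helper)."""
    return BOUNDARY.sub('-', text)

ids = ["4SSS50FX", "2TT1897FA"]
result = [insert_hyphens(s) for s in ids]
print(result)  # ['4-SSS-50-FX', '2-TT-1897-FA']

# [A-Z] only covers uppercase ASCII letters, so lowercase IDs are left unchanged.
print(insert_hyphens("4sss50fx"))  # 4sss50fx
```

If the real data can contain lowercase letters, widening the character class to [A-Za-z] should cover them. The same pattern can be passed to Series.str.replace with regex=True.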

Answered By: Barmar

Use a regex.

>>> df['ID'].str.replace(r'(\d+(?=\D)|\D+(?=\d))', r'\1-', regex=True)
0     4-SSS-50-FX
1    2-TT-1897-FA
Name: ID, dtype: object

\d+(?=\D) means digits followed by a non-digit.
\D+(?=\d) means non-digits followed by a digit.

Either match is replaced with itself (the backreference \1) plus a - character.
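The capture-group variant can be tried the same way on plain strings. This is a sketch with the standard-library re module; the names GROUPS and hyphenate_runs are hypothetical. It also shows one behavioural difference worth knowing: \D matches any non-digit, not only letters.

```python
import re

# A run of digits followed by a non-digit, or a run of non-digits followed by a digit.
GROUPS = re.compile(r'(\d+(?=\D)|\D+(?=\d))')

def hyphenate_runs(text: str) -> str:
    """Append '-' to every run that is followed by the other kind of character."""
    return GROUPS.sub(r'\1-', text)

print(hyphenate_runs("4SSS50FX"))   # 4-SSS-50-FX
print(hyphenate_runs("2TT1897FA"))  # 2-TT-1897-FA

# \D also matches lowercase letters and punctuation, so these are affected too.
print(hyphenate_runs("4sss50fx"))   # 4-sss-50-fx
print(hyphenate_runs("AB-12"))      # AB--12
```

Because \D is broader than [A-Z], this version handles lowercase IDs without changes, but it also adds a hyphen next to any punctuation that sits before a digit.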

Answered By: timgeb