Python: replace a pattern in a df column with another pattern
Question:
I have a dataframe as per the below:
import pandas as pd

df = pd.DataFrame(
    columns=['deal', 'details'],
    data=[
        ['deal1', 'MH92h'],
        ['deal2', 'L97h'],
        ['deal3', '97.538'],
        ['deal4', 'LM98h'],
        ['deal5', 'TRD (97.612 cvr)'],
    ]
)
I would like to replace the value in any row where details matches MH[0-9]h with [0-9].75, keeping the digits.
For example, the output would look as follows:
df =
['deal1', 'MH92h', '92.75']
['deal2', 'L97h', 'L97h'],
['deal3', '97.538', '97.538'],
['deal4', 'MH98h', '98.75'],
['deal5', 'TRD 97.61', 'TRD 97.61']
I’ve tried the below, but it doesn’t work:
df = df.assign(test_col=df.details.str.replace("\d+",r'\d+'+'75'), regex=True)
Answers:
You could match MH([0-9]+)h and replace it with capture group 1 followed by .75.
See the capture group 1 at this regex demo.
Note that in your data deal4 has LM98h and not MH98h, so that row is left unchanged.
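To see what capture group 1 holds without a regex tester, here is a minimal sketch using only the standard `re` module. The names `PATTERN` and `to_price` are made up for illustration and are not part of the original answer.

```python
import re

# Same pattern as in the answer: "MH", one or more digits (group 1), then "h".
PATTERN = re.compile(r"MH([0-9]+)h")

def to_price(value: str) -> str:
    """Replace each MH<digits>h occurrence with <digits>.75."""
    # \g<1> in the replacement refers back to the text captured by group 1.
    return PATTERN.sub(r"\g<1>.75", value)

for value in ["MH92h", "L97h", "LM98h"]:
    match = PATTERN.search(value)
    captured = match.group(1) if match else None
    print(value, captured, to_price(value))
```

Only MH92h matches here: group 1 is 92 and the result is 92.75, while L97h and LM98h do not contain "MH" and pass through unchanged.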
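import pandas as pd

df = pd.DataFrame(
    columns=['deal', 'details'],
    data=[
        ['deal1', 'MH92h'],
        ['deal2', 'L97h'],
        ['deal3', '97.538'],
        ['deal4', 'LM98h'],
        ['deal5', 'TRD (97.612 cvr)'],
    ]
)

# \g<1> inserts capture group 1 (the digits). regex=True is passed to
# str.replace itself, since pandas 2.0+ treats the pattern as a literal by default.
df = df.assign(test_col=df.details.str.replace(r"MH([0-9]+)h", r"\g<1>.75", regex=True))
print(df)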
Output
    deal           details          test_col
0  deal1             MH92h             92.75
1  deal2              L97h              L97h
2  deal3            97.538            97.538
3  deal4             LM98h             LM98h
4  deal5  TRD (97.612 cvr)  TRD (97.612 cvr)
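One caveat: the pattern MH([0-9]+)h is unanchored, so it also rewrites a match that sits inside a longer string. If only cells that consist entirely of MH<digits>h should change (an assumption about the intent, based on "details = MH[0-9]h" in the question), a minimal sketch is to anchor the pattern with ^ and $. The deal6 row below is hypothetical and added only to show the difference.

```python
import pandas as pd

df = pd.DataFrame(
    columns=["deal", "details"],
    data=[
        ["deal1", "MH92h"],
        ["deal2", "L97h"],
        ["deal6", "x MH98h y"],  # hypothetical row: match embedded in other text
    ],
)

# ^ and $ require the whole cell to be MH<digits>h before anything is replaced.
df["test_col"] = df["details"].str.replace(
    r"^MH([0-9]+)h$", r"\g<1>.75", regex=True
)
print(df)
```

With the anchors, deal6 keeps "x MH98h y"; with the unanchored pattern it would become "x 98.75 y".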