In a DataFrame, remove parentheses and dashes from phone numbers and handle the international prefix

Question:

In a DataFrame, how can I remove the unnecessary characters from a contact number?

df

Id Phone
1  (+1)123-456-7890
2  (123)-(456)-(7890)
3  123-456-7890

Final Output

Id  Phone
1   1234567890
2   1234567890
3   1234567890
Asked By: parth hirpara

||

Answers:

I would use a regex with str.replace here:

df['Phone2'] = df['Phone'].str.replace(r'^(?:\(\+\d+\))|\D', '', regex=True)

output:

   Id               Phone      Phone2
0   1    (+1)123-456-7890  1234567890
1   2  (123)-(456)-(7890)  1234567890
2   3        123-456-7890  1234567890
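To reproduce the output above, here is a minimal, self-contained sketch. The DataFrame construction is an assumption based on the sample data in the question; the str.replace call is the one from this answer.

```python
import pandas as pd

# Sample data copied from the question (the construction itself is assumed).
df = pd.DataFrame({
    "Id": [1, 2, 3],
    "Phone": ["(+1)123-456-7890", "(123)-(456)-(7890)", "123-456-7890"],
})

# Drop a leading "(+digits)" prefix, then every remaining non-digit character.
df["Phone2"] = df["Phone"].str.replace(r"^(?:\(\+\d+\))|\D", "", regex=True)

print(df)
```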

regex:

^(?:\(\+\d+\)) # match a (+0) leading identifier
|              # OR
\D             # match a non-digit
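
The same pattern can also be written with re.VERBOSE so that the comments live inside the regex itself. This is only a sketch on plain strings using the standard library; the names PHONE_CLEANUP and clean are illustrative and not part of the original answer.

```python
import re

# Illustrative name; in VERBOSE mode whitespace and "#" comments are ignored.
PHONE_CLEANUP = re.compile(
    r"""
    ^\(\+\d+\)   # a leading "(+digits)" international prefix
    |            # OR
    \D           # any single non-digit character
    """,
    re.VERBOSE,
)

def clean(phone: str) -> str:
    """Remove a leading (+N) prefix and all remaining non-digit characters."""
    return PHONE_CLEANUP.sub("", phone)

for raw in ["(+1)123-456-7890", "(123)-(456)-(7890)", "123-456-7890"]:
    print(raw, "->", clean(raw))
```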


Note on the international prefix:

The prefix might be important to keep, since dropping it makes numbers from different countries indistinguishable.

Keep the prefixes:

df['Phone2'] = df['Phone'].str.replace(r'[^+\d]', '', regex=True)

output (a fourth row, (+380)123-456-7890, was added to the input to show a different prefix):

   Id               Phone          Phone2
0   1    (+1)123-456-7890    +11234567890
1   2  (123)-(456)-(7890)      1234567890
2   3        123-456-7890      1234567890
3   4  (+380)123-456-7890  +3801234567890

Only drop a specific prefix (here +1):

df['Phone2'] = df['Phone'].str.replace(r'^(?:\(\+1\))|[^+\d]', '', regex=True)
# or, more flexible
df['Phone2'] = df['Phone'].str.replace(r'(?:\+1\D)|[^+\d]', '', regex=True)

output:

   Id               Phone          Phone2
0   1    (+1)123-456-7890      1234567890
1   2  (123)-(456)-(7890)      1234567890
2   3        123-456-7890      1234567890
3   4  (+380)123-456-7890  +3801234567890
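
If the prefix to drop varies, the second pattern can be built from a variable. A rough sketch, assuming a hypothetical helper drop_prefix (not from the original answer) and using re.escape so that the "+" is treated literally:

```python
import re
import pandas as pd

def drop_prefix(phones: pd.Series, prefix: str = "+1") -> pd.Series:
    """Hypothetical helper: remove one given prefix, keep all other prefixes."""
    # Match the prefix followed by a non-digit, or any character that is
    # neither "+" nor a digit.
    pattern = rf"(?:{re.escape(prefix)}\D)|[^+\d]"
    return phones.str.replace(pattern, "", regex=True)

phones = pd.Series([
    "(+1)123-456-7890",
    "(123)-(456)-(7890)",
    "123-456-7890",
    "(+380)123-456-7890",
])
print(drop_prefix(phones).tolist())
```

Because the pattern requires a non-digit right after the prefix, a number written without a separator (for example +11234567890) is left untouched; whether that is desirable depends on the data.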
Answered By: mozway