Extracting only a number and letter F from the column of a pandas dataframe
Question:
I have some data as below,
AST_NAME
2F
3F
4F
5F
2-F-C
3-F-A
4-F-C
4-F-D
5-F-E
5-F-F
SwB
6-F-G
SwB
7-F-A
I want to extract the number and letter F only from those values like 2-F-C or 3-F-D.
My desired output is as below.
AST_NAME
2F
3F
4F
5F
2F
3F
4F
4F
5F
5F
SwB
6F
SwB
7F
I tried using lstrip() and rstrip() but couldn’t get a simple solution. I even tried extracting 1st 3 letters and then removing ‘-‘ but I failed to achieve the desired output.
Kindly suggest.
Thank you.
Answers:
You can use a regex and str.replace
:
df['AST_NAME_clean'] = df['AST_NAME'].str.replace(r'^(d+)-?(F).*',
r'12', regex=True)
Output:
AST_NAME AST_NAME_clean
0 2F 2F
1 3F 3F
2 4F 4F
3 5F 5F
4 2-F-C 2F
5 3-F-A 3F
6 4-F-C 4F
7 4-F-D 4F
8 5-F-E 5F
9 5-F-F 5F
10 SwB SwB
11 6-F-G 6F
12 SwB SwB
13 7-F-A 7F
try this:
tmp = (df.AST_NAME
.str.extract(r'(d+-?[A-Z])', expand=False)
.dropna()
.str.replace('-', ''))
res = tmp.combine_first(df.AST_NAME).to_frame()
print(res)
>>>
AST_NAME
0 2F
1 3F
2 4F
3 5F
4 2F
5 3F
6 4F
7 4F
8 5F
9 5F
10 SwB
11 6F
12 SwB
13 7F
I have some data as below,
AST_NAME |
---|
2F |
3F |
4F |
5F |
2-F-C |
3-F-A |
4-F-C |
4-F-D |
5-F-E |
5-F-F |
SwB |
6-F-G |
SwB |
7-F-A |
I want to extract the number and letter F only from those values like 2-F-C or 3-F-D.
My desired output is as below.
AST_NAME |
---|
2F |
3F |
4F |
5F |
2F |
3F |
4F |
4F |
5F |
5F |
SwB |
6F |
SwB |
7F |
I tried using lstrip() and rstrip() but couldn’t get a simple solution. I even tried extracting 1st 3 letters and then removing ‘-‘ but I failed to achieve the desired output.
Kindly suggest.
Thank you.
You can use a regex and str.replace
:
df['AST_NAME_clean'] = df['AST_NAME'].str.replace(r'^(d+)-?(F).*',
r'12', regex=True)
Output:
AST_NAME AST_NAME_clean
0 2F 2F
1 3F 3F
2 4F 4F
3 5F 5F
4 2-F-C 2F
5 3-F-A 3F
6 4-F-C 4F
7 4-F-D 4F
8 5-F-E 5F
9 5-F-F 5F
10 SwB SwB
11 6-F-G 6F
12 SwB SwB
13 7-F-A 7F
try this:
tmp = (df.AST_NAME
.str.extract(r'(d+-?[A-Z])', expand=False)
.dropna()
.str.replace('-', ''))
res = tmp.combine_first(df.AST_NAME).to_frame()
print(res)
>>>
AST_NAME
0 2F
1 3F
2 4F
3 5F
4 2F
5 3F
6 4F
7 4F
8 5F
9 5F
10 SwB
11 6F
12 SwB
13 7F