Use regex to extract number before a list of words in pandas dataframe
Question:
I want to extract only the numbers that appear directly before any word from a specific list, then put the extracted numbers in a new column.
The list of words is: l = ["car", "truck", "van"]. I only put the singular forms here, but it should also apply to plurals.
import pandas as pd
df = pd.DataFrame(columns=["description"], data=[["have 3 cars"], ["a 1-car situation"], ["may be 2 trucks"]])
The new column for the extracted numbers can be called df["extracted_num"].
Thank you!
Answers:
You can use Series.str.extract
l = ["car", "truck", "van"]
pat = fr"(\d+)[\s-](?:{'|'.join(l)})"
df['extracted_num'] = df['description'].str.extract(pat)
Output:
>>> print(pat)
(\d+)[\s-](?:car|truck|van)
>>> df
         description extracted_num
0        have 3 cars             3
1  a 1-car situation             1
2    may be 2 trucks             2
Explanation:
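(\d+) – Matches one or more digits and captures them as a group;
[\s-] – Matches a single whitespace character or hyphen;
(?:{'|'.join(l)}) – Matches any word from the list l without capturing it. Because the pattern requires nothing after the word, plural forms such as "cars" match as well.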
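The pieces of the pattern can be checked outside pandas with the standard re module. This is only a minimal sketch: the names words, samples and extracted are made up for illustration, the fourth sample string is an invented non-matching case, and the re.escape call is an addition that is not part of the answer above (it only matters if a word contains regex metacharacters).

```python
import re

# Same word list as in the answer. re.escape is an extra safeguard
# (an assumption, not in the original answer) for words containing
# characters such as "." or "+".
words = ["car", "truck", "van"]
pat = rf"(\d+)[\s-](?:{'|'.join(map(re.escape, words))})"

# The three descriptions from the question plus one invented row without a match.
samples = ["have 3 cars", "a 1-car situation", "may be 2 trucks", "no vehicles here"]

# re.search returns None when nothing matches; group(1) is the captured digits.
extracted = []
for text in samples:
    m = re.search(pat, text)
    extracted.append(m.group(1) if m else None)

print(pat)        # (\d+)[\s-](?:car|truck|van)
print(extracted)  # ['3', '1', '2', None]
```

Note that the captured values are strings, not integers. In the pandas version, rows without a match come back as NaN, so a numeric conversion (for example with pd.to_numeric) is a separate step if numbers are needed.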