Separate text between square brackets as a separate column in python

Question:

I have the following the following columns,

column_1 = ["Northern Rockies, British Columbia [9A87]", "Northwest Territories [2H89]", "Canada [00052A]", "Division No. 1, Newfoundland and Labrador [52A]"]
column_2 = ["aa", "bb", "cc", "dd"]
column_3 = [4, 4.5, 23, 1]

zipped = list(zip(column_1 , column_2, column_3))
df = pd.DataFrame(zipped, columns=['column_1' , 'column_2', 'column_3'])

I want to extract the text between the square brackets from the first column as a separate column. Below is the output I am looking for,

column_1 = ["Northern Rockies, British Columbia", "Northwest Territories", "Canada", "Division No. 1, Newfoundland and Labrador"]
column_2 = ["aa", "bb", "cc", "dd"]
column_3 = [4, 4.5, 23, 1]
column_4 = ["9A87", "2H89", "00052A", "52A"]

zipped = list(zip(column_1 , column_2, column_3, column_4))
df = pd.DataFrame(zipped, columns=['column_1' , 'column_2', 'column_3', 'column_4'])

I am using square bracket here but I think the solution should apply to any type of bracket.

Asked By: Salahuddin

||

Answers:

I would use str.extract and str.repalce here:

df["column_4"] = df["column_1"].str.extract(r'[(.*?)]')
df["column_1"] = df["column_1"].str.replace(r's*[.*?]$', '', regex=True)
Answered By: Tim Biegeleisen

enter image description here

Hello, Salahuddin!
From the column_1, I assumed that square brackets or any kind of brackets come always after the text. eg. Northern Rockies, British Columbia [9A87]

df[['column_1','column_4']] = df['column_1'].str.extract(r'(.*)[[{(](.*)[]})]',flags=re.IGNORECASE)

This will work for any kind of brackets like [], (), {}

Answered By: Bhavesh Rathod
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.