Delete rows from pandas dataframe by using regex

Question:

This is my dataframe:

import pandas as pd

df = pd.DataFrame(
    {
        'a': [
            '#x{LA 0.098:abc}',
            '#x{LA abc:0.31}',
            '#x{BC abc:0.1231}',
            '#x{LA 0.333:abc}',
            '#x{CN 0.031:abc}',
            '#x{YM abc:12345}',
            '#x{YM 1222:abc}',
        ]
    }
)

I have two lists of IDs that determine which rows to delete, based on the position of "abc" relative to the colon, that is, whether abc is on the right side or the left side of the colon.
These are my lists:

labels_that_abc_is_right = ['LA', 'CN']
labels_that_abc_is_left = ['YM', 'BC']

For example, I want to omit rows that contain LA where abc is on the right side of the colon. The same applies to CN. Likewise, I want to delete rows that contain YM where abc is on the left side of the colon, and the same applies to BC. This is just a sample; I have hundreds of IDs.
This is the output that I want after deleting rows:

                 a
1    #x{LA abc:0.31}
6    #x{YM 1222:abc}

I have tried the solutions from these two answers: answer1 and answer2. I know that I probably need to use df.a.str.contains with a regex, but it still doesn't work.

Asked By: Amir


Answers:

Express your criteria as regular expressions first, then filter the data:

regex_right = r'\b(LA|CN)\b.+\b:abc\b'
regex_left = r'\b(YM|BC)\b.+\babc:\b'
df[~(df['a'].str.contains(regex_right, regex=True) | df['a'].str.contains(regex_left, regex=True))]
Answered By: Hanwei Tang
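
Since the question mentions hundreds of IDs, typing the alternation by hand may not be practical. The following is a minimal sketch, not part of the answer above, that builds the two patterns from the question's lists. The helper name alternation and the variables drop and result are made up for illustration. It assumes each label appears as a whole word before the value pair, as in the sample data. re.escape guards against labels that contain regex metacharacters, and the non-capturing groups (?:...) avoid the match-group warning that str.contains emits for capturing groups.

```python
import re

import pandas as pd

df = pd.DataFrame(
    {
        'a': [
            '#x{LA 0.098:abc}',
            '#x{LA abc:0.31}',
            '#x{BC abc:0.1231}',
            '#x{LA 0.333:abc}',
            '#x{CN 0.031:abc}',
            '#x{YM abc:12345}',
            '#x{YM 1222:abc}',
        ]
    }
)

labels_that_abc_is_right = ['LA', 'CN']
labels_that_abc_is_left = ['YM', 'BC']


def alternation(labels):
    # Hypothetical helper: join labels into "LA|CN", escaping any
    # regex metacharacters a label might contain.
    return '|'.join(re.escape(label) for label in labels)


# Label as a whole word, then anything, then "abc" right after the colon.
regex_right = rf'\b(?:{alternation(labels_that_abc_is_right)})\b.+:abc\b'
# Label as a whole word, then anything, then "abc" right before the colon.
regex_left = rf'\b(?:{alternation(labels_that_abc_is_left)})\b.+\babc:'

# Rows matching either pattern are the ones to delete.
drop = df['a'].str.contains(regex_right) | df['a'].str.contains(regex_left)
result = df[~drop]
print(result)
```

With the sample data this should keep only the rows at index 1 and 6, which matches the output requested in the question.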