How to drop all the rows between two delimiters

Question:

Please find below my input/output :

Input :

Col1
Green
Purple
Start delimiter
abchd
oeitms
End delimiter
Yellow
Start delimiter
kdkfkd
dldldldl
mdmdmdm
End delimiter
Red
Brown
Rose
White

Output (desired) :

Col1
Green
Purple
Yellow
Red
Brown
Rose
White

Basically, I’m trying to drop all the rows between Start delimiter and End delimiter (including their rows too).
The problem is the distance between these two is not fix !

I tried the following code but without any success :

m1 = df['Col1'] == 'Start delimiter'
m2 = df['Col1'] == 'End delimiter'

df.loc[~m1&~m2]

Do you have any suggestions, please?

Asked By: Stormy

||

Answers:

You could try:

import pandas as pd
import numpy as np

df = pd.DataFrame(
    {"col1":["red","blue","Start delimiter","black", "white", "End delimiter", "green", "blue",
             "Start delimiter", "red", "green", "anothercolor", "End delimiter", "again_anothercolor"]}
)

idx_start = df.index[df.col1 == "Start delimiter"]  
idx_end = df.index[df.col1 == "End delimiter"]

rows2drop = []

for start, end in zip(idx_start, idx_end):
    rows2drop.extend(np.arange(start, end+1))
    
df.drop(rows2drop, axis=0)

Initial dataframe

+----+--------------------+
|    | col1               |
|----+--------------------|
|  0 | red                |
|  1 | blue               |
|  2 | Start delimiter    |
|  3 | black              |
|  4 | white              |
|  5 | End delimiter      |
|  6 | green              |
|  7 | blue               |
|  8 | Start delimiter    |
|  9 | red                |
| 10 | green              |
| 11 | anothercolor       |
| 12 | End delimiter      |
| 13 | again_anothercolor |
+----+--------------------+

Dataframe after drop()

+----+--------------------+
|    | col1               |
|----+--------------------|
|  0 | red                |
|  1 | blue               |
|  6 | green              |
|  7 | blue               |
| 13 | again_anothercolor |
+----+--------------------+
Answered By: Marco_CH

try:

df
    Col1
0   Green
1   Purple
2   Start delimiter
3   abchd
4   oeitms
5   End delimiter
6   Yellow
7   Start delimiter
8   kdkfkd
9   dldldldl
10  mdmdmdm
11  End delimiter
12  Red
13  Brown
14  Rose
15  White

a = False
b = []
for i in df['Col1'].tolist():
    print(i)
    if i == 'Start delimiter':
        a = True
        b.append(i)
    elif i == 'End delimiter':
        a = False 
        b.append(i)
    elif a:
        b.append(i)

#b --> ['Start delimiter', 'abchd', 'oeitms', 'End delimiter', 'Start delimiter', 'kdkfkd', 'dldldldl', 'mdmdmdm', 'End delimiter']
df = df[~df['Col1'].isin(b)]

df
    Col1
0   Green
1   Purple
6   Yellow
12  Red
13  Brown
14  Rose
15  White
Answered By: khaled koubaa

For a vectorial approach you can use:

d = {'Start delimiter': False,
     'End delimiter': True}

# assign False after Start
# True after End
# True otherwise
m1 = df['Col1'].map(d).ffill().fillna(True)
# ensure not an End
m2 = df['Col1'].ne('End delimiter')

# if both conditions are met, keep the rows
out = df.loc[m1&m2]

Output:

      Col1
0    Green
1   Purple
6   Yellow
12     Red
13   Brown
14    Rose
15   White
Answered By: mozway

Here’s a function to do just that.

def cleanlist(list_to_clean,start_clean,end_clean) :
    clean_list =[]
    clean_status = False
    for x in list_to_clean:
        if x == start_clean :
            clean_status=True
            
        elif clean_status == False:
            clean_list.append(x)
            
        elif  x == end_clean :
            clean_status = False
            
    return clean_list

print(cleanlist(list,'Start delimiter','End delimiter'))
Answered By: Omar T.

One option is to build the list of indices that are not within the starts and ends, and index df with that. This assumes that for every Start delimiter, there is a corresponding End delimiter:

start = df.Col1.eq('Start delimiter').to_numpy().nonzero()[0]
end = df.Col1.eq('End delimiter').to_numpy().nonzero()[0] + 1
combo = [np.arange(s, e) for s, e in zip(start, end)]
combo = np.concatenate(combo)
keep = df.index.difference(combo)
df.loc[keep]

      Col1
0    Green
1   Purple
6   Yellow
12     Red
13   Brown
14    Rose
15   White
Answered By: sammywemmy

You can also use mod()

s = 'Start delimiter'
e = 'End delimiter'

df.loc[~df['Col1'].isin([s,e]).cumsum().mod(2) & df['Col1'].ne(e)]

      Col1
0    Green
1   Purple
6   Yellow
12     Red
13   Brown
14    Rose
15   White
Answered By: rhug123
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.