Select multiple sections of rows by index in pandas

Question

I have a large DataFrame with a GPS path and some attributes. I need to analyse only a few sections of the path, and I would like to subset those sections into a new DataFrame. I can subset one section at a time, but the idea is to have them all together and to keep the original index.

The problem is similar to:

import pandas as pd
df = pd.DataFrame({'A':[0,1,2,3,4,5,6,7,8,9],'B':['a','b','c','d','e','f','g','h','i','j']},
                  index=range(10,20))

I want to get something like:

cdf = df.loc[[11:13] & [17:20]] # SyntaxError: invalid syntax

Desired outcome:

    A  B
11  1  b
12  2  c
13  3  d
17  7  h
18  8  i
19  9  j

I know the example is easy with cdf = df.loc[[11,12,13,17,18,19],:] but in the original problem I have thousands of rows and some entries have already been removed, so listing individual labels is not really an option.

Asked By: tomasz74

Answer 1

One possible solution with concat:

cdf = pd.concat([df.loc[11:13], df.loc[17:20]])
print (cdf)
    A  B
11  1  b
12  2  c
13  3  d
17  7  h
18  8  i
19  9  j

Another solution with range:

cdf = df.loc[list(range(11,14)) + list(range(17,20))]
print (cdf)
    A  B
11  1  b
12  2  c
13  3  d
17  7  h
18  8  i
19  9  j
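
With many sections, the concat approach can be driven by a list of label pairs instead of writing each slice by hand. The following is a minimal sketch, assuming a sorted index; the helper name select_sections and the sections list are hypothetical and not part of pandas:

```python
import pandas as pd

df = pd.DataFrame(
    {"A": list(range(10)), "B": list("abcdefghij")},
    index=range(10, 20),
)

# Hypothetical list of (first_label, last_label) pairs. Both ends are
# inclusive, matching how df.loc slices by label.
sections = [(11, 13), (17, 20)]


def select_sections(frame, sections):
    """Concatenate label-based slices, keeping the original index."""
    return pd.concat([frame.loc[start:stop] for start, stop in sections])


cdf = select_sections(df, sections)
print(cdf.index.tolist())
```

Because each piece is a label slice on a sorted index, an endpoint such as 20 that is not in the index is simply skipped.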

Answered By: jezrael

Answer 2

You could use np.r_ (after import numpy as np) to concatenate the slices:

In [16]: df.loc[np.r_[11:13, 17:20]]
Out[16]: 
    A  B
11  1  b
12  2  c
17  7  h
18  8  i
19  9  j

Note, however, that df.loc[A:B] selects labels A through B with B included, whereas np.r_[A:B] returns an array of A through B with B excluded. To include B you would need to use np.r_[A:B+1].
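
The endpoint difference can be checked directly. This is a small sketch on a toy frame with the same index as in the question; the variable names are only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": range(10)}, index=range(10, 20))

# Label slice with .loc: the right endpoint 13 is included.
loc_labels = df.loc[11:13].index.tolist()

# np.r_ slice: the right endpoint 13 is excluded, as with Python's range.
r_values = np.r_[11:13].tolist()

# Adding 1 to the stop value makes np.r_ cover the same labels as .loc.
r_values_inclusive = np.r_[11:13 + 1].tolist()

print(loc_labels, r_values, r_values_inclusive)
```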

When passed a slice, such as df.loc[A:B], df.loc ignores labels that are not in df.index. In contrast, when passed an array, such as df.loc[np.r_[A:B]], older versions of pandas may add a new row filled with NaNs for each value in the array which is not in df.index; since pandas 1.0 this raises a KeyError instead.

Thus to produce the desired result, you would need to adjust the right endpoint of the slices and use isin to test for membership in df.index:

In [26]: df.loc[df.index.isin(np.r_[11:14, 17:21])]
Out[26]: 
    A  B
11  1  b
12  2  c
13  3  d
17  7  h
18  8  i
19  9  j
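
The same isin idea extends to many sections and to an index with entries already removed, as in the original problem. The following is a sketch under assumptions: the sections list and the dropped labels are made up for illustration, and the labels are taken to be integers:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"A": list(range(10)), "B": list("abcdefghij")},
    index=range(10, 20),
).drop(index=[12, 18])  # simulate entries that were already removed

# Hypothetical (first_label, last_label) pairs, both ends inclusive.
sections = [(11, 13), (17, 20)]

# Build every candidate label; add 1 because np.arange excludes the stop value.
wanted = np.concatenate([np.arange(start, stop + 1) for start, stop in sections])

# The boolean mask keeps only labels that exist, so missing ones are ignored.
cdf = df.loc[df.index.isin(wanted)]
print(cdf.index.tolist())
```

Since the selection is a boolean mask rather than a list of labels, it does not depend on how df.loc treats labels that are absent from the index.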

Answered By: unutbu

Answer 3

One option is with pyjanitor select_rows – note that the selection is based on the label, not the integer position:

# pip install pyjanitor
import pandas as pd
import janitor  # importing janitor registers select_rows on DataFrame

df.select_rows(slice(11,13), slice(17,20))
    A  B
11  1  b
12  2  c
13  3  d
17  7  h
18  8  i
19  9  j

Answered By: sammywemmy
