How to split an array using its minimum entry

Question:

I am trying to split a dataset into two separate ones at the minimum of its first column. I used idxmin to locate the minimum entry and then iloc to slice the array from 0 to that point.

The error I encounter is:

TypeError: cannot do positional indexing on RangeIndex with these indexers [1 96

dtype: int64] of type Series

An example dataset is as shown:

            x  y
0    1.000000  6
1    1.000000  2
2    0.999999  5
3    0.999996  3
4    0.999986  4
..        ... ..
196  0.999987  3
197  0.999996  3
198  0.999999  2
199  1.000000  1
200  1.000000  4

The x column starts at 1, decreases to a minimum near zero, and then increases back to 1. I want to find the smallest x (and its corresponding y) and split the data at that row.

This is the current code I have written:

data = pd.DataFrame(data)

minimum = pd.DataFrame.idxmin(data)

lower_surface = data.iloc[:minimum]

I understand that the variable minimum holds a location in the DataFrame, so I thought I could use iloc to slice the array from the beginning to the minimum point, but this is not the case.

Asked By: John278

||

Answers:

You should pick one column as the reference. Calling idxmin on the whole DataFrame returns a Series with one index label per column, which cannot be used as a slice bound:

data.idxmin()

x      4
y    199
dtype: int64

You should instead run:

minimum = data['x'].idxmin()

Also, technically you have to use loc to slice, not iloc, since idxmin returns an index label, not a position.

data.loc[:minimum]

Output:

          x  y
0  1.000000  6
1  1.000000  2
2  0.999999  5
3  0.999996  3
4  0.999986  4
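
To get both halves, the same label can be used as the start of a second slice. The following is a minimal sketch on synthetic data, not the asker's actual dataset: the x values are made up to follow the described shape, and upper_surface is a hypothetical name chosen to mirror lower_surface from the question.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's data (values are illustrative only):
# x falls from 1 to 0 and then rises back to 1.
x = np.concatenate([np.linspace(1, 0, 101), np.linspace(0.01, 1, 100)])
data = pd.DataFrame({'x': x, 'y': np.arange(len(x))})

# Index label of the smallest x
minimum = data['x'].idxmin()

# loc slices by label and includes both endpoints,
# so the minimum row appears in both halves here.
lower_surface = data.loc[:minimum]
upper_surface = data.loc[minimum:]  # hypothetical name for the second half

print(minimum, len(lower_surface), len(upper_surface))
```

If the minimum row should belong to only one half, one option is to drop it from the other with something like upper_surface.iloc[1:].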

If you want to slice with iloc, you need a position rather than a label, which numpy.argmin provides:

import numpy as np

data.iloc[:np.argmin(data['x'])]

The output is, however, slightly different: loc includes the end label of the slice, whereas iloc excludes the end position:

          x  y
0  1.000000  6
1  1.000000  2
2  0.999999  5
3  0.999996  3
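
With the default RangeIndex, labels and positions happen to coincide, which hides the difference between the two approaches. The sketch below uses a small made-up frame with a non-default index (the values and index labels are assumptions for illustration) to show the label from idxmin and the position from argmin diverging, and how adding 1 to the position makes iloc match loc.

```python
import pandas as pd

# Illustrative data with a non-default index, so labels != positions
data = pd.DataFrame(
    {'x': [1.0, 0.5, 0.1, 0.6, 1.0], 'y': [6, 2, 5, 3, 4]},
    index=[10, 20, 30, 40, 50],
)

label = data['x'].idxmin()               # index label of the minimum
pos = data['x'].to_numpy().argmin()      # integer position of the minimum

by_label = data.loc[:label]              # end label included
by_pos = data.iloc[:pos]                 # end position excluded
by_pos_inclusive = data.iloc[:pos + 1]   # +1 to include the minimum row

print(label, pos)
print(by_pos_inclusive.equals(by_label))
```

Here to_numpy().argmin() is used in place of np.argmin(data['x']) so that the result is unambiguously a position in the underlying array.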
Answered By: mozway