pandas select from Dataframe using startswith
Question:
This works (using Pandas 12 dev)
table2=table[table['SUBDIVISION'] =='INVERNESS']
Then I realized I needed to select the field using “starts with” Since I was missing a bunch.
So per the Pandas doc as near as I could follow I tried
criteria = table['SUBDIVISION'].map(lambda x: x.startswith('INVERNESS'))
table2 = table[criteria]
And got AttributeError: ‘float’ object has no attribute ‘startswith’
So I tried an alternate syntax with the same result
table[[x.startswith('INVERNESS') for x in table['SUBDIVISION']]]
Reference http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing
Section 4: List comprehensions and map method of Series can also be used to produce more complex criteria:
What am I missing?
Answers:
You can use the str.startswith
DataFrame method to give more consistent results:
In [11]: s = pd.Series(['a', 'ab', 'c', 11, np.nan])
In [12]: s
Out[12]:
0 a
1 ab
2 c
3 11
4 NaN
dtype: object
In [13]: s.str.startswith('a', na=False)
Out[13]:
0 True
1 True
2 False
3 False
4 False
dtype: bool
and the boolean indexing will work just fine (I prefer to use loc
, but it works just the same without):
In [14]: s.loc[s.str.startswith('a', na=False)]
Out[14]:
0 a
1 ab
dtype: object
.
It looks least one of your elements in the Series/column is a float, which doesn’t have a startswith method hence the AttributeError, the list comprehension should raise the same error…
To retrieve all the rows which startwith required string
dataFrameOut = dataFrame[dataFrame['column name'].str.match('string')]
To retrieve all the rows which contains required string
dataFrameOut = dataFrame[dataFrame['column name'].str.contains('string')]
You can use apply
to easily apply any string matching function to your column elementwise.
table2=table[table['SUBDIVISION'].apply(lambda x: x.startswith('INVERNESS'))]
this assuming that your “SUBDIVISION” column is of the correct type (string)
Edit: fixed missing parenthesis
Using startswith for a particular column value
df = df.loc[df["SUBDIVISION"].str.startswith('INVERNESS', na=False)]
This can also be achieved using query
:
table.query('SUBDIVISION.str.startswith("INVERNESS").values')
This works (using Pandas 12 dev)
table2=table[table['SUBDIVISION'] =='INVERNESS']
Then I realized I needed to select the field using “starts with” Since I was missing a bunch.
So per the Pandas doc as near as I could follow I tried
criteria = table['SUBDIVISION'].map(lambda x: x.startswith('INVERNESS'))
table2 = table[criteria]
And got AttributeError: ‘float’ object has no attribute ‘startswith’
So I tried an alternate syntax with the same result
table[[x.startswith('INVERNESS') for x in table['SUBDIVISION']]]
Reference http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing
Section 4: List comprehensions and map method of Series can also be used to produce more complex criteria:
What am I missing?
You can use the str.startswith
DataFrame method to give more consistent results:
In [11]: s = pd.Series(['a', 'ab', 'c', 11, np.nan])
In [12]: s
Out[12]:
0 a
1 ab
2 c
3 11
4 NaN
dtype: object
In [13]: s.str.startswith('a', na=False)
Out[13]:
0 True
1 True
2 False
3 False
4 False
dtype: bool
and the boolean indexing will work just fine (I prefer to use loc
, but it works just the same without):
In [14]: s.loc[s.str.startswith('a', na=False)]
Out[14]:
0 a
1 ab
dtype: object
.
It looks least one of your elements in the Series/column is a float, which doesn’t have a startswith method hence the AttributeError, the list comprehension should raise the same error…
To retrieve all the rows which startwith required string
dataFrameOut = dataFrame[dataFrame['column name'].str.match('string')]
To retrieve all the rows which contains required string
dataFrameOut = dataFrame[dataFrame['column name'].str.contains('string')]
You can use apply
to easily apply any string matching function to your column elementwise.
table2=table[table['SUBDIVISION'].apply(lambda x: x.startswith('INVERNESS'))]
this assuming that your “SUBDIVISION” column is of the correct type (string)
Edit: fixed missing parenthesis
Using startswith for a particular column value
df = df.loc[df["SUBDIVISION"].str.startswith('INVERNESS', na=False)]
This can also be achieved using query
:
table.query('SUBDIVISION.str.startswith("INVERNESS").values')