Querying a dataframe to return data where a column contains specific letters
Question:
Say I have a dataframe df which looks like the below:
I can query the dataframe using pandas .query to return specific rows as below:
df.query('Data == "05h"')
Is it possible to amend the above to return rows without hard-coding the 0 before 5h, so that it returns all rows whose string contains 5h?
Any help appreciated.
Answers:
You can search for the substring you want as follows:
df2 = df[df['Data'].str.contains('5h')]
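If you would rather keep using .query, a similar filter can be written inside the query string itself. This is a minimal sketch on a small made-up frame with a Data column; engine='python' is passed because string-accessor calls such as .str.contains are generally not handled by the numexpr engine:

```python
import pandas as pd

# Made-up sample data standing in for the question's frame
df = pd.DataFrame({'Data': ['05h', '05f', '15h', '5h']})

# The python engine evaluates the .str.contains call inside the query string
df2 = df.query('Data.str.contains("5h")', engine='python')
print(df2)
```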
Here is my proposal:
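import pandas as pd
# Sample data; the column is named 'Data' to match the question
df = pd.DataFrame({'Date': ['2023-01-01', '2023-02-01', '2023-02-01', '2023-02-01'],
                   'Data': ['05h', '05f', '05h', '05f']})
# Keep only the rows whose 'Data' value contains the substring '5h'
df = df.loc[df['Data'].str.contains('5h')]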
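Two keyword arguments of Series.str.contains are worth knowing here. The pattern is treated as a regular expression by default, so regex=False is safer when the substring may contain characters such as '.'; and na=False makes missing values count as non-matches, since a mask containing NaN cannot be used for boolean indexing. A sketch with made-up values:

```python
import pandas as pd

# Made-up data: one missing value and one entry containing a literal '.'
df = pd.DataFrame({'Data': ['05h', '5ah', '5.h', None]})

# Default (regex=True): '.' matches any character, so '5ah' is also matched
as_regex = df.loc[df['Data'].str.contains('5.h', na=False)]

# regex=False: '5.h' is matched literally; na=False turns the missing value into False
literal = df.loc[df['Data'].str.contains('5.h', regex=False, na=False)]

print(as_regex)
print(literal)
```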
Yes, you can do it this way:
df.loc[df.Data.str.contains('5h')]
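If the letters might appear in either case (for example 5H), str.contains also accepts case=False. A minimal sketch on made-up data:

```python
import pandas as pd

# Made-up data with an upper-case variant
df = pd.DataFrame({'Data': ['05h', '05H', '05f']})

# case=False makes the substring match ignore letter case
matches = df.loc[df.Data.str.contains('5h', case=False)]
print(matches)
```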
Also, you could get the same result as your query using:
df.loc[df.Data == '05h']
This way the column is referenced directly instead of being written inside a query string.
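If the column name or the value should not be hard-coded at all, both can live in ordinary variables. In the sketch below, col and value are hypothetical names chosen for illustration; boolean indexing takes the column name from col, and .query can refer to a local variable with the @ prefix:

```python
import pandas as pd

# Made-up sample data
df = pd.DataFrame({'Data': ['05h', '05f', '05h']})

col = 'Data'     # hypothetical: column name chosen at runtime
value = '05h'    # hypothetical: value to compare against

# Boolean indexing with the column name taken from a variable
by_loc = df.loc[df[col] == value]

# .query resolves @value to the local Python variable
by_query = df.query('Data == @value')

print(by_loc.equals(by_query))
```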