Search for an exact match of a string in pandas.DataFrame
Question:
I have a DataFrame as follows:
import pandas as pd

data = [
    ['2022-12-04 00:00:00', 5000.00],
    ['2022-12-04 00:00:00', 6799.50],
    ['2022-12-04 00:00:00', 5000.00],
    ['2023-01-10 00:00:00', 5000.00]
]
df = pd.DataFrame(data, columns=['Date', 'Float'])
date_input = "2022-12-04 00:00:00"
float_input = "5000.00"
What is the best way to check whether the DataFrame contains a row that exactly matches both the 'date' and 'float' values?
In this case I expect the output 'Yes', since this combination of 'date' and 'float' appears in the first row of the DataFrame.
I tried the following, but it does not tell me whether 'float_input' matches on the same row as 'date_input':
if (df['Date'] == pd.Timestamp(date_input)).any() and (df['Float'] == float(float_input)).any():
    print('YES')
else:
    print('No')
Answers:
If I understand your problem correctly, you can check whether the DataFrame has a row that exactly matches both values by using the loc method to select the rows where 'Date' equals date_input and 'Float' equals float_input.
Then check whether the resulting DataFrame is empty: if it is not empty, at least one row matches both values.
Modified Code
date_input = "2022-12-04 00:00:00"
float_input = "5000.00"
# The 'Date' column holds strings, so parse it before comparing with a Timestamp
dates = pd.to_datetime(df['Date'])
# Use loc to select the rows that match both values
matches = df.loc[(dates == pd.Timestamp(date_input)) & (df['Float'] == float(float_input))]
# A non-empty result means at least one row matched
if not matches.empty:
    print('YES')
else:
    print('NO')
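The attempt in the question cannot work because each `.any()` is evaluated on its own column: it reports whether the date appears somewhere and the number appears somewhere, not whether they share a row. A minimal sketch of the difference, assuming the 'Date' column is parsed with pd.to_datetime first (the names `date_mask`, `float_mask`, `separate` and `same_row` are only illustrative):

```python
import pandas as pd

data = [
    ['2022-12-04 00:00:00', 5000.00],
    ['2022-12-04 00:00:00', 6799.50],
    ['2023-01-10 00:00:00', 5000.00],
]
df = pd.DataFrame(data, columns=['Date', 'Float'])
df['Date'] = pd.to_datetime(df['Date'])

# Search for a combination that does NOT occur together in any row:
# the date is only in the last row, the number only in the second row.
date_mask = df['Date'] == pd.Timestamp('2023-01-10 00:00:00')
float_mask = df['Float'] == float('6799.50')

# Two independent checks: each value exists somewhere, so this is True
separate = bool(date_mask.any() and float_mask.any())

# Combined row-wise check: no single row has both values, so this is False
same_row = bool((date_mask & float_mask).any())

print(separate, same_row)  # True False
```

Combining the masks with `&` before calling `.any()` is what ties the two conditions to the same row.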
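# Compare whole rows as tuples so that both values must match in the same row
if (date_input, float(float_input)) in df.itertuples(index=False, name=None):
    print('yes')
else:
    print('no')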
Hello, I reproduced what you tried. Let's break it down step by step:
Step 1:
The dates in your data set must be written as quoted strings before the data is passed to pd.DataFrame.
data = [
    ['2022-12-04 00:00:00', 5000.00],
    ['2022-12-04 00:00:00', 6799.50],
    ['2022-12-04 00:00:00', 5000.00],
    ['2023-01-10 00:00:00', 5000.00]
]
Step 2:
With the previous step fixed, we build the DataFrame.
Displaying it shows that one of the columns we want to search, 'Date', has the wrong type (it holds strings), so we must convert it.
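For reference, a minimal sketch of this step, assuming the DataFrame is built with the column names from the question. Checking `df.dtypes` is one way to spot the type problem; the variable `date_is_datetime` is only illustrative:

```python
import pandas as pd

data = [
    ['2022-12-04 00:00:00', 5000.00],
    ['2022-12-04 00:00:00', 6799.50],
    ['2022-12-04 00:00:00', 5000.00],
    ['2023-01-10 00:00:00', 5000.00],
]
df = pd.DataFrame(data, columns=['Date', 'Float'])

# Show the type of each column; 'Date' is not a datetime column yet
print(df.dtypes)

# Explicit check: the dates are still plain strings at this point
date_is_datetime = pd.api.types.is_datetime64_any_dtype(df['Date'])
print(date_is_datetime)  # False
```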
Step 3:
pandas can convert the column to datetime with pd.to_datetime. I passed the dayfirst parameter, which is optional. The time part (hours, minutes and seconds) is always zero here, so I did not handle it separately. The pandas documentation for to_datetime is worth reading.
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
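As an alternative sketch, the conversion can be written with an explicit `format` string instead of `dayfirst`, which removes any guessing about day and month order. This assumes every value follows the year-month-day hour:minute:second layout shown in the question; the shortened two-row DataFrame is only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2022-12-04 00:00:00', '2023-01-10 00:00:00'],
    'Float': [5000.00, 5000.00],
})

# Parse the strings with an explicit layout: year-month-day, then the time
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d %H:%M:%S')

# Confirm that month and day were read in the intended order
first = df['Date'].iloc[0]
print(first.year, first.month, first.day)  # 2022 12 4
```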
Step 4:
Now for the queries. A simple approach is to filter in stages, one condition at a time, which finds the rows with your values:
SearchDate = df[df['Date'] == pd.Timestamp("2022-12-04 00:00:00")]
SearchNumber = SearchDate[SearchDate['Float'] == 5000.00]
SearchNumber
Another way is to build a boolean mask for each condition and combine the two with the & operator to select the matching rows:
searchDate = df['Date'] == pd.Timestamp("2022-12-04 00:00:00")
searchFloat = df['Float'] == 5000.00
query = df[searchDate & searchFloat]
query
The same query written as a single expression:
query_dataframe = df[(df['Date'] == pd.Timestamp("2022-12-04 00:00:00")) & (df['Float'] == 5000.00)]
query_dataframe
Step 5:
I did not fully understand the check in your question. Put simply, I test whether the query result is empty and print the answer as in your example:
if len(query_dataframe):
    print('Yes')
else:
    print('No')
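Putting the steps together, the whole check can be wrapped in a small helper. This is only a sketch: the function name `row_exists` and its parameters are hypothetical, and using numpy.isclose instead of `==` for the float column is my own assumption, meant to tolerate floating-point rounding. Use plain equality if an exact comparison is required.

```python
import numpy as np
import pandas as pd


def row_exists(df, date_input, float_input, date_col='Date', float_col='Float'):
    """Return True if some row holds both the given date and the given number."""
    # Parse the date column (an already-converted column is accepted too)
    dates = pd.to_datetime(df[date_col])
    date_mask = (dates == pd.Timestamp(date_input)).to_numpy()
    # Tolerant float comparison (assumption: small rounding errors should match)
    float_mask = np.isclose(df[float_col].to_numpy(), float(float_input))
    # Both conditions must hold on the same row
    return bool((date_mask & float_mask).any())


data = [
    ['2022-12-04 00:00:00', 5000.00],
    ['2022-12-04 00:00:00', 6799.50],
    ['2022-12-04 00:00:00', 5000.00],
    ['2023-01-10 00:00:00', 5000.00],
]
df = pd.DataFrame(data, columns=['Date', 'Float'])

found = row_exists(df, "2022-12-04 00:00:00", "5000.00")
missing = row_exists(df, "2023-01-10 00:00:00", "6799.50")
print('Yes' if found else 'No')    # Yes
print('Yes' if missing else 'No')  # No
```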