Using wildcard in Python

Question

I have a DataFrame with 3 columns (col1, col2, and col3). The user can enter a value for one, two, or all three columns, in any combination (col1 only, col1 and col2, col2 and col3, etc.). Based on the user input, I need to select the rows that have these values.
For example, if I have the following table:

col1   col2   col3
1      3      3
3      4      5
5      2      1
3      4      2

and so on. If the user enters a value for col2 only (4 in this case), I have to display rows 2 and 4; if they enter 1 for col1, only row 1 should be displayed. My logical formula would be like this:

x = input1
y = input2
z = input3
a = df['col1'] == x and df['col2'] == y and df['col3'] == z

So x, y and z can each be any value based on the input, including nil (nil means "match all").

Any suggestions on how to write the code for such a formula?

Asked By: Ali

Answer 1

Use boolean combinations.

# The outer parentheses let the expression span several lines
a = (
    ((df['col1'] == x) | (x is None))
    & ((df['col2'] == y) | (y is None))
    & ((df['col3'] == z) | (z is None))
)
df1 = df[a]

None is the wildcard here.
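
To see the wildcard in action, here is a small sketch that rebuilds the table from the question and wraps the same expression in a helper. The helper name `select_rows` and its keyword arguments are made up for illustration.

```python
import pandas as pd

# Sample table from the question
df = pd.DataFrame({
    'col1': [1, 3, 5, 3],
    'col2': [3, 4, 2, 4],
    'col3': [3, 5, 1, 2],
})

def select_rows(df, x=None, y=None, z=None):
    # Hypothetical helper: None means "match anything" for that column
    a = (
        ((df['col1'] == x) | (x is None))
        & ((df['col2'] == y) | (y is None))
        & ((df['col3'] == z) | (z is None))
    )
    return df[a]

only_col2 = select_rows(df, y=4)            # filter on col2 only
col1_and_col3 = select_rows(df, x=3, z=2)   # filter on col1 and col3
everything = select_rows(df)                # no filters at all
```

With the default zero-based index, the second and fourth rows of the question's table carry the labels 1 and 3.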

Note that & and | are not short-circuiting operators, so this could be expensive. Even when you have a wildcard for a column, it will still compare everything in that column with None. A better way would be to construct the conditions dynamically.

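import pandas as pd

# Start with an all-True mask; a bare True would make df[condition]
# fail when every input is None
condition = pd.Series(True, index=df.index)

if x is not None:
    condition &= df['col1'] == x

if y is not None:
    condition &= df['col2'] == y

if z is not None:
    condition &= df['col3'] == z

df1 = df[condition]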

You could do this more dynamically by creating a dictionary of column names and conditions.
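
A minimal sketch of that dictionary idea, assuming the inputs have already been collected into a dict that maps column names to wanted values (with None as the wildcard). The function name `filter_by` and the `criteria` argument are illustrative, not part of pandas.

```python
import pandas as pd

# Sample table from the question
df = pd.DataFrame({
    'col1': [1, 3, 5, 3],
    'col2': [3, 4, 2, 4],
    'col3': [3, 5, 1, 2],
})

def filter_by(df, criteria):
    # criteria: dict of column name -> wanted value; None means wildcard
    mask = pd.Series(True, index=df.index)
    for column, value in criteria.items():
        if value is not None:
            # Only columns with a real value narrow the selection
            mask &= df[column] == value
    return df[mask]

result = filter_by(df, {'col1': 3, 'col2': 4, 'col3': None})
no_filter = filter_by(df, {})
```

Because the loop skips wildcard columns entirely, no comparison is done for them, and the same function works for any number of columns.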

Answered By: Barmar

Answer 2

You can have a generic solution that works regardless of how many columns your dataframe has.

First build a dictionary mapping column names to lookup values, then find the row labels that match those values, and finally slice the dataframe to display the matching rows:

import pandas as pd

df = pd.DataFrame(
    {
        'col1': [1, 3, 5, 3],
        'col2': [3, 4, 2, 4],
        'col3': [3, 5, 1, 2]
    }
)

query_dict = {}

for c in df.columns:
    try:
        query_dict[c] = int(input(f"Enter query for column {c} or Enter to skip: "))
    except ValueError:
        # Empty (or non-integer) input means no filter on this column
        query_dict[c] = None

# Start from all row labels and keep only those matching every supplied value
rows = df.index
for k, v in query_dict.items():
    if v is not None:
        rows = rows.intersection(df.loc[df[k] == v].index)

df.loc[rows, :]

Answered By: pavel

Answer 3

I ended up using the following code:

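import pandas as pd

# Start with an all-True mask so that df[mask] also works when no value is given
mask = pd.Series(True, index=df.index)
if x:  # any falsy input (None, '', 0) is treated as "no filter"
    mask &= (df['col1'] == x)
if y:
    mask &= (df['col2'] == y)
if z:
    mask &= (df['col3'] == z)
result = df[mask]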

Answered By: Ali
