Query/Filter a pandas df using a dict of lists

Question:

My problem

I have a dict d, which can vary in length, in the following format:

d = {
  "foo": [
    50,
    100
  ],
  "bar": [
    5,
    10
  ]
}

Where the key is a column name and the value is a two-element list holding the min and max values to filter that column of a dataframe df on. Thus, given the input above, I’d like to filter df.foo between 50 and 100 and df.bar between 5 and 10.

What I have tried

Of course, I could just hard code it like so:

df.loc[(df[list(d.items())[0][0]] > list(d.items())[0][1][0]) & (df[list(d.items())[0][0]] < list(d.items())[0][1][1]) ...]

etc., but the number of keys (columns to filter on) may vary, and this is just incredibly ugly code. Is there a cleaner/vectorized way to do this?

Context

I am building a Streamlit app where a user can create n min/max filters on a dataframe, and the format listed above is the format Streamlit’s slider returns.

Asked By: bismo


Answers:

IIUC, one way using pandas.Series.between:

# sample data (seeded so the output is reproducible)
import numpy as np
import pandas as pd

np.random.seed(1234)

df = pd.DataFrame({"foo": np.random.random(10) * 100,
                   "bar": np.random.random(10) * 10})

         foo       bar
0  19.151945  3.578173
1  62.210877  5.009951
2  43.772774  6.834629
3  78.535858  7.127020
4  77.997581  3.702508
5  27.259261  5.611962
6  27.646426  5.030832
7  80.187218  0.137684
8  95.813935  7.728266
9  87.593263  8.826412

Code:

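# build one boolean mask per column with between() (inclusive on both ends by default),
# then AND all masks together so a row must satisfy every filter
new_df = df[np.logical_and.reduce([df[k].between(*v) for k, v in d.items()])]
print(new_df)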

Output:

         foo       bar
1  62.210877  5.009951
3  78.535858  7.127020
8  95.813935  7.728266
9  87.593263  8.826412
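
One detail worth checking: Series.between includes both endpoints by default, whereas the hard-coded attempt in the question used strict > and <. The sketch below is only an illustration with made-up sample values (not the data from this answer) and assumes pandas 1.3 or later, where the inclusive parameter accepts the strings "both", "neither", "left" and "right".

```python
import pandas as pd

# Hypothetical sample data chosen so that some values sit exactly on a bound.
df = pd.DataFrame({"foo": [50, 60, 100, 120], "bar": [5, 7, 10, 8]})
d = {"foo": [50, 100], "bar": [5, 10]}

# Default behaviour: both endpoints are included (lo <= value <= hi).
mask_inclusive = pd.Series(True, index=df.index)
for col, (lo, hi) in d.items():
    mask_inclusive &= df[col].between(lo, hi)

# Strict behaviour: both endpoints are excluded (lo < value < hi).
mask_strict = pd.Series(True, index=df.index)
for col, (lo, hi) in d.items():
    mask_strict &= df[col].between(lo, hi, inclusive="neither")

inclusive_rows = df[mask_inclusive]
strict_rows = df[mask_strict]
```

Which variant is appropriate depends on whether the slider's end values should count as matches.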

Validation: Works on any number of filters:

df = pd.DataFrame(np.random.random((10, 10)), columns=[*"abcdefghij"])
d = {c: [0.1, 0.9] for c in df}
new_df = df[np.logical_and.reduce([df[k].between(*v) for k, v in d.items()])]
print(new_df)
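
A possible variation, not part of the answer above: the same dict can be turned into an expression string for DataFrame.query. This is a rough sketch; build_range_query is a hypothetical helper name, the sample values are made up, and it assumes the bounds are plain Python numbers and the dict is non-empty. Backtick quoting of column names assumes a reasonably recent pandas.

```python
import pandas as pd

def build_range_query(ranges):
    """Build a DataFrame.query expression from {column: [min, max]} (hypothetical helper)."""
    # Backticks let query() handle column names that are not valid identifiers.
    return " and ".join(
        f"(`{col}` >= {lo!r}) and (`{col}` <= {hi!r})"
        for col, (lo, hi) in ranges.items()
    )

# Made-up sample data for illustration only.
df = pd.DataFrame({"foo": [19.2, 62.2, 78.5, 80.2],
                   "bar": [3.6, 5.0, 7.1, 0.1]})
d = {"foo": [50, 100], "bar": [5, 10]}

expr = build_range_query(d)
filtered = df.query(expr)
```

The mask-based version is probably the safer default; the query string is mainly handy when the filter also needs to be shown to the user or logged.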
Answered By: Chris

I hope this works for your case. I created a DataFrame from d and then joined it with another DataFrame to match these values:

import pandas as pd
d = {
  "foo": [
    50,
    100
  ],
  "bar": [
    5,
    10
  ],
  "noto": [
      11,
      30
  ]
}
# one row per key, with the [min, max] list kept in a single column
df_1 = pd.DataFrame(
    {
        "keys": list(d.keys()),
        "vals": list(d.values())
    }
)
df_1
df_2 = pd.DataFrame(
    {
        "item": ["foo", "bar", "noto"],
        "price": [65, 7, 33]
    }
)
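# attach each item's price to its [min, max] range
main_df = df_1.merge(df_2, left_on='keys', right_on="item")
def check_price(x):
    # True when the row's price lies within its [min, max] range (inclusive)
    return x['vals'][0] <= x['price'] <= x['vals'][1]
# keep only the rows whose price is in range
main_df[main_df.apply(check_price, axis=1)]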
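
The row-wise apply above can likely be avoided. The following is a sketch of a vectorized variant, not the answerer's code: it assumes every value in d is a two-element [min, max] list, and the names bounds, prices, merged and in_range are chosen here purely for illustration.

```python
import pandas as pd

d = {"foo": [50, 100], "bar": [5, 10], "noto": [11, 30]}

# One row per key, with the two list elements split into "min" and "max" columns.
bounds = pd.DataFrame.from_dict(d, orient="index", columns=["min", "max"])

prices = pd.DataFrame({"item": ["foo", "bar", "noto"],
                       "price": [65, 7, 33]})

# Match each item to its bounds via the index of `bounds`.
merged = prices.merge(bounds, left_on="item", right_index=True)

# between() also accepts Series bounds, so the comparison is done column-wise.
in_range = merged[merged["price"].between(merged["min"], merged["max"])]
```

Here "noto" is dropped because its price of 33 lies outside its 11 to 30 range.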
Answered By: Muhammad Ali