Python DataFrame Multi Line Filtering from Another Dataframe

Question:

I have a large data frame. Two of its columns are ['radius'] and ['angle'].
I have another, filter data frame, which has only ['radius'] and ['angle'].

This code was meant to drop the rows whose angle and radius did not both match a row in the filter dataframe. Instead, because every angle and every radius appears somewhere in the filter dataframe, it drops nothing:

df = df.drop(~df['angle'] == filter_df['angle'] & ~df['radius'] == filter_df['radius'])
df = df.drop(~df['angle'].isin(filter_df['angle']) & ~df['radius'].isin(filter_df['radius']))
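A small sketch of why the isin attempt cannot work, using a few (angle, radius) values copied from the sample tables below; the tiny frames are stand-ins, not the real data. Each isin call tests one column on its own, so a row like (0, 2000) passes because 0 is a filter angle and 2000 is a filter radius, even though that pair is not in the filter. Separately, drop expects index labels rather than a boolean mask; keeping rows is normally done with df[mask].

```python
import pandas as pd

# Toy stand-ins built from rows of the sample tables
filter_df = pd.DataFrame({"angle": [0, 0, 45], "radius": [500, 1000, 2000]})
df = pd.DataFrame({"angle": [0, 0, 45], "radius": [500, 2000, 500]})

# Each column is checked independently, so pairs are never compared:
# (0, 2000) and (45, 500) pass even though neither pair is in filter_df.
per_column = df["angle"].isin(filter_df["angle"]) & df["radius"].isin(filter_df["radius"])
print(per_column.tolist())  # [True, True, True]
```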

What the dataframes look like:

# Filter dataframe:
    angle  radius
 0      0     500
 1      0    1000
 2      0    1500
 3     45     500
 4     45    1000
 5     45    1500
 6     45    2000
 7     45    2500
 8     45    3000
 9     90     500
10     90    1000
11    135    2000
12    135    2500
 ...
45    315    2000

# Main dataframe:
     angle  radius ...
  0      0     500 ...
  1      0    1000 ...
  2      0    1500 ...
  3      0    2000 ...
  4      0    2500 ...
  5      0    3000 ...
  6      0    3500 ...
  7      0    4000 ...
  8      0    4500 ...
  9      0    5000 ...
 11     45     500 ...
 12     45    1000 ...
 13     45    1500 ...
 ...
719    315    7000 ...

The main dataframe has 10 radii per angle and 8 angles. The pairs are also repeated many times, so the same angles and radii appear in many rows.

I need to filter (keep) only the radius and angle pairs from the filter dataframe, i.e. if a row's angle and radius pair in the main dataframe matches an angle and radius pair in the filter dataframe, keep that row.
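A minimal sketch of pair-wise matching, assuming integer angle and radius columns in both frames (exact equality is used, so float values would need to match exactly). The toy rows, the index labels, and the "value" column are made up for illustration. Each frame's two key columns are turned into a MultiIndex of (angle, radius) tuples, and membership is tested on whole pairs; unlike a merge, this keeps the main frame's original index labels.

```python
import pandas as pd

filter_df = pd.DataFrame({"angle": [0, 0, 45], "radius": [500, 1000, 2000]})
df = pd.DataFrame(
    {"angle": [0, 0, 45, 0], "radius": [500, 2000, 500, 500],
     "value": [1.0, 2.0, 3.0, 4.0]},  # hypothetical data column
    index=[0, 3, 11, 20],
)

# One (angle, radius) tuple per row in each frame
pairs = pd.MultiIndex.from_frame(df[["angle", "radius"]])
allowed = pd.MultiIndex.from_frame(filter_df[["angle", "radius"]])

# True only where the whole pair occurs in filter_df
mask = pairs.isin(allowed)
kept = df[mask]  # boolean selection keeps the original index labels
print(kept)
```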

The filter dataframe will never have repeats; the main dataframe will, which is okay. Later, the other columns not mentioned here will be averaged over the matching rows for each angle and radius pair.
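The later averaging step could look like the sketch below, assuming the "other columns" are numeric; the column A and all values are hypothetical. An inner merge on both key columns keeps only matching rows, and groupby on the same pair then averages the repeats.

```python
import pandas as pd

filter_df = pd.DataFrame({"angle": [0, 45], "radius": [500, 500]})
df = pd.DataFrame({
    "angle":  [0, 0, 45, 45, 90],
    "radius": [500, 500, 500, 500, 500],
    "A": [1.0, 3.0, 10.0, 20.0, 99.0],  # hypothetical measurement column
})

# Keep only rows whose (angle, radius) pair is in filter_df ...
matched = df.merge(filter_df, on=["angle", "radius"], how="inner")
# ... then average the remaining columns for each pair
averaged = matched.groupby(["angle", "radius"], as_index=False).mean()
print(averaged)
```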

Asked By: John Shearer


Answers:

You can join both DataFrames on the angle and radius pair. join matches the on columns of the calling frame against the index of the other frame, so set the filter's index to both columns first; an inner join then keeps only the rows whose pair appears in the filter, and no duplicate columns are created:

filtered = df.join(df_filter.set_index(["angle", "radius"]),
                   on=["angle", "radius"], how="inner")
# filtered has all of df's columns, restricted to the rows
# whose (angle, radius) pair appears in df_filter
Answered By: DYZ

I'll add some extra columns to main_df to stand in for the other data:

main_df = main_df.assign(A=1, B=2, C=3)
main_df

     angle  radius  A  B  C
0        0     500  1  2  3
1        0    1000  1  2  3
2        0    1500  1  2  3
3        0    2000  1  2  3
4        0    2500  1  2  3
5        0    3000  1  2  3
6        0    3500  1  2  3
7        0    4000  1  2  3
8        0    4500  1  2  3
9        0    5000  1  2  3
11      45     500  1  2  3
12      45    1000  1  2  3
13      45    1500  1  2  3
719    315    7000  1  2  3

Because filtered_df has only the two columns, and merge by default joins on all columns the two frames share and uses how='inner', this is all that's needed:

main_df.merge(filtered_df)

   angle  radius  A  B  C
0      0     500  1  2  3
1      0    1000  1  2  3
2      0    1500  1  2  3
3     45     500  1  2  3
4     45    1000  1  2  3
5     45    1500  1  2  3
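The same merge can also name its keys explicitly. A minimal sketch, assuming a toy main_df and filtered_df built from a few rows of the tables above: passing on= keeps the result correct if filtered_df later gains extra columns, and validate="many_to_one" makes pandas raise a MergeError if the filter frame ever contains a repeated pair, which the question says should never happen.

```python
import pandas as pd

main_df = pd.DataFrame({
    "angle":  [0, 0, 0, 45, 315],
    "radius": [500, 500, 2000, 500, 7000],
}).assign(A=1, B=2, C=3)
filtered_df = pd.DataFrame({"angle": [0, 45], "radius": [500, 500]})

# Explicit keys plus a uniqueness check on the right-hand (filter) frame
result = main_df.merge(filtered_df, on=["angle", "radius"],
                       how="inner", validate="many_to_one")
print(result)
```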
Answered By: piRSquared