Python DataFrame Multi Line Filtering from Another Dataframe
Question:
I have a large data frame. Two of its columns are ['radius'] and ['angle'].
I have another filter data frame, which has only ['radius'] and ['angle'].
This code was meant to drop the rows whose angle and radius do not both match a row of the filter dataframe. It matches every radius and angle, and thus drops nothing:
df = df.drop(~df['angle'] == filter_df[angle] & ~df['radius'] == filter_df['radius'])
df = df.drop(~df['angle'].isin(filter_df[angle]) & ~df['radius'].isin(filter_df['radius']))
What the dataframes look like:
# Filter dataframe:      # Main dataframe
    angle  radius             angle  radius  ...
 0      0     500          0      0     500  ...
 1      0    1000          1      0    1000  ...
 2      0    1500          2      0    1500  ...
 3     45     500          3      0    2000  ...
 4     45    1000          4      0    2500  ...
 5     45    1500          5      0    3000  ...
 6     45    2000          6      0    3500  ...
 7     45    2500          7      0    4000  ...
 8     45    3000          8      0    4500  ...
 9     90     500          9      0    5000  ...
10     90    1000         11     45     500  ...
11    135    2000         12     45    1000  ...
12    135    2500         13     45    1500  ...
...                      ...
45    315    2000        719    315    7000
The main dataframe has 10 radii per angle, and 8 angles. Also, there are multiple repeats, so you end up with lots of angles and radii.
I need to keep only the radius and angle pairs that appear in the filter dataframe. That is, if a row (angle and radius pair) of the main dataframe matches a row (angle and radius pair) of the filter dataframe, keep that row.
The filter dataframe will never have repeats. The main dataframe will, which is okay. Later, the other columns not mentioned will be averaged over matching rows (angle and radius pairs).
Answers:
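A note on why the isin attempt in the question cannot work: each column is tested on its own, so a row passes as soon as its angle appears anywhere in the filter and its radius appears anywhere in the filter, whether or not the two appear together in one filter row. DataFrame.drop also expects labels rather than a boolean mask. Below is a minimal sketch of a pairwise test instead, comparing (angle, radius) tuples through a MultiIndex. The sample frames and the names main_df and filter_df are made up for illustration and are not the asker's real data.

```python
import pandas as pd

# Hypothetical sample data standing in for the frames in the question.
main_df = pd.DataFrame({
    "angle":  [0, 0, 0, 45, 45, 90],
    "radius": [500, 1000, 2000, 500, 3500, 2000],
})
filter_df = pd.DataFrame({
    "angle":  [0, 0, 45, 90],
    "radius": [500, 1000, 500, 2000],
})

keys = ["angle", "radius"]

# Column-wise test: angle and radius are checked independently, so the
# pair (0, 2000) passes even though it is not a row of filter_df.
columnwise = (main_df["angle"].isin(filter_df["angle"])
              & main_df["radius"].isin(filter_df["radius"]))

# Pairwise test: build (angle, radius) tuples on both sides and compare
# whole tuples, so only true pairs match.
pairs = pd.MultiIndex.from_frame(main_df[keys])
wanted = pd.MultiIndex.from_frame(filter_df[keys])
pairwise = pairs.isin(wanted)

# Boolean indexing keeps the matching rows; no call to drop is needed.
kept = main_df[pairwise]
```

The pairwise mask keeps the original index of main_df, which can be convenient if the rows need to be traced back later.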
You can join both DataFrames on the key columns. DataFrame.join matches the on columns against the index of the other frame, so move the filter's two key columns into its index first. The result then has no duplicate columns to remove:
filtered = df.join(df_filter.set_index(["angle", "radius"]),
                   on=["angle", "radius"], how='inner')
# filtered keeps the columns of df, restricted to the rows whose
# (angle, radius) pair appears in df_filter
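To try the join approach on a few rows, here is a small runnable sketch. The frames and the value column are made up for illustration, and the filter's key columns are moved into its index so that join has something to match the on columns against.

```python
import pandas as pd

# Hypothetical sample data; "value" stands in for the other columns.
main_df = pd.DataFrame({
    "angle":  [0, 0, 45, 45, 90],
    "radius": [500, 2000, 500, 3500, 1000],
    "value":  [1.0, 2.0, 3.0, 4.0, 5.0],
})
filter_df = pd.DataFrame({
    "angle":  [0, 45, 90],
    "radius": [500, 500, 1000],
})

keys = ["angle", "radius"]

# After set_index the filter has no ordinary columns left, so the inner
# join adds nothing to main_df and only restricts its rows.
filtered = main_df.join(filter_df.set_index(keys), on=keys, how="inner")
```

Because the filter contributes no columns of its own here, no suffixes are needed and nothing has to be dropped afterwards.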
I’m going to add some stuff to main_df
main_df = main_df.assign(A=1, B=2, C=3)
main_df
     angle  radius  A  B  C
  0      0     500  1  2  3
  1      0    1000  1  2  3
  2      0    1500  1  2  3
  3      0    2000  1  2  3
  4      0    2500  1  2  3
  5      0    3000  1  2  3
  6      0    3500  1  2  3
  7      0    4000  1  2  3
  8      0    4500  1  2  3
  9      0    5000  1  2  3
 11     45     500  1  2  3
 12     45    1000  1  2  3
 13     45    1500  1  2  3
719    315    7000  1  2  3
Now, because filtered_df has only the two key columns, merge automatically joins on the columns the two frames have in common, and how is set to 'inner' by default:
main_df.merge(filtered_df)
   angle  radius  A  B  C
0      0     500  1  2  3
1      0    1000  1  2  3
2      0    1500  1  2  3
3     45     500  1  2  3
4     45    1000  1  2  3
5     45    1500  1  2  3
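The question also mentions averaging the other columns over repeated (angle, radius) pairs afterwards. A rough sketch of how the merge could feed that step is below. The data is made up, the key columns are spelled out with on= for readability, and groupby(...).mean() is just one plausible reading of "averaged".

```python
import pandas as pd

# Hypothetical main frame with repeated (angle, radius) pairs.
main_df = pd.DataFrame({
    "angle":  [0, 0, 0, 45, 45, 45],
    "radius": [500, 500, 1000, 500, 500, 2000],
    "A":      [1.0, 3.0, 5.0, 2.0, 4.0, 6.0],
})
filtered_df = pd.DataFrame({"angle": [0, 45], "radius": [500, 500]})

keys = ["angle", "radius"]

# Inner merge keeps every main_df row whose pair appears in filtered_df,
# including the repeats.
kept = main_df.merge(filtered_df, on=keys, how="inner")

# Average the remaining columns within each (angle, radius) pair.
averaged = kept.groupby(keys, as_index=False).mean()
```

Since the filter dataframe has no repeated pairs, the merge does not multiply rows: each row of main_df matches at most one filter row.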