How to remove uncommon columns in two dataframes in Pandas?
Question:
I have 2 pandas dataframes: df1 and df2
df1 has these columns:
c1, c2, c3, c4
and df2 has these columns:
c2, c3, c4, c5
How can I remove the uncommon columns in these 2 dataframes so both become like this:
df1: c2, c3, c4
df2: c2, c3, c4
Answers:
You can build a list of the columns common to both dataframes and then use it to subset each dataframe:
# list with only common columns
common_columns = [col for col in df1.columns if col in df2.columns]
# keep only common columns from df1 and df2
df1 = df1[common_columns]
df2 = df2[common_columns]
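As a minimal sketch of the list-comprehension approach on sample data (the values, the reversed column order of df2, and the names df1_common and df2_common are assumptions made for illustration, not part of the question):

```python
import pandas as pd

# Hypothetical sample data; df2 deliberately lists its columns in a different order
df1 = pd.DataFrame({"c1": [1], "c2": [2], "c3": [3], "c4": [4]})
df2 = pd.DataFrame({"c5": [50], "c4": [40], "c3": [30], "c2": [20]})

# The comprehension walks df1.columns, so the result follows df1's column order
common_columns = [col for col in df1.columns if col in df2.columns]

# Selecting with the same list gives both dataframes the same columns in the same order
df1_common = df1[common_columns]
df2_common = df2[common_columns]

print(list(df1_common.columns))
print(list(df2_common.columns))
```

Because both dataframes are indexed with the same list, they end up with identical column order even when the original orders differ.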
Given the following dataframes:
import pandas as pd
df1 = pd.DataFrame(columns=['c1','c2','c3','c4'])
df2 = pd.DataFrame(columns=['c2','c3','c4','c5'])
Create the intersection of the df1 and df2 column names:
common_col = (df2.columns) & (df1.columns)
Filter df1 and df2 by common_col:
df1 = df1[common_col] # df1.columns: c2, c3, c4
df2 = df2[common_col] # df2.columns: c2, c3, c4
Recent versions of pandas emit a deprecation warning for the approach above by @Massifox:
FutureWarning: Index.__and__ operating as a set operation is deprecated, in the future this will be a logical operation matching Series.__and__. Use index.intersection(other) instead.
Based on the recommendation in the warning, the following worked:
common_columns = df1.columns.intersection(df2.columns)
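To complete the picture, here is a short sketch that applies the intersection to both dataframes. It reuses the empty example dataframes from the earlier answer, which is an assumption about the setup rather than something this answer states:

```python
import pandas as pd

# Same example dataframes as in the answer above (columns only, no rows)
df1 = pd.DataFrame(columns=["c1", "c2", "c3", "c4"])
df2 = pd.DataFrame(columns=["c2", "c3", "c4", "c5"])

# Index.intersection returns the labels present in both column indexes
common_columns = df1.columns.intersection(df2.columns)

# Keep only the shared columns in each dataframe
df1 = df1[common_columns]
df2 = df2[common_columns]

print(list(df1.columns))
print(list(df2.columns))
```

Unlike the & operator, intersection is an explicit set operation on the Index, so it should not trigger the FutureWarning quoted above.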