pandas join DataFrame force suffix?
Question:
How can I force a suffix on a merge or join. I understand it’s possible to provide one if there is a collision but in my case I’m merging df1 with df2 which doesn’t cause any collision but then merging again on df2 which uses the suffixes but I would prefer for each merge to have a suffix because it gets confusing if I do different combinations as you could imagine.
Answers:
You could force a suffix on the actual DataFrame:
In [11]: df_a = pd.DataFrame([[1], [2]], columns=['A'])
In [12]: df_b = pd.DataFrame([[3], [4]], columns=['B'])
In [13]: df_a.join(df_b)
Out[13]:
A B
0 1 3
1 2 4
By appending to its column’s names:
In [14]: df_a.columns = df_a.columns.map(lambda x: str(x) + '_a')
In [15]: df_a
Out[15]:
A_a
0 1
1 2
Now joins won’t need the suffix correction, whether they collide or not:
In [16]: df_b.columns = df_b.columns.map(lambda x: str(x) + '_b')
In [17]: df_a.join(df_b)
Out[17]:
A_a B_b
0 1 3
1 2 4
Pandas merge will give the new columns a suffix when there is already a column with the same name, When i need to force the new columns with a suffix, i create an empty column with the name of the column that i want to join.
df["colName"] = "" #create empty column
df.merge(right = "df1", suffixes = ("_a","_b"))
You can later drop the empty column.
You could do the same for more than one columns, or for every column in df.columns.values
This is what I’ve been using to pandas.merge
two DataFrames and force suffixing:
def merge_force_suffix(left, right, **kwargs):
on_col = kwargs['on']
suffix_tupple = kwargs['suffixes']
def suffix_col(col, suffix):
if col != on_col:
return str(col) + suffix
else:
return col
left_suffixed = left.rename(columns=lambda x: suffix_col(x, suffix_tupple[0]))
right_suffixed = right.rename(columns=lambda x: suffix_col(x, suffix_tupple[1]))
del kwargs['suffixes']
return pd.merge(left_suffixed, right_suffixed, **kwargs)
As of pandas version 0.24.2 you can add a suffix to column names on a DataFrame using the add_suffix method.
This makes a one-liner merge command with force-suffix more bearable, for example:
df_merged = df1.merge(df2.add_suffix('_2'))
How can I force a suffix on a merge or join. I understand it’s possible to provide one if there is a collision but in my case I’m merging df1 with df2 which doesn’t cause any collision but then merging again on df2 which uses the suffixes but I would prefer for each merge to have a suffix because it gets confusing if I do different combinations as you could imagine.
You could force a suffix on the actual DataFrame:
In [11]: df_a = pd.DataFrame([[1], [2]], columns=['A'])
In [12]: df_b = pd.DataFrame([[3], [4]], columns=['B'])
In [13]: df_a.join(df_b)
Out[13]:
A B
0 1 3
1 2 4
By appending to its column’s names:
In [14]: df_a.columns = df_a.columns.map(lambda x: str(x) + '_a')
In [15]: df_a
Out[15]:
A_a
0 1
1 2
Now joins won’t need the suffix correction, whether they collide or not:
In [16]: df_b.columns = df_b.columns.map(lambda x: str(x) + '_b')
In [17]: df_a.join(df_b)
Out[17]:
A_a B_b
0 1 3
1 2 4
Pandas merge will give the new columns a suffix when there is already a column with the same name, When i need to force the new columns with a suffix, i create an empty column with the name of the column that i want to join.
df["colName"] = "" #create empty column
df.merge(right = "df1", suffixes = ("_a","_b"))
You can later drop the empty column.
You could do the same for more than one columns, or for every column in df.columns.values
This is what I’ve been using to pandas.merge
two DataFrames and force suffixing:
def merge_force_suffix(left, right, **kwargs):
on_col = kwargs['on']
suffix_tupple = kwargs['suffixes']
def suffix_col(col, suffix):
if col != on_col:
return str(col) + suffix
else:
return col
left_suffixed = left.rename(columns=lambda x: suffix_col(x, suffix_tupple[0]))
right_suffixed = right.rename(columns=lambda x: suffix_col(x, suffix_tupple[1]))
del kwargs['suffixes']
return pd.merge(left_suffixed, right_suffixed, **kwargs)
As of pandas version 0.24.2 you can add a suffix to column names on a DataFrame using the add_suffix method.
This makes a one-liner merge command with force-suffix more bearable, for example:
df_merged = df1.merge(df2.add_suffix('_2'))