How to merge dataframe and series?

Question

I have a DataFrame and a Series, defined as follows.

import pandas as pd

_df = pd.DataFrame({'a': [1, '', '', 4], 'b': ['apple', 'banana', 'orange', 'pear']}, columns=['a', 'b'])
_series = pd.Series(['', 2, 3, ''], name="a")

I would like to merge the DataFrame and the Series on column a so that every blank is filled in. This is the result I want.

   a    b
0  1  apple
1  2  banana
2  3  orange
3  4  pear

Here is how I currently do it.

for i in range(len(_df)):
    # fill each blank in column a with the value at the same position in the Series
    if _df.iloc[i, 0] == '':
        _df.iloc[i, 0] = _series[i]

The problem is that this loop can be very slow when the DataFrame is big. Does anyone know a more efficient way to do this?

Asked By: haojie

Answer 1

Use Series.combine with a lambda function that keeps the value from _df unless it is an empty string:

df = _df.assign(a=_df["a"].combine(_series, lambda x, y: x if x != '' else y))
print (df)
   a       b
0  1   apple
1  2  banana
2  3  orange
3  4    pear

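Alternatively, use Series.combine_first after replacing the empty strings with missing values (this needs NumPy, and the filled column may come out as floats):

import numpy as np

df = _df.assign(a=_df["a"].replace('', np.nan).combine_first(_series.replace('', np.nan)))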
print (df)
     a       b
0  1.0   apple
1  2.0  banana
2  3.0  orange
3  4.0    pear
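If the float values shown above are unwanted, one option is to cast the filled column back to integers. This is a minimal sketch, assuming that no missing values remain after combine_first (astype(int) raises an error if any NaN is left); the name filled is only illustrative.

```python
import numpy as np
import pandas as pd

_df = pd.DataFrame({'a': [1, '', '', 4], 'b': ['apple', 'banana', 'orange', 'pear']})
_series = pd.Series(['', 2, 3, ''], name='a')

# Turn blanks into NaN, then fill each NaN in _df['a'] from _series
filled = _df['a'].replace('', np.nan).combine_first(_series.replace('', np.nan))

# Assumption: every blank was filled, so the cast to int cannot hit a NaN
df = _df.assign(a=filled.astype(int))
print(df)
```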

To write the result to a new column whose name (here c) is stored in the variable a, use:

a = 'c'
df = _df.assign(**{f'{a}': _df["a"].combine(_series, lambda x, y: x if x != '' else y)})
print (df)

Or:

a = 'c'

_df[f'{a}'] =  _df["a"].combine(_series, lambda x, y: x if x != '' else y)
print (_df)
   a       b  c
0  1   apple  1
1     banana  2
2     orange  3
3  4    pear  4

Answered By: jezrael

Answer 2

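You can use pandas merge on the index and then concatenate the two a columns as strings. This relies on exactly one of the two values being blank in each row, and column a ends up holding strings rather than numbers:

df = _df.merge(_series, left_index=True, right_index=True)
df['a_x'] = df['a_x'].astype(str) + df['a_y'].astype(str)
df = df.rename(columns={'a_x': 'a'}).drop(columns=['a_y'])
print (df)

Output:

   a       b
0  1   apple
1  2  banana
2  3  orange
3  4    pear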

Answered By: Hamid Rasti
