Filling DF's NaN/Missing data from another DF

Question:

I have two data frames:

df1 = pd.DataFrame({'Group': ['xx', 'yy', 'zz', 'x', 'x', 'x','z','y','y','y','y'],
                    'Name': ['A', 'B', 'C', None, None, None, None, None, None, None, None],
                    'Value': [5, 3, 4, 7, 1, 3, 6, 5, 9, 5, 4]})

df2 = pd.DataFrame({'Name': ['A', 'A', 'B', 'B'],
                    'Group': ['x', 'y', 'z', 'y'],
                    'Repeat': [3, 2, 1, 2]})

All the NaN values in df1["Name"] have to be filled from df2["Name"] by matching on "Group". Each row of df2 can be matched and used for filling "Repeat" times.

Desired output:

df = pd.DataFrame({'Group': ['xx', 'yy', 'zz', 'x', 'x', 'x','z','y','y','y','y'],
                   'Name': ['A', 'B', 'C', 'A', 'A', 'A', 'B', 'A', 'A', 'B', 'B'],
                   'Value': [5, 3, 4, 7, 1, 3, 6, 5, 9, 5, 4]}) 

I am also looking for the fastest run time.

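This is what I tried:

# For each df2 row, fill the first still-missing Name in the same Group, 'Repeat' times
for _, row2 in df2.iterrows():
    for _ in range(row2['Repeat']):
        for index1 in df1.index:
            # Compare on Group and test for a real missing value, not the string 'NaN'
            if df1.at[index1, 'Group'] == row2['Group'] and pd.isna(df1.at[index1, 'Name']):
                df1.at[index1, 'Name'] = row2['Name']
                break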

I am looking for a simpler and faster solution.

Asked By: parvez alam

||

Answers:

Certainly! Based on the data you provided, you can fill the missing values in the ‘Name’ column of df1 with values from the ‘Name’ column of df2. Rows are matched on the ‘Group’ column that both DataFrames share, and each row of df2 is meant to be used ‘Repeat’ times.

Here’s how you can achieve this:

import pandas as pd

# Sample data for demonstration
data1 = {'Group': ['xx', 'yy', 'zz', 'x', 'x', 'x', 'z', 'y', 'y', 'y', 'y'],
         'Name': ['A', 'B', 'C', None, None, None, None, None, None, None, None],
         'Value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]}

data2 = {'Name': ['A', 'A', 'B', 'B'],
         'Group': ['x', 'y', 'z', 'y'],
         'Repeat': [3, 2, 1, 2]}

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

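# Repeat each row of df2 'Repeat' times, then number the rows within each group
fill = df2.loc[df2.index.repeat(df2['Repeat'])].reset_index(drop=True)
fill['k'] = fill.groupby('Group').cumcount()

# Number the missing rows of df1 within each group in the same way
missing = df1['Name'].isna()
keys = df1.loc[missing, ['Group']].copy()
keys['k'] = keys.groupby('Group').cumcount()

# Merge on ('Group', 'k') so that each missing row receives exactly one name
merged_df = keys.merge(fill, on=['Group', 'k'], how='left')
df1.loc[missing, 'Name'] = merged_df['Name'].to_numpy()

print("Updated df1:")
print(df1)

In this example, df2 is first expanded so that each row appears ‘Repeat’ times, and both the expanded rows and the missing rows of df1 are numbered within their group with cumcount(). Merging on ‘Group’ together with that counter pairs each missing row with exactly one row of df2. A merge on ‘Group’ alone would instead return one row per match, so a group that occurs more than once in df2 (such as ‘y’) would produce extra rows and ignore ‘Repeat’. The matched names are then written back into the ‘Name’ column of df1, which gives the ‘Name’ column from the desired output. A missing row with no remaining match in df2 is left as NaN. Every step is vectorized, so there is no Python-level loop over the rows.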
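One thing worth checking with any merge-based fill is the row count. The sketch below uses the question's data to show that a left merge on ‘Group’ alone returns more rows than df1 has, because ‘y’ appears twice in df2. It also shows how the validate argument of merge() can flag non-unique keys up front. The variable names plain and keys_unique are only illustrative.

```python
import pandas as pd
from pandas.errors import MergeError

df1 = pd.DataFrame({'Group': ['xx', 'yy', 'zz', 'x', 'x', 'x', 'z', 'y', 'y', 'y', 'y'],
                    'Name': ['A', 'B', 'C', None, None, None, None, None, None, None, None],
                    'Value': [5, 3, 4, 7, 1, 3, 6, 5, 9, 5, 4]})
df2 = pd.DataFrame({'Name': ['A', 'A', 'B', 'B'],
                    'Group': ['x', 'y', 'z', 'y'],
                    'Repeat': [3, 2, 1, 2]})

# Left merge on 'Group' only: each 'y' row of df1 matches two rows of df2
plain = df1.merge(df2, on='Group', how='left')
print(len(df1), len(plain))  # 11 15

# validate='many_to_one' makes pandas raise if the right-hand keys are not unique
try:
    df1.merge(df2, on='Group', how='left', validate='many_to_one')
    keys_unique = True
except MergeError:
    keys_unique = False
print(keys_unique)  # False
```

Since the merged frame has 15 rows and df1 has 11, index-based alignment between the two no longer pairs a row with its own match, which is why a second key (such as a per-group counter) is needed.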

Answered By: Devam Sanghvi