Imputing nulls in a row with other row if one column is same
Question:
I have a dataframe
data = [[1000, 'x', 'A'], [2000,'y', 'A'], ['NaN','NaN', 'A'], ['NaN','NaN','B'], [1700,'z', 'B']]
df = pd.DataFrame(data, columns=['Price', 'Attribute', 'Model' ])
df = df.replace('NaN',np.nan)
Now i want to impute the nulls in such a way that if Model is same, copy the content of rows having least Price to the row having nulls.
The output should look like
data = [[1000, 'x', 'A'], [2000,'y', 'A'], [1000, 'x', 'A'], [1700,'z','B'], [1700,'z', 'B']]
df = pd.DataFrame(data, columns=['Price', 'Attribute', 'Model' ])
I have tried groupby and followed Merge two duplicate rows with imputing values from each other
but it did not work. can someone help
Answers:
If there is multiple columns use DataFrame.fillna
with minimal values per groups to new columns by GroupBy.transform
:
cols = ['Price','Col1']
df[cols] = df[cols].fillna(df.groupby('Model')[cols].transform('min'))
print(df)
Price Attribute Model
0 1000.0 x A
1 2000.0 y A
2 1000.0 NaN A
3 1700.0 NaN B
4 1700.0 z B
EDIT: If need replace all missing values by rows with no NaN
s use:
data = [[1000, 'x', 'A'], [2000,'y', 'A'], [np.nan,np.nan, 'A'],
[np.nan,np.nan,'B'], [1700,'z', 'B']]
df = pd.DataFrame(data, columns=['Price', 'Attribute', 'Model' ])
df1 = df.loc[df.dropna().groupby('Model')['Price'].idxmin()]
print (df1)
Price Attribute Model
0 1000.0 x A
4 1700.0 z B
df = df.set_index('Model').fillna(df1.set_index('Model')).reset_index()
print (df)
Model Price Attribute
0 A 1000.0 x
1 A 2000.0 y
2 A 1000.0 x
3 B 1700.0 z
4 B 1700.0 z
I have a dataframe
data = [[1000, 'x', 'A'], [2000,'y', 'A'], ['NaN','NaN', 'A'], ['NaN','NaN','B'], [1700,'z', 'B']]
df = pd.DataFrame(data, columns=['Price', 'Attribute', 'Model' ])
df = df.replace('NaN',np.nan)
Now i want to impute the nulls in such a way that if Model is same, copy the content of rows having least Price to the row having nulls.
The output should look like
data = [[1000, 'x', 'A'], [2000,'y', 'A'], [1000, 'x', 'A'], [1700,'z','B'], [1700,'z', 'B']]
df = pd.DataFrame(data, columns=['Price', 'Attribute', 'Model' ])
I have tried groupby and followed Merge two duplicate rows with imputing values from each other
but it did not work. can someone help
If there is multiple columns use DataFrame.fillna
with minimal values per groups to new columns by GroupBy.transform
:
cols = ['Price','Col1']
df[cols] = df[cols].fillna(df.groupby('Model')[cols].transform('min'))
print(df)
Price Attribute Model
0 1000.0 x A
1 2000.0 y A
2 1000.0 NaN A
3 1700.0 NaN B
4 1700.0 z B
EDIT: If need replace all missing values by rows with no NaN
s use:
data = [[1000, 'x', 'A'], [2000,'y', 'A'], [np.nan,np.nan, 'A'],
[np.nan,np.nan,'B'], [1700,'z', 'B']]
df = pd.DataFrame(data, columns=['Price', 'Attribute', 'Model' ])
df1 = df.loc[df.dropna().groupby('Model')['Price'].idxmin()]
print (df1)
Price Attribute Model
0 1000.0 x A
4 1700.0 z B
df = df.set_index('Model').fillna(df1.set_index('Model')).reset_index()
print (df)
Model Price Attribute
0 A 1000.0 x
1 A 2000.0 y
2 A 1000.0 x
3 B 1700.0 z
4 B 1700.0 z