How to fill missing values based on group using pandas
Question:
I am looking at order data. Each order spans multiple lines, one per distinct item in the order. The table looks like this:
+--------------+------------------+-------+
| order number | shipping address | item |
+--------------+------------------+-------+
| A123 | Canada | boots |
+--------------+------------------+-------+
| A123 | null | socks |
+--------------+------------------+-------+
| A123 | null | laces |
+--------------+------------------+-------+
| B456 | California | shirt |
+--------------+------------------+-------+
How can I fill the null values with the actual shipping address, etc. for that order, in this case ‘Canada’? (Ideally using Python + pandas.)
Answers:
You need a dictionary with the order number as the key and the shipping address as the value. Drop the nulls, build the dict, and map it over the order number column to fill the shipping address column.
di = df[['order number', 'shipping address']]
di = di[di['shipping address'].notnull()]
# Selecting the column before to_dict() gives a flat {order number: shipping address} dict
di = di.set_index('order number')['shipping address'].to_dict()
df['shipping address'] = df['order number'].map(di)
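The lookup idea above can be sketched end to end on the sample table from the question. This is a minimal sketch, not the answerer's exact code: the names `known`, `lookup`, and `filled` are made up for illustration, and it assumes each order has a single correct address, so the first non-null one per order is kept.

```python
import pandas as pd

# Sample data reconstructed from the table in the question
df = pd.DataFrame({
    "order number": ["A123", "A123", "A123", "B456"],
    "shipping address": ["Canada", None, None, "California"],
    "item": ["boots", "socks", "laces", "shirt"],
})

# Keep one known address per order (assumption: the first non-null one is correct)
known = df.dropna(subset=["shipping address"]).drop_duplicates("order number")

# Series indexed by order number, used as the lookup table
lookup = known.set_index("order number")["shipping address"]

# Only fill the gaps; addresses that are already present stay untouched
filled = df["shipping address"].fillna(df["order number"].map(lookup))
print(filled)
```

Using `fillna` with the mapped values, rather than overwriting the column, leaves existing addresses alone and keeps orders with no known address as null.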
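This is an approach using df.groupby() followed by .ffill() and .bfill(). Both fills have to run inside each group, otherwise the trailing .bfill() can pull an address from the next order:
df['shipping address'] = df.groupby('order number')['shipping address'].transform(lambda s: s.ffill().bfill())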
print(df)
  order number shipping address   item
0         A123           Canada  boots
1         A123           Canada  socks
2         A123           Canada  laces
3         B456       California  shirt
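A possible variant of the groupby approach, sketched under assumptions: instead of chaining fills, take the first non-null address per order with `transform("first")` and use it only where the address is missing. The order `C789` below is hypothetical and not part of the question's data; it is included to show that an order with no known address stays null instead of borrowing one from a neighbouring order.

```python
import pandas as pd

# Sample data; order C789 is a made-up row with no known address
df = pd.DataFrame({
    "order number": ["A123", "A123", "C789", "B456"],
    "shipping address": ["Canada", None, None, "California"],
    "item": ["boots", "socks", "hat", "shirt"],
})

# First non-null address within each order, broadcast back to every row
first_known = df.groupby("order number")["shipping address"].transform("first")

# Fill only the missing entries, group by group
df["shipping address"] = df["shipping address"].fillna(first_known)
print(df)
```

Whether a leftover null for `C789` is acceptable depends on the data; it at least makes orders with a missing address easy to find afterwards.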