How to fill missing values based on group using pandas

Question:

I am looking at order data. Each order spans multiple rows, one for each different item in the order. The table looks like this:

+--------------+------------------+-------+
| order number | shipping address | item  |
+--------------+------------------+-------+
| A123         | Canada           | boots |
+--------------+------------------+-------+
| A123         | null             | socks |
+--------------+------------------+-------+
| A123         | null             | laces |
+--------------+------------------+-------+
| B456         | California       | shirt |
+--------------+------------------+-------+

How can I fill the null values with the actual shipping address (and similar fields) for that order, in this case 'Canada'? Ideally using Python and pandas.

Asked By: user14461410


Answers:

You need a dictionary with the order number as the key and the shipping address as the value. Drop the nulls, build the dict, and then map it over the order number column to fill the shipping address column.

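# Keep the rows that have an address, one per order number
di = df[['order number', 'shipping address']].dropna().drop_duplicates('order number')
# Build an {order number: shipping address} dictionary
di = di.set_index('order number')['shipping address'].to_dict()
# Look up each row's address by its order number
df['shipping address'] = df['order number'].map(di)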
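
For something that runs end to end, here is a minimal sketch of the dictionary approach. It rebuilds the question's table as a DataFrame; the names `known` and `lookup` are illustrative and not from the answer. It also fills only the missing entries with `fillna` instead of overwriting the whole column, which is a small variation on the idea above.

```python
import pandas as pd

# Sample data mirroring the table in the question (None marks a missing address)
df = pd.DataFrame({
    "order number": ["A123", "A123", "A123", "B456"],
    "shipping address": ["Canada", None, None, "California"],
    "item": ["boots", "socks", "laces", "shirt"],
})

# Keep rows with a known address, one per order, and turn them into a lookup dict
known = df.dropna(subset=["shipping address"]).drop_duplicates("order number")
lookup = known.set_index("order number")["shipping address"].to_dict()

# Fill only the missing entries so existing addresses stay untouched
df["shipping address"] = df["shipping address"].fillna(df["order number"].map(lookup))
print(df)
```

Orders that have no known address anywhere are absent from `lookup`, so their rows stay null.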
Answered By: darth baba

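This approach uses df.groupby() followed by .ffill() and .bfill(), both applied within each order so that values never spill over from a neighbouring order:

df['shipping address'] = df.groupby('order number')['shipping address'].transform(lambda s: s.ffill().bfill())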
print(df)

  order number shipping address   item
0         A123           Canada  boots
1         A123           Canada  socks
2         A123           Canada  laces
3         B456       California  shirt
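
One thing to watch with chained fills: a fill called directly on the groupby object stays within each order, whereas a fill called on the resulting Series runs over the whole column and can pull in an address from a different order. A minimal sketch that keeps both fills per group, assuming a hypothetical order C789 with no address at all and an order whose first row is the missing one:

```python
import pandas as pd

# Hypothetical data: A123 is missing its first row, C789 has no address at all
df = pd.DataFrame({
    "order number": ["A123", "A123", "C789", "B456", "B456"],
    "shipping address": [None, "Canada", None, None, "California"],
    "item": ["boots", "socks", "hat", "shirt", "tie"],
})

# Forward fill within each order
filled = df.groupby("order number")["shipping address"].ffill()
# Backward fill, grouping again so the fill cannot cross order boundaries
filled = filled.groupby(df["order number"]).bfill()

df["shipping address"] = filled
print(df)
```

With the second fill grouped as well, C789 stays null instead of picking up 'California' from the order listed after it.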
Answered By: Jamiu S.