Pandas: Replace values in column based on a match in a different column

Question:

I have rewritten the question to make the DataFrame I am having trouble with clearer. The DataFrame contains over 100,000 rows and more than 10 columns. The main problem is filling in "NIL" values in the id column when the type value in the same row has previously appeared alongside a known id. The values in the id column are not unique. Sometimes both the id and type values are NIL.

Note: The order of the rows must not change, because the records are dated by other columns. There are also no empty cells; every empty cell is filled with "NIL".

GOAL: To replace as many "NIL" values as possible in the id column only. I do not care about "NIL" values in the type column, but everything else must remain the same.

I am assuming this can be done without loops. Thank you for your help! Really appreciate it, strangers!

Current DataFrame

COL1 id type COL4 COL5
NIL 123 moving NIL NIL
123 lend
NIL 456 penny NIL
NIL appendix NIL
251 appendix
NIL NIL lend NIL
665 NIL NIL
NIL NIL
251 retailer NIL NIL
NIL 251 appendix NIL
NIL 456 NIL NIL

Expected DataFrame

COL1 id type COL4 COL5
NIL 123 moving NIL NIL
123 lend
NIL 456 penny NIL
251 appendix NIL
251 appendix
NIL 123 lend NIL
665 NIL NIL
NIL NIL
251 retailer NIL NIL
NIL 251 appendix NIL
NIL 456 NIL NIL
Asked By: ngnwoenqwl


Answers:

Here is one way to do it.

Create a dictionary that maps each type to an id, then use loc to fill in the missing values:

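# build a type -> id lookup from rows where both values are known (first id seen per type)
d = df[df['id'].ne('NIL') & df['type'].ne('NIL')].drop_duplicates(subset=['type']).set_index('type')['id'].to_dict()

# fill only the 'NIL' ids; types with no known id stay 'NIL'
df.loc[df['id'].eq('NIL'), 'id'] = df['type'].map(lambda t: d.get(t, 'NIL'))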
df
id  type
0   26235   fruit
1   26235   veggie
2   32256   NIL
3   24145   pawn
4   12415   NIL
5   26235   fruit
6   26235   veggie
7   24145   pawn
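
For anyone who wants to run the dictionary approach end to end, here is a minimal self-contained sketch. The sample frame is made up for illustration (ids are stored as strings so the column keeps a single type), and the names `known`, `lookup` and `missing` are my own, not part of the answer above. It assumes that the first id seen for a type is the one to reuse.

```python
import pandas as pd

# Hypothetical sample data; "NIL" marks a missing value, as in the question.
df = pd.DataFrame({
    "id":   ["26235", "26235", "32256", "NIL", "12415", "NIL", "NIL", "24145", "NIL"],
    "type": ["fruit", "veggie", "NIL", "pawn", "NIL", "fruit", "veggie", "pawn", "NIL"],
})

# Rows where both id and type are known are the only safe source for the lookup.
known = df[df["id"].ne("NIL") & df["type"].ne("NIL")]

# Keep the first id seen for each type and turn it into a plain dict.
lookup = known.drop_duplicates(subset="type").set_index("type")["id"].to_dict()

# Fill only rows whose id is "NIL"; types without a known id stay "NIL".
missing = df["id"].eq("NIL")
df.loc[missing, "id"] = df.loc[missing, "type"].map(lookup).fillna("NIL")

print(df)
```

Row order and the type column are left untouched, and the last row keeps "NIL" because its type is "NIL" as well.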
Answered By: Naveed

You can do:

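# use rows where both id and type are known; keep one id per type so the lookup index is unique
d = df.loc[(df.id != 'NIL') & (df.type != 'NIL')].drop_duplicates('type').set_index('type')['id']

# fill only the 'NIL' ids; types with no known id stay 'NIL'
df.loc[df.id == 'NIL', 'id'] = df.loc[df.id == 'NIL', 'type'].map(lambda t: d.get(t, 'NIL'))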

print(df)

     id    type
0  26235   fruit
1  26235  veggie
2  32256     NIL
3  24145    pawn
4  12415     NIL
5  26235   fruit
6  26235  veggie
7  24145    pawn
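
One thing to watch with a Series lookup: when the same type appears with a known id more than once (as "appendix" does in the question), the lookup index has duplicate labels and `Series.map` can raise an error. As a possible alternative, the sketch below avoids the lookup table and takes the first known id within each type group instead. The sample data and the names `ids`, `first_id` and `fillable` are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample data; "pawn" has two known ids to show the duplicate case.
df = pd.DataFrame({
    "id":   ["26235", "NIL", "24145", "NIL", "24145", "99999", "NIL", "NIL"],
    "type": ["fruit", "pawn", "pawn", "fruit", "pawn", "pawn", "NIL", "other"],
})

# Turn "NIL" ids into real missing values so groupby can skip them.
ids = df["id"].mask(df["id"].eq("NIL"))

# First non-missing id within each type, broadcast back to every row.
first_id = ids.groupby(df["type"]).transform("first")

# Only fill rows whose id is missing and whose type is itself known.
fillable = ids.isna() & df["type"].ne("NIL")
df["id"] = ids.where(~fillable, first_id).fillna("NIL")

print(df)
```

Rows whose type is "NIL", or whose type never occurs with a known id, keep "NIL" in the id column.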
Answered By: SomeDude

You can use a merge:

# assemble the dataframe
In [4]: a
Out[4]: 
[26235,
 'fruit',
 26235,
 'veggie',
 32256,
 'NIL',
 'NIL',
 'pawn',
 12415,
 'NIL',
 'NIL',
 'fruit',
 'NIL',
 'veggie',
 24145,
 'pawn',
 'NIL',
 'NIL']

In [5]: df = pandas.DataFrame({'id':a[::2], 'type':a[1::2]})
 
#      id    type
#0  26235   fruit
#1  26235  veggie
#2  32256     NIL
#3    NIL    pawn
#4  12415     NIL
#5    NIL   fruit
#6    NIL  veggie
#7  24145    pawn
#8    NIL     NIL


In [6]: id_is_nill = df.id=="NIL"

In [7]: left = df.loc[id_is_nill].reset_index()

In [8]: right = df.loc[~id_is_nill].reset_index()

In [9]: mrg = pandas.merge(left, right, on='type').query("type != 'NIL'")

In [10]: mrg
Out[10]: 
   index_x id_x    type  index_y   id_y
0        3  NIL    pawn        7  24145
1        5  NIL   fruit        0  26235
2        6  NIL  veggie        1  26235

In [11]: df.loc[mrg.index_x, 'id'] = mrg.id_y.values

In [12]: df
Out[12]: 
      id    type
0  26235   fruit
1  26235  veggie
2  32256     NIL
3  24145    pawn
4  12415     NIL
5  26235   fruit
6  26235  veggie
7  24145    pawn
8    NIL     NIL
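
The same idea can also be written as a single left merge outside an interactive session. This is only a sketch under a few assumptions: the right-hand side is reduced to one id per type first (the first one seen), so the merge cannot multiply rows, and the names `known`, `merged`, `candidate` and `use` are hypothetical. A left merge keeps the rows of the left frame in their original order, which matters for the question.

```python
import pandas as pd

# Hypothetical sample data mirroring the session above, with ids as strings.
df = pd.DataFrame({
    "id":   ["26235", "26235", "32256", "NIL", "12415", "NIL", "NIL", "24145", "NIL"],
    "type": ["fruit", "veggie", "NIL", "pawn", "NIL", "fruit", "veggie", "pawn", "NIL"],
})

# One known id per type; rows with "NIL" in either column are excluded.
known = (
    df[df["id"].ne("NIL") & df["type"].ne("NIL")]
    .drop_duplicates(subset="type")[["type", "id"]]
)

# Left merge: one output row per input row, in the original order.
# validate="many_to_one" makes pandas check that `known` is unique on type.
merged = df.merge(known, on="type", how="left",
                  suffixes=("", "_known"), validate="many_to_one")

# Work positionally so the result does not depend on index alignment.
candidate = merged["id_known"].to_numpy()
use = df["id"].eq("NIL").to_numpy() & merged["id_known"].notna().to_numpy()
df.loc[use, "id"] = candidate[use]

print(df)
```

With the sample data this fills rows 3, 5 and 6 and leaves the final NIL/NIL row unchanged.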
Answered By: dermen