Can't create a column in a dataframe based on other columns. Tried several options; none worked. (Python, pandas)

Question:

Thanks for helping me out. I can't create a new column in a dataframe.

So far I have tried using lambdas, the isin method, and the contains method.
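For context on why the isin and contains attempts fall short, here is a minimal sketch using two values from the sample data. Series.isin checks whether each whole cell equals one of the given values, and Series.str.contains searches inside the strings but takes one pattern for the entire Series, not a different pattern per row.

```python
import pandas as pd

countries = pd.Series(["KR~CN_SG~PH", "CN~PK"])

# isin tests whether each whole cell equals one of the given values,
# so "KR" never matches the longer string "KR~CN_SG~PH"
print(countries.isin(["KR"]).tolist())  # [False, False]

# str.contains searches inside each string, but it takes a single pattern
# for the whole Series rather than a different pattern per row
print(countries.str.contains("KR", regex=False).tolist())  # [True, False]
```

What the question needs is a per-row comparison between the two columns, which neither method provides on its own.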

I have a dataframe with these values (the first two columns have dtype object; Column c is what I want to get):

Country code | Countries                  | Column c
KR           | KR~CN_SG~PH                | Valid
RO           | CN~PK                      | Invalid
NL           | CZ_BE~NL_IT~DE             | Valid
SG           | HK~SK_DZ_AL_CN_GR_RU~SA~SG | Valid
US           | ZA~SE~ES~CH_UA             | Invalid

Valid: when the Country code appears in Countries

Invalid: when it doesn't
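Since lambdas came up: a lambda can do this if it is applied row by row with DataFrame.apply and axis=1, so that both columns of the same row are visible at once. A minimal sketch, assuming the data is built by hand from the table above (the real dataframe was presumably loaded some other way):

```python
import pandas as pd

# Sample data rebuilt from the table (construction is an assumption)
df = pd.DataFrame({
    "Country code": ["KR", "RO", "NL", "SG", "US"],
    "Countries": ["KR~CN_SG~PH", "CN~PK", "CZ_BE~NL_IT~DE",
                  "HK~SK_DZ_AL_CN_GR_RU~SA~SG", "ZA~SE~ES~CH_UA"],
})

# axis=1 passes each row to the lambda as a Series indexed by column name
df["Column c"] = df.apply(
    lambda row: "Valid" if row["Country code"] in row["Countries"] else "Invalid",
    axis=1,
)
print(df)
```

Row-wise apply is typically slower than a plain list comprehension on large frames, which is why the answers below iterate with zip instead.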

This is my first time writing code at my first Python job, so sorry if this is a stupid question 😀

Asked By: martinworkshere


Answers:

Use a list comprehension with numpy.where:

import numpy as np

# True where the code is a substring of that row's Countries string
m = [x in y for x, y in zip(df['Country code'], df['Countries'])]
df['Column c'] = np.where(m, 'Valid', 'Invalid')
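A self-contained version of the approach above, with the sample data rebuilt by hand from the question's table (the DataFrame construction is an assumption; the mask and np.where call are as in the answer):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Country code": ["KR", "RO", "NL", "SG", "US"],
    "Countries": ["KR~CN_SG~PH", "CN~PK", "CZ_BE~NL_IT~DE",
                  "HK~SK_DZ_AL_CN_GR_RU~SA~SG", "ZA~SE~ES~CH_UA"],
})

# One boolean per row: is the code a substring of that row's Countries?
m = [x in y for x, y in zip(df["Country code"], df["Countries"])]

# np.where picks "Valid" where m is True and "Invalid" where it is False
df["Column c"] = np.where(m, "Valid", "Invalid")
print(df)
```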
Answered By: jezrael

You can use a single list comprehension:

df['Column c'] = ['Valid' if x in l else 'Invalid'
                  for x, l in zip(df['Country code'], df['Countries'])]

output:

  Country code                   Countries Column c
0           KR                 KR~CN_SG~PH    Valid
1           RO                       CN~PK  Invalid
2           NL              CZ_BE~NL_IT~DE    Valid
3           SG  HK~SK_DZ_AL_CN_GR_RU~SA~SG    Valid
4           US              ZA~SE~ES~CH_UA  Invalid
Answered By: mozway
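Both answers rely on Python's substring test (x in y). That works for this data because every code is two letters and codes are separated by a single ~ or _, so any two-character match has to be a whole code. If codes could differ in length, a substring test could give false positives. A minimal sketch of a stricter variant that splits on both separators first; the helper name is_listed and the code "SGP" are hypothetical, chosen only for illustration:

```python
import re

import pandas as pd

df = pd.DataFrame({
    "Country code": ["KR", "RO", "NL", "SG", "US"],
    "Countries": ["KR~CN_SG~PH", "CN~PK", "CZ_BE~NL_IT~DE",
                  "HK~SK_DZ_AL_CN_GR_RU~SA~SG", "ZA~SE~ES~CH_UA"],
})


def is_listed(code, countries):
    # Hypothetical helper: split on both separators so only whole codes match
    return code in re.split(r"[~_]", countries)


df["Column c"] = ["Valid" if is_listed(x, l) else "Invalid"
                  for x, l in zip(df["Country code"], df["Countries"])]
print(df)

# With a hypothetical longer code, the substring test and the split test differ
print("SG" in "HK~SGP", is_listed("SG", "HK~SGP"))  # True False
```

On the question's data both approaches give the same Column c; the split only matters if longer or overlapping codes can appear.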