Nested for loop with if in one line for creating new column in dataframe
Question:
I have a dataframe "result" and want to create a new column called "type". For each row, the value in "type" should be the dict value whose key appears in that row's "Particulars" column.
dict_classify={'key1': 'content1',
'key2':'content2'
}
result['type']=[dict_classify[key] if key.lower() in i.lower() else np.nan
for key in dict_classify.keys()
for i in result['Particulars']]
It returns the error "Length of values (5200) does not match the length of index (1040)". Any idea what I did wrong?
The following is what I want to achieve in a normal for loop. Can I make it into one line?
lst_type=[]
for i in result['Particulars']:
    for key in dict_classify:
        temp=np.nan
        if key.lower() in i.lower():
            temp=dict_classify[key]
            break
    lst_type.append(temp)
result['type']=lst_type
Answers:
It sounds like each entry i of result["Particulars"] is either a key from dict_classify, in which case we want the corresponding entry of result["type"] to be dict_classify[i], or is not a key, in which case we want the corresponding entry to be NaN. If that's the case, then you should have something like
result['type'] = [dict_classify.get(i,np.nan) for i in result['Particulars']]
The same result could also be attained with
result['type'] = result['Particulars'].apply(lambda i: dict_classify.get(i,np.nan))
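Note that dict.get only matches when an entry equals a key exactly. If the keys are meant to be case-insensitive substrings, as in the question's loop, a one-line equivalent can be sketched with next() over a generator. The sample data below is made up for illustration. The sketch also shows why the original comprehension failed: two for clauses with the condition in the output expression emit one value for every (key, row) pair, so the list length is the row count multiplied by the number of keys.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, for illustration only
dict_classify = {"key1": "content1", "key2": "content2"}
result = pd.DataFrame({"Particulars": ["Paid KEY1 invoice", "key2 refund", "other"]})

# For each row, take the value of the first key found in the text
# (case-insensitive), or NaN when no key matches.
result["type"] = [
    next((value for key, value in dict_classify.items()
          if key.lower() in text.lower()), np.nan)
    for text in result["Particulars"]
]

# The question's comprehension emits one item per (key, row) pair instead,
# so its length is len(dict_classify) * len(result).
too_long = [
    dict_classify[key] if key.lower() in text.lower() else np.nan
    for key in dict_classify
    for text in result["Particulars"]
]
print(len(too_long), len(result))
```

The default passed to next() plays the role of temp=np.nan in the loop, and stopping at the first hit plays the role of break.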
The most straightforward way is probably to iterate through the dictionary using loc to find cells that contain each key:
for key, value in dict_classify.items():
    result.loc[result["Particulars"].str.contains(key), "type"] = value
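As written, str.contains is case-sensitive and treats each key as a regex, and a later key overwrites an earlier one when a row matches both. A variant that is closer to the question's loop might look like the sketch below. The sample data is hypothetical, and iterating the keys in reverse is just one way to let the first key win.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, for illustration only
dict_classify = {"key1": "content1", "key2": "content2"}
result = pd.DataFrame(
    {"Particulars": ["Paid KEY1 invoice", "key2 refund", "key1 and key2", "other", None]}
)

# Start with an object-dtype column of NaN so string values can be assigned
result["type"] = pd.Series(np.nan, index=result.index, dtype=object)

# Walk the keys in reverse so that earlier keys are written last and
# therefore take priority, like the break in the question's loop.
for key, value in reversed(list(dict_classify.items())):
    mask = result["Particulars"].str.contains(key, case=False, regex=False, na=False)
    result.loc[mask, "type"] = value
```

Here case=False gives the case-insensitive match, regex=False matches the key literally, and na=False treats missing entries as non-matches.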
You could also use a regex to identify the matched keys (like this answer). We can then use replace to get the values corresponding to each key.
regex = "(" + "|".join(dict_classify) + ")"
result["type"] = result["Particulars"].str.extract(regex).replace(dict_classify)
(You could of course condense this to one line if you really want to.)
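If the keys may contain regex metacharacters, or the match should ignore case as in the question, the regex approach can be adapted roughly as follows. This is a sketch on made-up sample data. The lowercase lookup dict is a helper introduced here, not part of the original answer, and it uses map where the answer uses replace.

```python
import re
import pandas as pd

# Hypothetical sample data, for illustration only
dict_classify = {"key1": "content1", "key2": "content2"}
result = pd.DataFrame({"Particulars": ["Paid KEY1 invoice", "key2 refund", "other"]})

# Escape each key so that regex metacharacters are matched literally
pattern = "(" + "|".join(re.escape(key) for key in dict_classify) + ")"

# Lowercased keys, so the matched text can be looked up regardless of case
lookup = {key.lower(): value for key, value in dict_classify.items()}

# expand=False returns a Series holding the first match per row (NaN if none)
matched = result["Particulars"].str.extract(pattern, flags=re.IGNORECASE, expand=False)
result["type"] = matched.str.lower().map(lookup)
```

One difference from the loop: the regex returns the match that appears earliest in the text, not the first key in dictionary order, so rows containing several keys can be classified differently.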