Checking if a pandas column value is present in another pandas column (list)

Question:

I have a pandas column containing a string value, and I want to check whether a separate column (whose cells are lists) contains that string anywhere.

Example row:
Category: Category A. Molecular Pathogenesis and Physiology
top predicted: the list below
[("Category A. Molecular Pathogenesis and Physiology::HiClass::Separator::1. Amyloid beta::HiClass::Separator::f. Amyloid Structure",
  0.054),
 ('Category B. Diagnosis and Assessment::HiClass::Separator::8. Methodologies::HiClass::Separator::None',
  0.049),
 ('Category B. Diagnosis and Assessment::HiClass::Separator::1. Fluid Biomarkers::HiClass::Separator::b. Blood-based',
  0.035)]

Each label in the generated list holds the Category plus two further sub-categories, separated by "::HiClass::Separator::", and is paired with a score.
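Assuming every label uses the literal separator "::HiClass::Separator::" (as all three example labels do), a label can be split into its levels with plain `str.split`. This is a minimal sketch; the helper name `split_label` is hypothetical.

```python
SEPARATOR = "::HiClass::Separator::"  # delimiter seen in the example labels


def split_label(label: str) -> list[str]:
    """Split a hierarchical label into [category, sub-category, sub-sub-category]."""
    return label.split(SEPARATOR)


levels = split_label(
    "Category B. Diagnosis and Assessment::HiClass::Separator::8. Methodologies"
    "::HiClass::Separator::None"
)
print(levels)
# ['Category B. Diagnosis and Assessment', '8. Methodologies', 'None']
```

Note that a missing sub-category comes back as the string 'None', not Python's None.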

What I want is to count how many times the Category value appears in that row's top predicted list. In the example above, "Category A. Molecular Pathogenesis and Physiology" would return 1; if the value were "Category B. Diagnosis and Assessment", 2 would be returned.
This count should then be computed for every row of the DataFrame.
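The counting rule can be checked on the example list alone, before involving pandas. This is a minimal sketch assuming "appears" means the text before the first separator equals the Category value exactly; `count_category` is a hypothetical helper name.

```python
SEP = "::HiClass::Separator::"

# The 'top predicted' value from the question: a list of (label, score) tuples
top_predicted = [
    ("Category A. Molecular Pathogenesis and Physiology" + SEP + "1. Amyloid beta" + SEP + "f. Amyloid Structure", 0.054),
    ("Category B. Diagnosis and Assessment" + SEP + "8. Methodologies" + SEP + "None", 0.049),
    ("Category B. Diagnosis and Assessment" + SEP + "1. Fluid Biomarkers" + SEP + "b. Blood-based", 0.035),
]


def count_category(category, predictions):
    """Count predictions whose top-level category (text before the first separator) equals `category`."""
    return sum(label.split(SEP, 1)[0] == category for label, _score in predictions)


print(count_category("Category A. Molecular Pathogenesis and Physiology", top_predicted))  # 1
print(count_category("Category B. Diagnosis and Assessment", top_predicted))  # 2
```

Comparing only the first level is stricter than a substring test: a substring test would also count a label if the Category text happened to occur inside one of its sub-category names.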

Any help in achieving this would be much appreciated 🙂 Many thanks!

Asked By: dinho


Answers:

Each cell of your second column holds a list of tuples, and the first element of each tuple is the label string to search in. The following should do it:

# Count the (label, score) tuples whose label contains this row's Category
df['count'] = df.apply(lambda row: sum(row['Category'] in label for label, _score in row['top predicted']), axis=1)

Rather than iterating over the rows yourself, use apply() with axis=1, which calls the function once per row.

Answered By: Jan
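As an alternative to the apply() call above, the same count can be built with a list comprehension over `zip`, which avoids constructing a Series for every row and is usually faster on large frames. The sketch also covers a situation the question does not mention but that is common: if the DataFrame was read back from a CSV file, each top predicted cell is a string rather than a list, and `ast.literal_eval` turns it back into Python objects. The two-row frame is hypothetical, built from the question's example, and keeps the answer's substring test.

```python
import ast
import io

import pandas as pd

SEP = "::HiClass::Separator::"

# The question's 'top predicted' list of (label, score) tuples
predictions = [
    ("Category A. Molecular Pathogenesis and Physiology" + SEP + "1. Amyloid beta" + SEP + "f. Amyloid Structure", 0.054),
    ("Category B. Diagnosis and Assessment" + SEP + "8. Methodologies" + SEP + "None", 0.049),
    ("Category B. Diagnosis and Assessment" + SEP + "1. Fluid Biomarkers" + SEP + "b. Blood-based", 0.035),
]

# Hypothetical frame: both rows share the example list but have different categories
df = pd.DataFrame({
    "Category": [
        "Category A. Molecular Pathogenesis and Physiology",
        "Category B. Diagnosis and Assessment",
    ],
    "top predicted": [predictions, predictions],
})

# Simulate a CSV round trip: the list cells come back as their string representation
df = pd.read_csv(io.StringIO(df.to_csv(index=False)))

# Turn each string back into a list of (label, score) tuples
df["top predicted"] = df["top predicted"].apply(ast.literal_eval)

# Same substring test as the answer, via a list comprehension instead of apply(axis=1)
df["count"] = [
    sum(category in label for label, _score in preds)
    for category, preds in zip(df["Category"], df["top predicted"])
]
print(df["count"].tolist())  # [1, 2]
```

If the cells are already Python lists (no CSV round trip), the read_csv and literal_eval steps can be dropped.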