Trying to write a function that searches for a few different values in each row and outputs a match count as a new column

Question:

Suppose we have a DataFrame:

import pandas as pd

df = pd.DataFrame({'A': ['cam1', 'cam2', 'cam1', 'cam4'], 'B': ['cam2', 'cam1', 'cam4', 'cam3'], 'C': ['cam3', 'cam4', 'cam5', 'cam2']})
      A     B     C
0  cam1  cam2  cam3
1  cam2  cam1  cam4
2  cam1  cam4  cam5
3  cam4  cam3  cam2

I’d like to add a column that counts the number of times ‘cam1’ or ‘cam2’ appears in each row.

The desired output would look like this:

      A     B     C  Count
0  cam1  cam2  cam3      2
1  cam2  cam1  cam4      2
2  cam1  cam4  cam5      1
3  cam4  cam3  cam2      1

Is there a way to do this without using a million if/else statements?

Asked By: pythonTyler


Answers:

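You can use DataFrame.aggregate() along axis="columns" (i.e. one row at a time) with two nested lambda functions: the outer one receives each row, and the inner one filters that row down to the values you are looking for, so the length of the filtered list is the count: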

df["Count"] = df.aggregate(lambda x: len(list(filter(lambda y: y in ["cam1", "cam2"], x.values))), axis="columns")

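To check the approach end to end, here is a minimal self-contained sketch (assuming pandas is installed) that rebuilds the example frame and runs the aggregate() one-liner. It also shows a vectorized alternative that is not part of the original answer: DataFrame.isin() marks each matching cell as True, and summing across columns counts the matches per row. The name `targets` is introduced here only for illustration.

```python
import pandas as pd

# Example frame from the question
df = pd.DataFrame({
    "A": ["cam1", "cam2", "cam1", "cam4"],
    "B": ["cam2", "cam1", "cam4", "cam3"],
    "C": ["cam3", "cam4", "cam5", "cam2"],
})
targets = ["cam1", "cam2"]  # hypothetical name for the values to count

# Row-wise approach from the answer: keep the cells of each row that are in
# `targets` and take the length of what is left
row_counts = df.aggregate(
    lambda x: len(list(filter(lambda y: y in targets, x.values))), axis="columns"
)

# Vectorized alternative (not from the original answer): isin() returns a
# boolean frame, and summing across columns counts the True cells per row
vec_counts = df.isin(targets).sum(axis=1)

df["Count"] = vec_counts
print(df["Count"].tolist())  # [2, 2, 1, 1]
```

The isin() version avoids calling a Python function for every row, so it is likely to scale better on large frames.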


Following up on the comment about "weighting" specific values individually: this can be done by changing the approach slightly, mapping each value to a numeric weight and then summing the resulting iterable:

df["Count"] = df.aggregate(lambda x: sum(map(lambda y: 2 if y == "cam1" else (1 if y == "cam2" else 0), x.values)), axis="columns")

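A self-contained sketch of the weighted variant, assuming the same weights as the conditional expression in the one-liner (cam1 scores 2, cam2 scores 1, anything else 0). The helper `weight` is hypothetical and simply gives that expression a name. Row 0 holds cam1 and cam2, so it scores 2 + 1 = 3. The `vectorized` line is an alternative not taken from the original answer: compare the frame to each value, count the matches per row, and scale by the weight.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["cam1", "cam2", "cam1", "cam4"],
    "B": ["cam2", "cam1", "cam4", "cam3"],
    "C": ["cam3", "cam4", "cam5", "cam2"],
})

def weight(value):
    # Hypothetical helper: same mapping as the nested conditional in the answer
    if value == "cam1":
        return 2
    if value == "cam2":
        return 1
    return 0

# Row-wise: map every cell in the row to its weight, then sum
weighted = df.aggregate(lambda x: sum(map(weight, x.values)), axis="columns")

# Vectorized alternative: (df == v) is a boolean frame; summing across
# columns counts occurrences of v per row, which is then scaled by its weight
vectorized = (df == "cam1").sum(axis=1) * 2 + (df == "cam2").sum(axis=1)

print(weighted.tolist())    # [3, 3, 2, 1]
print(vectorized.tolist())  # [3, 3, 2, 1]
```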

If there are more than just a few values to map to weights, a dict is a more elegant way to do this:

weights = {
  "cam1": 2,
  "cam2": 1
}

df["Count"] = df.aggregate(lambda x: sum(map(lambda y: weights.get(y, 0), x.values)), axis="columns")

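With many weights, a vectorized version of the dict approach is also possible. This is a sketch of an alternative technique, not the answer's method: Series.map() looks each cell up in the same `weights` dict, values missing from the dict come back as NaN, and fillna(0) gives them a weight of 0 before the row-wise sum.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["cam1", "cam2", "cam1", "cam4"],
    "B": ["cam2", "cam1", "cam4", "cam3"],
    "C": ["cam3", "cam4", "cam5", "cam2"],
})
weights = {"cam1": 2, "cam2": 1}

# Map each column through the dict; cells not in `weights` become NaN,
# which fillna(0) turns into a weight of 0
per_cell = df.apply(lambda col: col.map(weights)).fillna(0)

# Sum the weights across each row; astype(int) undoes the float upcast caused by NaN
df["Count"] = per_cell.sum(axis=1).astype(int)
print(df["Count"].tolist())  # [3, 3, 2, 1]
```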

Answered By: esqew