Trying to write a function that searches for a few different values in each row and outputs a match count as a new column
Question:
Suppose we have a DataFrame
import pandas as pd
df = pd.DataFrame({'A':['cam1','cam2','cam1','cam4'],'B':['cam2', 'cam1', 'cam4', 'cam3'],'C':['cam3','cam4', 'cam5','cam2']})
| | A | B | C |
|---|---|---|---|
| 0 | cam1 | cam2 | cam3 |
| 1 | cam2 | cam1 | cam4 |
| 2 | cam1 | cam4 | cam5 |
| 3 | cam4 | cam3 | cam2 |
I’d like to add a column that counts the number of times ‘cam1’ or ‘cam2’ appears in each row.
The desired output would look like this:
| | A | B | C | Count |
|---|---|---|---|---|
| 0 | cam1 | cam2 | cam3 | 2 |
| 1 | cam2 | cam1 | cam4 | 2 |
| 2 | cam1 | cam4 | cam5 | 1 |
| 3 | cam4 | cam3 | cam2 | 1 |
Is there a way to do this without using a million if else statements?
Answers:
You can use DataFrame.aggregate() with two lambda functions to achieve what you’re describing:
df["Count"] = df.aggregate(lambda x: len(list(filter(lambda y: y in ["cam1", "cam2"], x.values))), axis="columns")
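As an alternative to the row-wise lambdas, the same count can be sketched with a vectorized membership test. This is a different technique from the aggregate() call above, not a rewrite of it, and the names `targets` and `cols` are introduced here purely for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['cam1', 'cam2', 'cam1', 'cam4'],
    'B': ['cam2', 'cam1', 'cam4', 'cam3'],
    'C': ['cam3', 'cam4', 'cam5', 'cam2'],
})

targets = ["cam1", "cam2"]  # hypothetical name: the values to count
cols = ["A", "B", "C"]      # hypothetical name: the columns to search

# isin() returns a boolean frame; summing across each row counts the True cells.
df["Count"] = df[cols].isin(targets).sum(axis=1)
print(df)
```

Selecting `cols` explicitly limits the search to the original columns, so a Count column added by an earlier run is not scanned.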
Per the comment: to weight specific values individually, you can change the approach slightly, mapping each value to a number and then summing the resulting iterable:
df["Count"] = df.aggregate(lambda x: sum(list(map(lambda y: 2 if y == "cam1" else (1 if y == "cam2" else 0), x.values))), axis="columns")
If there are more than a few values to map to weights, consider doing this more elegantly with a dict:
weights = {
    "cam1": 2,
    "cam2": 1,
}
# dict.get() returns 0 for any value that has no weight assigned
df["Count"] = df.aggregate(lambda x: sum(weights.get(y, 0) for y in x.values), axis="columns")
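The dict-based weighting can also be sketched column-wise rather than row-wise. This is an alternative under the same assumptions as the snippet above (cam1 weighs 2, cam2 weighs 1, everything else 0); `cols` and `weighted` are illustrative names, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['cam1', 'cam2', 'cam1', 'cam4'],
    'B': ['cam2', 'cam1', 'cam4', 'cam3'],
    'C': ['cam3', 'cam4', 'cam5', 'cam2'],
})

weights = {"cam1": 2, "cam2": 1}
cols = ["A", "B", "C"]  # hypothetical name: the columns to search

# Series.map(dict) yields NaN for values missing from the dict;
# fillna(0) makes those cells contribute nothing to the row sum.
weighted = df[cols].apply(lambda col: col.map(weights)).fillna(0)
df["Count"] = weighted.sum(axis=1).astype(int)
print(df)
```

For the example frame this gives the same weighted totals as the aggregate() version while avoiding a Python-level loop over every row.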