How to fix Python list comprehension ValueError?

Question:

I’m learning Python and need to use list comprehensions to answer a question on an assignment, but I can’t figure out an error I’m getting. I have a DataFrame with participants, their ages, and their scores across different tests. I tried to use a list comprehension to get a list of scores from participants under a certain age:

df['scoreunder18'] = [row for row in df['score'] if df['Age'] < 18 in row]

but got the following error:

*** ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I tried

df['scoreunder18'] = [row for row in df['score'] if (df['Age'] < 18).item in row]

but that just returns the values from the score column without honoring the condition.

Any help would be appreciated please and thank you!

Asked By: sushicore


Answers:

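The ValueError occurs because df['Age'] < 18 compares the entire column with the integer 18 and returns a Series of True/False values, one per row, while the if clause of a list comprehension needs a single True or False. The condition df['Age'] < 18 in row is also a chained comparison, which Python evaluates as (df['Age'] < 18) and (18 in row), and that implicit and is where pandas is asked for the truth value of the whole Series. Even once that is fixed, writing the list comprehension output straight back into the DataFrame as df['scoreunder18'] will raise a different error whenever some rows are filtered out, because the resulting list is then shorter than the DataFrame's index.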
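A minimal sketch of where the first ValueError comes from, using a small hypothetical DataFrame with lowercase column names. `raises_value_error` is a throwaway helper written only for this illustration, not a pandas function.

```python
import pandas as pd

# Hypothetical sample data (lowercase column names, as in the example below).
df = pd.DataFrame({'age': [17, 21, 22], 'score': [75, 85, 95]})

mask = df['age'] < 18  # a boolean Series: one True/False per row
print(mask.tolist())   # [True, False, False]


def raises_value_error(fn):
    """Return True if calling fn() raises ValueError (illustration helper)."""
    try:
        fn()
    except ValueError:
        return True
    return False


# bool() on a Series is ambiguous: True if any row is True, or only if all are?
print(raises_value_error(lambda: bool(mask)))  # True

# Same shape as the question's condition. Python reads it as
# `(df['age'] < 18) and (18 in row)`, and the implicit `and` calls bool()
# on the Series, so it fails before `18 in row` is ever evaluated.
row = 75
print(raises_value_error(lambda: df['age'] < 18 in row))  # True
```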

The example below reproduces that length mismatch. It uses zip(), which walks the two columns in step and yields one (age, score) tuple per row.

import pandas as pd

d = {'participant': ['a', 'b', 'c'], 'age': [17, 21, 22], 'score': [75, 85, 95]}
df = pd.DataFrame(data=d)

# Pair each age with its score and keep the score only when the age is under 18
list_under_18 = [sc for ag, sc in zip(df['age'], df['score']) if ag < 18]
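A quick way to see both halves of this, rebuilding the answer's example DataFrame as a self-contained sketch: zip() yields one (age, score) tuple per row, and assigning the shorter filtered list directly to a new column raises a ValueError. The wording of that error differs between pandas versions, so the sketch only checks the exception type.

```python
import pandas as pd

d = {'participant': ['a', 'b', 'c'], 'age': [17, 21, 22], 'score': [75, 85, 95]}
df = pd.DataFrame(data=d)

# zip() pairs the two columns row by row
pairs = list(zip(df['age'], df['score']))
print(pairs)  # [(17, 75), (21, 85), (22, 95)]

list_under_18 = [sc for ag, sc in zip(df['age'], df['score']) if ag < 18]
print(len(list_under_18), len(df))  # 1 3

# A plain list must supply exactly one value per row, so this assignment fails
try:
    df['scoreunder18'] = list_under_18
    assigned = True
except ValueError as exc:
    assigned = False
    print(exc)
```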

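Here list_under_18 is [75], which has length 1, while the DataFrame index has length 3. To attach it as a column of the original DataFrame, convert the list to a Series: pandas aligns a Series on the index when assigning it, so rows without a matching label are filled with NaN. Note that the new Series is indexed 0, 1, 2 and so on, so its values go into the first rows of the DataFrame; that lands on the right participant here only because participant a happens to be in row 0.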

# Wrapping the list in a Series lets pandas align it on the index and fill the other rows with NaN
df['under_18_scores'] = pd.Series([sc for ag, sc in zip(df['age'], df['score']) if ag < 18])
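A sketch checking where the Series values actually land (pd.Series([...]) gets a fresh 0-based index, and pandas places it into the DataFrame by index label), using a hypothetical reordered DataFrame where the under-18 participant is in the last row. It also shows two row-aligned alternatives: a list comprehension with a conditional expression (sc if ag < 18 else NaN), which keeps one entry per row, and pandas' Series.where, a vectorized technique swapped in here rather than a list comprehension.

```python
import math

import pandas as pd

# Hypothetical data, reordered so the under-18 participant ('a') is in the last row
df = pd.DataFrame({'participant': ['b', 'c', 'a'],
                   'age': [21, 22, 17],
                   'score': [85, 95, 75]})

# Series approach: pd.Series([75]) has index [0], so 75 is placed in row 0
# (participant 'b'), not in the row of participant 'a'
df['series_version'] = pd.Series(
    [sc for ag, sc in zip(df['age'], df['score']) if ag < 18])

# Row-aligned list comprehension: the conditional expression yields exactly
# one value per row, using NaN for participants aged 18 or over
df['listcomp_version'] = [sc if ag < 18 else math.nan
                          for ag, sc in zip(df['age'], df['score'])]

# Vectorized pandas alternative: keep the score where the mask is True, NaN elsewhere
df['where_version'] = df['score'].where(df['age'] < 18)

print(df)
```

The conditional-expression version may be the closest fit for an assignment that requires a list comprehension, since it still is one but returns exactly one value per row; Series.where does the same job without a Python-level loop.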

Here are some similar answers for reference:

list comprehension in pandas

Adding list with different length as a new column to a dataframe

Answered By: Camden L