Agg len counts null values in dataframe as one
Question:
I have a dataframe like this:
    servers.added                                                      Servers_added
0                                                                      1
1   ['https://api.lnmarkets.com']                                      1
2                                                                      1
3   ['https://api.testnet.lnmarkets.com']                              1
4                                                                      1
5                                                                      1
6   ['http://mercure.local']                                           1
7                                                                      1
8   ['https://virtserver.swaggerhub.com/']                             1
9   ['https://www.haalcentraal.nl/']                                   1
10  ['https://api.features4.com/v1']                                   1
11  ['https://vwt-d-gew1-dat.com/']                                    1
12  ['https://PROJECT_ID.appspot.com/']                                1
13                                                                     1
14  ['http://localhost:8000/api/v1', 'https://localhost:8000/api/v1']  2
I apologize, since this might be a duplicate. I want to calculate the length of each entry; the values are always comma-separated. The issue is that even empty values in my dataframe are counted as 1, which is wrong.
This is essentially my code:
servers.loc[:, 'Servers_added'] = servers['servers.added'].astype(str).apply(lambda x: len(x.split(',')) if x.strip() else 0)
I tried using a simple map and agg to calculate the length, but I keep running into the same issue. I want the null values to be 0, because counting them as 1 biases my analysis towards 1. I run into the same issue with some of my other columns as well. Is there any workaround for this?
Edit:
Adding the dict output for better reproducibility:
{'servers.added': [nan, "['https://api.lnmarkets.com']", nan, "['https://api.testnet.lnmarkets.com']", nan, nan, "['http://mercure.local']", nan, "['https://virtserver.swaggerhub.com/VNGRealisatie/api/reisdocumenten']", "['https://www.haalcentraal.nl/haalcentraal/api/brp']"], 'Servers_added': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
Answers:
Your code looks almost correct, but it also counts the null values as 1: astype(str) turns NaN into the string 'nan', which is not empty, so the split returns one element. Just add a condition that checks for null values and returns 0 instead of 1.
You could do it like this:
servers.loc[:, 'Servers_added'] = servers['servers.added'].apply(lambda x: len([s for s in x.split(',') if s.strip()]) if isinstance(x, str) and x.strip() else 0)
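A minimal sketch of the null check described above, on a hypothetical three-row sample shaped like the question's column (the sample rows and the names `naive` and `fixed` are illustrative only). The first count converts each value with `str()`, which mimics what `astype(str)` does to NaN in an object column; the second checks for nulls before splitting.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row sample in the same shape as the question's column.
servers = pd.DataFrame({
    "servers.added": [
        np.nan,
        "['https://api.lnmarkets.com']",
        "['http://localhost:8000/api/v1', 'https://localhost:8000/api/v1']",
    ]
})

col = servers["servers.added"]

# str(NaN) is the non-empty string 'nan', so splitting it yields one item.
naive = [len(str(x).split(",")) if str(x).strip() else 0 for x in col]

# Checking for nulls before splitting gives 0 for the missing row.
fixed = col.apply(lambda x: 0 if pd.isna(x) else len(x.split(",")))

print(naive)           # [1, 1, 2]
print(fixed.tolist())  # [0, 1, 2]
```

Counting commas is only safe as long as no URL itself contains a comma.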
Your values look like valid Python lists, so you can convert these strings to real lists. Just replace the empty rows with '[]' and evaluate with ast.literal_eval (or pd.eval):
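import ast
import numpy as np

# Replace missing rows with '[]', parse each string into a list, then take its length
df['count'] = (df['servers.added'].replace(np.nan, '[]')
               .apply(ast.literal_eval).str.len())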
print(df)
# Output
    servers.added                                      count
0                                                      0
1   ['https://api.lnmarkets.com']                      1
2                                                      0
3   ['https://api.testnet.lnmarkets.com']              1
4                                                      0
5                                                      0
6   ['http://mercure.local']                           1
7                                                      0
8   ['https://virtserver.swaggerhub.com/']             1
9   ['https://www.haalcentraal.nl/']                   1
10  ['https://api.features4.com/v1']                   1
11  ['https://vwt-d-gew1-dat.com/']                    1
12  ['https://PROJECT_ID.appspot.com/']                1
13                                                     0
14  ['http://localhost:8000/api/v1', 'https://loca...  2
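For reference, a self-contained sketch of the same idea, run on the sample dict from the question's edit. It assumes every non-null value is a string holding a valid Python list literal. Here `fillna('[]')` and `apply(len)` are swapped in for `replace(np.nan, '[]')` and `.str.len()`; they should behave the same on this data.

```python
import ast

import numpy as np
import pandas as pd

# Sample data copied from the question's edit (ten rows).
df = pd.DataFrame({
    "servers.added": [
        np.nan,
        "['https://api.lnmarkets.com']",
        np.nan,
        "['https://api.testnet.lnmarkets.com']",
        np.nan,
        np.nan,
        "['http://mercure.local']",
        np.nan,
        "['https://virtserver.swaggerhub.com/VNGRealisatie/api/reisdocumenten']",
        "['https://www.haalcentraal.nl/haalcentraal/api/brp']",
    ]
})

# Parse each string into a real list; missing rows become an empty list.
parsed = df["servers.added"].fillna("[]").apply(ast.literal_eval)

# The length of each parsed list is the number of servers in that row.
df["count"] = parsed.apply(len)

print(df["count"].tolist())  # [0, 1, 0, 1, 0, 0, 1, 0, 1, 1]
```

Parsing into real lists avoids miscounting when a URL happens to contain a comma, which a plain split(',') would not handle.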