Agg len counts null values in dataframe as one

Question:

I have a dataframe like this:

    servers.added                              Servers_added
0                                                     1
1   ['https://api.lnmarkets.com']                     1
2                                                     1
3   ['https://api.testnet.lnmarkets.com']             1
4                                                     1
5                                                     1
6   ['http://mercure.local']                          1
7                                                     1
8   ['https://virtserver.swaggerhub.com/']            1
9   ['https://www.haalcentraal.nl/']                  1
10  ['https://api.features4.com/v1']                  1
11  ['https://vwt-d-gew1-dat.com/']                   1
12  ['https://PROJECT_ID.appspot.com/']               1
13                                                    1
14  ['http://localhost:8000/api/v1', 'https://localhost:8000/api/v1']   2

I apologize if this is a duplicate. I want to calculate the length of each entry; the values are always comma-separated. The issue is that even empty values in my dataframe are counted as 1, which is wrong.

This is essentially my code:

servers.loc[:, 'Servers_added'] = servers['servers.added'].astype(str).apply(lambda x: len(x.split(',')) if x.strip() else 0)

I tried using simple map and agg to calculate the length, but I keep running into the same issue. I want the null values to count as 0, because counting them as 1 biases my analysis toward 1. I run into the same issue with some of my other columns as well. Is there a workaround for this?

Edit:
Adding the data as a dict for better reproducibility:

{'servers.added': [nan, "['https://api.lnmarkets.com']", nan, "['https://api.testnet.lnmarkets.com']", nan, nan, "['http://mercure.local']", nan, "['https://virtserver.swaggerhub.com/VNGRealisatie/api/reisdocumenten']", "['https://www.haalcentraal.nl/haalcentraal/api/brp']"], 'Servers_added': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
Asked By: Brie MerryWeather


Answers:

Your code is almost correct, but it still counts the null values as 1. The cause is astype(str): it turns NaN into the string 'nan', which is not empty, so the x.strip() check never returns 0. Replace the nulls with an empty string before converting, so that the check returns 0 for them.

You could do it like this:

servers.loc[:, 'Servers_added'] = servers['servers.added'].fillna('').astype(str).apply(lambda x: len([s for s in x.split(',') if s.strip()]) if x.strip() else 0)
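
To make the null handling explicit, here is a minimal sketch that checks for missing values before any string conversion. The helper name count_items and the three-row sample frame are made up for illustration. Note that it counts comma-separated pieces, so a literal '[]' string would still count as 1.

```python
import numpy as np
import pandas as pd


def count_items(value):
    """Count comma-separated items; missing or blank values count as 0."""
    # NaN/None must be caught before str(), because str(np.nan) is 'nan'.
    if pd.isna(value):
        return 0
    text = str(value).strip()
    if not text:
        return 0
    # Ignore empty pieces such as the one after a trailing comma.
    return len([part for part in text.split(",") if part.strip()])


# Hypothetical sample data modelled on the question.
servers = pd.DataFrame({
    "servers.added": [
        np.nan,
        "['https://api.lnmarkets.com']",
        "['http://localhost:8000/api/v1', 'https://localhost:8000/api/v1']",
    ]
})
servers["Servers_added"] = servers["servers.added"].apply(count_items)
print(servers["Servers_added"].tolist())
```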
Answered By: Saxtheowl

Your values look like valid Python lists, so you can convert these strings to real lists. Just replace the empty rows with '[]' and evaluate them with ast.literal_eval (or pd.eval):

import ast
import numpy as np

df['count'] = (df['servers.added'].replace(np.nan, '[]')
                  .apply(ast.literal_eval).str.len())
print(df)

# Output
                                        servers.added  count
0                                                          0
1                       ['https://api.lnmarkets.com']      1
2                                                          0
3               ['https://api.testnet.lnmarkets.com']      1
4                                                          0
5                                                          0
6                            ['http://mercure.local']      1
7                                                          0
8              ['https://virtserver.swaggerhub.com/']      1
9                    ['https://www.haalcentraal.nl/']      1
10                   ['https://api.features4.com/v1']      1
11                    ['https://vwt-d-gew1-dat.com/']      1
12                ['https://PROJECT_ID.appspot.com/']      1
13                                                         0
14  ['http://localhost:8000/api/v1', 'https://loca...      2
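
If some rows might hold something other than NaN or a well-formed list string (an empty string, for example), ast.literal_eval raises instead of returning a list. A guarded variant could look like the sketch below. parse_list is a hypothetical helper, and treating unparseable rows as empty lists is an assumption you may not want in your analysis.

```python
import ast

import numpy as np
import pandas as pd


def parse_list(value):
    """Return the list encoded in `value`, or [] if it is missing or invalid."""
    if pd.isna(value):
        return []
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError):
        # Assumption: rows that cannot be parsed count as "no servers".
        return []
    # Anything that is not a list (a bare number, say) is also treated as empty.
    return parsed if isinstance(parsed, list) else []


# Hypothetical sample data, including an empty string that plain literal_eval rejects.
df = pd.DataFrame({
    "servers.added": [
        np.nan,
        "['https://api.lnmarkets.com']",
        "['http://localhost:8000/api/v1', 'https://localhost:8000/api/v1']",
        "",
    ]
})
df["count"] = df["servers.added"].apply(parse_list).map(len)
print(df["count"].tolist())
```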
Answered By: Corralien