Agg len counts null values in dataframe as one


I have a dataframe like this:

    servers.added                                                      Servers_added
0                                                                      1
1   ['']                                                               1
2                                                                      1
3   ['']                                                               1
4                                                                      1
5                                                                      1
6   ['http://mercure.local']                                           1
7                                                                      1
8   ['']                                                               1
9   ['']                                                               1
10  ['']                                                               1
11  ['']                                                               1
12  ['']                                                               1
13                                                                     1
14  ['http://localhost:8000/api/v1', 'https://localhost:8000/api/v1']  2

Apologies if this is a duplicate. I want to count the number of items in each row, and the items are always separated by commas. The problem is that even the empty (null) values in my DataFrame are counted as 1, which is wrong.

This is essentially my code:

servers.loc[:, 'Servers_added'] = servers['servers.added'].astype(str).apply(lambda x: len(x.split(',')) if x.strip() else 0)
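A quick check of why the lambda above returns 1 for the missing rows. This is a minimal sketch: `asker_count` is a hypothetical name for the question's lambda. Converting NaN to text gives the non-empty string 'nan', and `astype(str)` applies that conversion to every missing cell, which is the behaviour the question describes.

```python
import numpy as np


def asker_count(x: str) -> int:
    # Same logic as the lambda in the question (hypothetical helper name)
    return len(x.split(',')) if x.strip() else 0


text = str(np.nan)            # what a missing value becomes after conversion to str
print(repr(text))             # 'nan'
print(asker_count(text))      # 1, because 'nan'.split(',') == ['nan']
print(asker_count("['']"))    # 1, no comma means a single piece
print(asker_count(""))        # 0, only a truly empty string reaches the else branch
```

So the `if x.strip()` guard never fires for null values: by the time the lambda runs, they are already the non-empty string 'nan'.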

I tried using a simple map and agg to calculate the length, but I keep running into the same issue. I want the null values to count as 0, because counting them as 1 biases my analysis towards 1. I run into the same issue with some of my other columns as well. Is there any workaround for this?

Adding the dictionary output for better reproducibility:

{'servers.added': [nan, "['']", nan, "['']", nan, nan, "['http://mercure.local']", nan, "['']", "['']"], 'Servers_added': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
Asked By: Brie MerryWeather



Your code looks almost correct, but astype(str) turns each null value into the string 'nan', which is then counted as 1. Just add a condition that checks for null values and returns 0 instead of 1.

You could do it like this:

servers.loc[:, 'Servers_added'] = servers['servers.added'].apply(
    lambda x: len([s for s in str(x).split(',') if s.strip()]) if pd.notna(x) else 0)
Answered By: Saxtheowl
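A vectorized alternative to the row-by-row apply in the answer above, sketched under the assumption that the column holds either NaN or comma-separated strings as in the question. The pandas .str accessor methods propagate NaN instead of converting it to 'nan', so the missing rows can be mapped to 0 with fillna. Like the apply version, this still counts [''] as 1.

```python
import numpy as np
import pandas as pd

# Sample values taken from the question
servers = pd.DataFrame({'servers.added': [
    np.nan,
    "['http://mercure.local']",
    "['http://localhost:8000/api/v1', 'https://localhost:8000/api/v1']",
]})

# .str.split keeps NaN as NaN (no 'nan' string), .str.len counts the pieces,
# and fillna(0) turns the missing rows into 0 before casting back to int
servers['Servers_added'] = (
    servers['servers.added'].str.split(',').str.len().fillna(0).astype(int)
)
print(servers['Servers_added'].tolist())  # [0, 1, 2]
```

The same one-liner can be reused on the other affected columns by swapping the column name.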

Your values look like valid Python lists, so you can convert these strings to real lists. Just replace the empty rows with '[]' and evaluate each string with ast.literal_eval (or pd.eval):

import ast
import numpy as np

df['count'] = (df['servers.added'].replace(np.nan, '[]')
                                  .apply(ast.literal_eval).str.len())

# Output
                                         servers.added  count
1                                                 ['']      1
2                                                           0
3                                                 ['']      1
4                                                           0
5                                                           0
6                             ['http://mercure.local']      1
7                                                           0
8                                                 ['']      1
9                                                 ['']      1
10                                                ['']      1
11                                                ['']      1
12                                                ['']      1
13                                                          0
14   ['http://localhost:8000/api/v1', 'https://loca...      2
Answered By: Corralien
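Building on the ast.literal_eval approach above: if lists that contain only empty strings, such as [''], should also count as 0, one option is to parse each string and count only the non-empty entries. This is a minimal sketch; count_non_empty is a hypothetical helper name, and it assumes every non-missing value is a valid Python list literal (ast.literal_eval raises ValueError or SyntaxError otherwise).

```python
import ast

import numpy as np
import pandas as pd


def count_non_empty(text: str) -> int:
    # Parse the string into a real Python list and count only the
    # non-empty entries, so "['']" gives 0 instead of 1 (hypothetical helper)
    return sum(1 for item in ast.literal_eval(text) if item)


df = pd.DataFrame({'servers.added': [
    np.nan,
    "['']",
    "['http://mercure.local']",
    "['http://localhost:8000/api/v1', 'https://localhost:8000/api/v1']",
]})

# Missing rows become the empty list literal '[]', which parses to [] and counts as 0
df['count'] = df['servers.added'].fillna('[]').apply(count_non_empty)
print(df['count'].tolist())  # [0, 0, 1, 2]
```

Whether [''] should count as 0 or 1 depends on what it means in your data; the answers above both count it as 1.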