Count the number of strings with length in pandas

Question:

I am trying to count the strings in a column that have a length of 5 or more. The strings in each cell are separated by commas.


import pandas as pd

df = pd.DataFrame(columns=['first'])
df['first'] = ['Jack Ryan, Tom O', 'Stack Over Flow, StackOverFlow', 'Jurassic Park, IT', 'GOT']

This is the code I have used so far, but it does not create a new column with the count of strings of 5 or more characters:

df['countStrings'] = df['first'].str.split(',').count(r'[a-zA-Z0-9]{5,}')

Expected output, counting strings of length 5 or more:

first                           countStrings
Jack Ryan, Tom O                0
Stack Over Flow, StackOverFlow  2
Jurassic Park, IT               1
GOT                             0

Edge case: comma-separated strings of length 5 or more that contain multiple spaces:

first                                     wrongCounts  rightCounts
Accounts Payable Goods for Resale         4            1
Corporate Finance, Financial Engineering  4            2
TBD                                       0            0
Goods for Not Resale, SAP                 2            1
Asked By: LonelySoul


Answers:

The pandas str.len() method returns the length of each string in a Series. It is intended for Series of strings.
Because it is a string method, it must be called through the .str accessor every time.
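
As a minimal sketch of the .str accessor described above, the snippet below measures a small, made-up Series. The sample values and the variable names s and lengths are illustrative only and do not come from the question.

```python
import pandas as pd

# Illustrative sample values (not from the question's DataFrame)
s = pd.Series(['Jack Ryan', 'IT', 'StackOverFlow'])

# str.len() is reached through the .str accessor and returns one
# length per element; spaces count as characters.
lengths = s.str.len()
print(lengths.tolist())  # [9, 2, 13]
```

Note that str.len() measures the whole cell, so on its own it does not separate comma-delimited items.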

You can try this:

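import pandas as pd

df = pd.DataFrame(columns=['first'])
df['first'] = ['jack,utah,TOMHAWK Somer,SORITNO', 'jill', 'bob,texas', 'matt,AR', 'john']

# Replace commas with spaces, then count every word in the column
# (this counts all words, regardless of their length)
df['first'] = df['first'].str.replace(',', ' ', regex=False)
df['first'].str.count(r'\w+').sum()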

This is how I would get the number of strings with len >= 5 in a column:

# Collect every comma-separated piece, across all rows, with at least 5 characters
data = [i for k in df['first']
        for i in k.split(',')
        if len(i) >= 5]
result = len(data)
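
The comprehension above produces a single total for the whole column. A per-row variant might look like the sketch below. The helper name count_long_pieces, its min_len parameter, and the column name countPieces are hypothetical, and stripping the whitespace around each piece is an assumption that the snippet above does not make.

```python
import pandas as pd

df = pd.DataFrame({'first': ['Jack Ryan, Tom O',
                             'Stack Over Flow, StackOverFlow',
                             'Jurassic Park, IT',
                             'GOT']})

def count_long_pieces(cell, min_len=5):
    # Hypothetical helper: split the cell on commas, strip surrounding
    # whitespace (an assumption), and count pieces of at least min_len
    # characters. Inner spaces still count toward the length.
    return sum(len(piece.strip()) >= min_len for piece in cell.split(','))

df['countPieces'] = df['first'].apply(count_long_pieces)
print(df['countPieces'].tolist())  # [2, 2, 1, 0]
```

With the question's sample data this gives 2 for the first row rather than the expected 0, because 'Jack Ryan' and 'Tom O' each reach 5 characters once the space is counted.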

You can match 5 alphanumeric characters, with optional characters other than a comma on the left and right.

[^,]*[A-Za-z0-9]{5}[^,]*

See a regex demo with the matches.
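
The matches can also be inspected locally. The following is a small sketch using Python's re.findall on two of the question's strings; the variable names two_items and no_items are illustrative.

```python
import re

# Pattern from the answer: a run of 5 alphanumerics, padded on both
# sides by any characters that are not a comma.
pattern = r'[^,]*[A-Za-z0-9]{5}[^,]*'

two_items = re.findall(pattern, 'Corporate Finance, Financial Engineering')
no_items = re.findall(pattern, 'Jack Ryan, Tom O')

print(two_items)  # ['Corporate Finance', ' Financial Engineering']
print(no_items)   # []
```

Each match spans one whole comma-separated item, so the number of matches is the number of items that contain a run of at least 5 alphanumeric characters. Writing {5} rather than {5,} is sufficient because the trailing [^,]* absorbs the rest of the item.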

Example

import pandas as pd

df = pd.DataFrame(columns=['first'])
df['first'] = [
    'Accounts Payable Goods for Resale',
    'Corporate Finance, Financial Engineering',
    'TBD',
    'Goods for Not Resale, SAP',
    'Jack Ryan, Tom O',
    'Stack Over Flow, StackOverFlow',
    'Jurassic Park, IT',
    'GOT'
]
df['countStrings'] = df['first'].str.count(r'[^,]*[A-Za-z0-9]{5}[^,]*')
print(df)

Output

                                      first  countStrings
0         Accounts Payable Goods for Resale             1
1  Corporate Finance, Financial Engineering             2
2                                       TBD             0
3                 Goods for Not Resale, SAP             1
4                          Jack Ryan, Tom O             0
5            Stack Over Flow, StackOverFlow             2
6                         Jurassic Park, IT             1
7                                       GOT             0
Answered By: The fourth bird