Counting Character Occurrences for Each Pandas Dataframe Record

Question:

I have a data frame whose rows look like the following:

Section Title                          ...
==========================================
4.1.1   4.1.1 Requirements allocation. ...
4.1.2   4.1.2 Safety.                  ...
4.1.3   4.1.3 Warnings.                ...

I am trying to count the number of periods (.) in the Section column, so I wrote this line:

df['Subsections'] = df.Section.str.count(".")

However, the Subsections column returns 5 rather than the 2 I would expect for the first record, since it contains two periods (.). Is there some nuance I am missing here?

Asked By: 324

||

Answers:

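By design, Series.str.count(pat, flags=0) interprets the pat parameter as a regular expression pattern (see the source code). In a regular expression, an unescaped . matches any character, which is why "4.1.1" yields 5. So you need to explicitly escape the . character with a backslash (\) to match a literal period:

>>> df.Section.str.count(r"\.")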
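For comparison, here is a minimal sketch of the difference between the unescaped and escaped patterns. The sample DataFrame is an assumption built from the Section values shown in the question, and the variable names are illustrative only. It also shows two alternatives that should give the same result: re.escape, which is handy when the character to count comes from a variable, and plain Python str.count, which does not use regular expressions at all.

```python
import re

import pandas as pd

# Hypothetical sample data mirroring the Section column from the question.
df = pd.DataFrame({"Section": ["4.1.1", "4.1.2", "4.1.3"]})

# Unescaped "." is a regex wildcard, so every character is counted.
wildcard_counts = df["Section"].str.count(".")

# Escaping the dot makes the pattern match literal periods only.
escaped_counts = df["Section"].str.count(r"\.")

# re.escape builds the escaped pattern for you (useful for arbitrary text).
auto_escaped_counts = df["Section"].str.count(re.escape("."))

# Built-in str.count treats its argument literally, so no escaping is needed.
plain_counts = df["Section"].map(lambda s: s.count("."))

df["Subsections"] = escaped_counts
print(df)
```

With this sample data, the wildcard version counts all five characters of each value, while the other three approaches count the two periods.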
Answered By: Abdul Niyas P M