Count space characters before a letter/number in a Python pandas column
Question:
I have a pandas dataframe, which looks like this:
columnA  columnB
A        10
 B       12
  C      13
  D      14
 010     17
How can I count the space characters before the first letter or number in columnA and store the result in a new column? For example:
columnA  columnB  counter
A        10       0
 B       12       1
  C      13       2
  D      14       2
 010     17       1
Answers:
You can combine str.extract and str.len:
df['counter'] = df['columnA'].str.extract('^( *)', expand=False).str.len()
Output (I added quotes around the string for visualization):
  columnA  columnB  counter
0     "A"       10        0
1    " B"       12        1
2   "  C"       13        2
3   "  D"       14        2
4  " 010"       17        1
Reproducible input:
df = pd.DataFrame({'columnA': ['A', ' B', '  C', '  D', ' 010'],
                   'columnB': [10, 12, 13, 14, 17],
                   'counter': [0, 1, 2, 2, 1]})
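For reference, the str.extract approach can be run end to end as a minimal sketch. It assumes the rows for C and D carry two leading spaces, as the expected counter values imply; the variable name `leading` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'columnA': ['A', ' B', '  C', '  D', ' 010'],
                   'columnB': [10, 12, 13, 14, 17]})

# '^( *)' captures the (possibly empty) run of spaces at the start of each string
leading = df['columnA'].str.extract('^( *)', expand=False)

# the length of the captured run is the number of leading spaces
df['counter'] = leading.str.len()
print(df['counter'].tolist())
```

With expand=False and a single capture group, str.extract returns a Series rather than a DataFrame, which is why .str.len() can be chained directly.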
You can use str.findall, then take the length of the first match:
df['counter'] = df['columnA'].str.findall('^ *').str[0].str.len()
print(df)
# Output
  columnA  columnB  counter
0       A       10        0
1       B       12        1
2       C       13        2
3       D       14        2
4     010       17        1
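To see why the .str[0] step is needed, the chain can be split into its intermediate results. This is a small sketch on a standalone Series with the same values as columnA (assuming two leading spaces for C and D); the names `matches`, `first` and `counts` are illustrative.

```python
import pandas as pd

s = pd.Series(['A', ' B', '  C', '  D', ' 010'])

# findall returns a list per row; '^ *' can only match once, at the start
matches = s.str.findall('^ *')

# .str[0] picks the single match out of each list
first = matches.str[0]

# the length of that match is the number of leading spaces
counts = first.str.len()
print(counts.tolist())
```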
You can also use a regular expression with apply():
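import pandas as pd
import re

df = pd.DataFrame({'columnA': ['A', ' B', '  C', '  D', ' 010'], 'columnB': [10, 12, 13, 14, 17]})

# \s matches any leading whitespace (spaces, tabs, ...); use r'^ *' to count spaces only
pattern = r'^\s*'

def count_spaces(s):
    # the pattern can match an empty string, so re.match always returns a match for a str
    return len(re.match(pattern, s).group())

df['counter'] = df['columnA'].apply(count_spaces)
print(df)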
Output:
  columnA  columnB  counter
0       A       10        0
1       B       12        1
2       C       13        2
3       D       14        2
4     010       17        1
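One caveat with the apply() approach: re.match raises a TypeError if the column contains a missing value, since it only accepts strings. A possible guard is sketched below. The isinstance check and returning pd.NA are assumptions for illustration, not part of the answer above, and the pattern here counts spaces only.

```python
import re
import pandas as pd

pattern = re.compile(r'^ *')

def count_spaces(s):
    # hypothetical guard: anything that is not a string yields a missing count
    if not isinstance(s, str):
        return pd.NA
    return len(pattern.match(s).group())

col = pd.Series(['A', ' B', None, '  D'])
counts = col.apply(count_spaces)
print(counts.tolist())
```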