How to find columns where a punctuation mark appears as a single value in Python Pandas?
Question:
I have a DataFrame like the one below:
COL1 | COL2 | COL3
-----|------|--------
abc | P | 123
b.bb | , | 22
1 | B | 2
... |... | ...
I need to find the columns in which some value consists only of a punctuation mark, such as !"#$%&'()*+,-./:;<=>?@[]^_`{|}~
As a result I need something like the output below (only COL2; COL1 also contains a punctuation mark, but there it appears together with other characters).
COL2
-------
P
,
B
...
Answers:
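punc = set('''!"#$%&'()*+,-./:;<=>?@[]^_`{|}~''')
# cast to str first so that numeric cells (e.g. in COL3) do not raise a TypeError in set(x)
# on pandas >= 2.1, DataFrame.map is the preferred name for applymap
df.loc[:, df.astype(str).applymap(lambda x: set(x).issubset(punc)).any()]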
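For reference, here is a minimal self-contained sketch of the set-based approach on the sample data from the question. The frame construction and the helper name `is_punct` are assumptions made for illustration, not part of the original answer. Values are converted to text first because `COL3` is assumed to hold integers, and empty strings are excluded explicitly, since an empty set is a subset of any set.

```python
import pandas as pd

# Sample data reconstructed from the question (the elided rows are omitted).
df = pd.DataFrame({
    "COL1": ["abc", "b.bb", "1"],
    "COL2": ["P", ",", "B"],
    "COL3": [123, 22, 2],
})

punc = set('''!"#$%&'()*+,-./:;<=>?@[]^_`{|}~''')


def is_punct(value) -> bool:
    """Hypothetical helper: True if the cell, as text, is non-empty
    and consists only of punctuation characters."""
    text = str(value)
    return bool(text) and set(text) <= punc


# Check every cell column by column, then keep columns with at least one hit.
mask = df.apply(lambda col: col.map(is_punct)).any()
out = df.loc[:, mask]
print(out)
```

Using `Series.map` inside `DataFrame.apply` avoids depending on either `applymap` or `DataFrame.map`, so the sketch should behave the same across pandas versions.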
Using a regex with str.fullmatch and any:
import re
chars = '''!"#$%&'()*+,-./:;<=>?@[]^_`{|}~'''
pattern = f'[{re.escape(chars)}]'
# pattern is a character class that matches exactly one of the characters in chars
out = df.loc[:, df.astype(str).apply(lambda s: s.str.fullmatch(pattern).any())]
Or with isin:
out = df.loc[:, df.isin(set(chars)).any()]
Output:
COL2
0 P
1 ,
2 B
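The approaches above are not strictly equivalent, which the following sketch tries to illustrate. The frame, its column names, and its values are made up for this example. `isin` and the single-character regex only match cells that are exactly one punctuation character, whereas a set-subset test also accepts runs of punctuation such as `?!`.

```python
import re

import pandas as pd

chars = '''!"#$%&'()*+,-./:;<=>?@[]^_`{|}~'''
punc = set(chars)

# Hypothetical data: "A" holds a two-character punctuation cell,
# "B" holds a single punctuation character, "C" holds neither.
df = pd.DataFrame({"A": ["x", "?!"], "B": ["-", "y"], "C": ["a.b", "z"]})
as_text = df.astype(str)

# Cells that are exactly one punctuation character.
by_isin = as_text.isin(punc).any()
pattern = f"[{re.escape(chars)}]"
by_regex = as_text.apply(lambda s: s.str.fullmatch(pattern).any())

# Cells made of one or more punctuation characters.
by_subset = as_text.apply(
    lambda s: s.map(lambda x: bool(x) and set(x) <= punc).any()
)

print(by_isin.to_dict())
print(by_regex.to_dict())
print(by_subset.to_dict())
```

Here column `A` is selected only by the subset test, while `B` is selected by all three. Which behavior is wanted depends on whether "a punctuation mark as a single value" should include multi-character cells.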