Removing punctuation in a dataframe using a for loop

Question:

I have a dataframe that looks like this:

  A        B       C       D      E
0 Orange  Dad's  X Eyes   3d.    Navy
1 pink.   Mum's  Bored.   ooo.   NaN
2 Yellow  NaN    Sad      Gray   NaN

I'm trying to remove punctuation from every column in the dataframe using a for loop:

import string
string.punctuation

#defining the function to remove punctuation
def remove_punctuation(text):
    punctuationfree="".join([i for i in text if i not in string.punctuation])
    return punctuationfree

#storing the punctuation-free text
col=['A','B','C','D','E']

for i in col:
    df[i].apply(lambda x:remove_punctuation(x))

But I get this error:

    TypeError                                 Traceback (most recent call last)
    /var/folders/jd/lln92nb4p01g8grr0000gn/T/ipykernel_24651/2417883.py in <module>
         12 
         13 for i in col:
    ---> 14     df[i].apply(lambda x:remove_punctuation(x))

    TypeError: 'float' object is not iterable

Can anyone help me with this, please? Any help would be greatly appreciated!

Asked By: mimiskims


Answers:

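I think you might have some float values in your dataframe: missing values (NaN) are stored as floats, and a float can't be iterated over character by character.

So either remove them first, or skip non-string values inside the remove_punctuation function: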

def remove_punctuation(text):
    punctuationfree= "".join([i for i in text if i not in string.punctuation]) if isinstance(text, str) else text
    return punctuationfree

This checks whether text is a string and otherwise returns it unchanged.
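A self-contained sketch of this fix applied to the question's data. The DataFrame below is reconstructed from the table in the question, with the empty cells assumed to be NaN. It also assigns each result back to its column, which the loop in the question does not do: `apply` returns a new Series rather than modifying `df` in place.

```python
import string

import numpy as np
import pandas as pd

# Sample data reconstructed from the question; empty cells assumed to be NaN.
df = pd.DataFrame({
    "A": ["Orange", "pink.", "Yellow"],
    "B": ["Dad's", "Mum's", np.nan],
    "C": ["X Eyes", "Bored.", "Sad"],
    "D": ["3d.", "ooo.", "Gray"],
    "E": ["Navy", np.nan, np.nan],
})

# NaN is a float, and iterating over a float is what raises
# "TypeError: 'float' object is not iterable".
print(type(np.nan))  # <class 'float'>

def remove_punctuation(text):
    # Strip punctuation from strings; return anything else (e.g. NaN) unchanged.
    if not isinstance(text, str):
        return text
    return "".join(ch for ch in text if ch not in string.punctuation)

# apply() returns a new Series, so assign it back to the column.
for col in ["A", "B", "C", "D", "E"]:
    df[col] = df[col].apply(remove_punctuation)

print(df)
```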

Answered By: PlainRavioli

You are getting the error because of the NaN values, so check for NaN up front:

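import string

import pandas as pd

def remove_punctuation(text):
    # Leave missing values (NaN) untouched
    if pd.isna(text):
        return text
    punctuationfree = "".join([i for i in text if i not in string.punctuation])
    return punctuationfree

# Iterating over a DataFrame yields its column labels
for c in df:
    df[c] = df[c].apply(remove_punctuation)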

OUTPUT

# df
          A     B       C     D     E
0   Orange   Dads  X Eyes    3d  Navy
1     pink   Mums   Bored   ooo   NaN
2   Yellow   NaN     Sad  Gray   NaN
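A variant of the same function that uses `str.maketrans`/`str.translate` in place of the character-by-character `join`. This is a swapped-in technique, not part of the answer above. The translation table is built once and reused for every cell; like the answer, it assumes every non-missing cell is a string. `PUNCT_TABLE` is just a name chosen for this sketch, and the two-column DataFrame is illustrative.

```python
import string

import numpy as np
import pandas as pd

# Translation table mapping every ASCII punctuation character to None (deleted).
PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def remove_punctuation(text):
    # Missing values are returned untouched; strings are translated.
    if pd.isna(text):
        return text
    return text.translate(PUNCT_TABLE)

df = pd.DataFrame({
    "A": ["Orange", "pink.", "Yellow"],
    "B": ["Dad's", "Mum's", np.nan],
})

for c in df:
    df[c] = df[c].apply(remove_punctuation)

print(df)
```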
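Answered By: ThePyGuy

import pandas as pd

df = pd.DataFrame({'A': ['Orange', "pink.", "Yellow"], "B": ["3d.", "Boared", "%hgh&12"]})

for column in df:
    # Remove every run of characters that are neither word characters nor whitespace
    df[column] = df[column].str.replace(r'[^\w\s]+', '', regex=True)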

Your output will look like this:

       A        B
0     Orange    3d
1     pink   Boared
2    Yellow  hgh12
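A sketch of the same regex approach on data that includes NaN and a few extra illustrative strings (`"snake_case"`, `"a-b"` are made up for this example). Two details worth knowing: `\w` includes the underscore, so underscores are kept; and since pandas 2.0 `Series.str.replace` treats the pattern literally unless `regex=True` is passed. NaN cells pass through as NaN, but the `.str` accessor only works on columns that hold strings, so an all-NaN (float) column would raise `AttributeError`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": ["Orange", "pink.", "Yellow"],
    "B": ["Dad's", "Mum's", np.nan],
    "C": ["snake_case", "a-b", "%hgh&12"],
})

for column in df:
    # [^\w\s]+ matches runs of characters that are neither word characters
    # (letters, digits, underscore) nor whitespace; regex=True is needed
    # on pandas >= 2.0, where the default became literal matching.
    df[column] = df[column].str.replace(r"[^\w\s]+", "", regex=True)

print(df)
```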
Answered By: Niraj Gautam