Replace column values that consist only of a repeated character

Question:

I have a dataframe like

import pandas as pd

df = pd.DataFrame({'team': ['Red', 'Blue', 'Yellow', 'Green'],
                   'points': [11, 7, 8, 10],
                   'other': ["kk;aaa;;;", ";;bb;;", ";", ";;;;"]})

I would like to replace all cells that contain only ";" with "" (empty cells). There may be only 1 or many ";" in a row. If there is anything in the cells other than ";", I would like to leave them as is.

In my dataframe the other column would become:

 other
kk;aaa;;;
;;bb;;
             #(nothing)
             #(nothing)
Asked By: Mario

||

Answers:

Here is one way to do it:

# Using apply, check whether the string has zero length after removing every ';'.
# If it does, return ""; otherwise return the string unchanged.
df['other'].apply(lambda x: "" if len(x.replace(";",''))==0 else x  )
0    kk;aaa;;;
1       ;;bb;;
2             
3             
Name: other, dtype: object
df['other2']=df['other'].apply(lambda x: "" if len(x.replace(";",''))==0 else x  )
df
     team  points      other     other2
0     Red      11  kk;aaa;;;  kk;aaa;;;
1    Blue       7     ;;bb;;     ;;bb;;
2  Yellow       8          ;
3   Green      10       ;;;;
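
The lambda above calls x.replace on every cell, so it would raise an error on a missing value (NaN is a float). A minimal sketch of the same check using the vectorized .str accessor, assuming the column might contain missing values (the sample Series and the None entry are hypothetical and not part of the question):

```python
import pandas as pd

# Hypothetical sample: the question's values plus one missing entry.
s = pd.Series(["kk;aaa;;;", ";;bb;;", ";", ";;;;", None])

# Remove every ';' and flag rows that end up empty; missing values compare as False.
only_semicolons = s.str.replace(";", "", regex=False).eq("")

# Blank the flagged rows and leave everything else, including the missing value, untouched.
cleaned = s.mask(only_semicolons, "")
print(cleaned.iloc[:4].tolist())
```

The first four values come out the same as with the apply version, and the missing entry stays missing rather than raising.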
Answered By: Naveed

You can use Series.replace with a regex pattern:

df['other'] = df.other.replace(r'^;{1,}$','', regex=True)

print(df)

     team  points      other
0     Red      11  kk;aaa;;;
1    Blue       7     ;;bb;;
2  Yellow       8           
3   Green      10           

Explanation of the pattern ^;{1,}$:

  • ^ asserts position at start of the string
  • ; matches the character ;
  • {1,} matches the previous token (i.e. ;) between one and unlimited times
  • $ asserts position at the end of the string
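
The anchors in the pattern above can also be expressed with Series.str.fullmatch, which only reports a match when the whole string fits the pattern. This is a sketch of an equivalent approach, not the answer's own code; the names only_semicolons and result are illustrative, and ;+ is used as shorthand for ;{1,}:

```python
import pandas as pd

df = pd.DataFrame({'team': ['Red', 'Blue', 'Yellow', 'Green'],
                   'points': [11, 7, 8, 10],
                   'other': ["kk;aaa;;;", ";;bb;;", ";", ";;;;"]})

# True where the entire string is one or more ';' characters.
only_semicolons = df['other'].str.fullmatch(r';+')

# Blank out the matching cells and keep every other value as is.
result = df['other'].mask(only_semicolons, '')
print(result.tolist())
# ['kk;aaa;;;', ';;bb;;', '', '']
```

Keeping the boolean mask as a separate step makes it easy to inspect which rows will be changed before overwriting the column.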

You can also use this on multiple columns, with df.replace. E.g.:

df = pd.DataFrame({'team': ['Red', 'Blue', 'Yellow', 'Green'],
               'points': [11, 7, 8, 10],
               'other': ["kk;aaa;;;", ";;bb;;", ";", ";;;;"],
               'other2': ["kk;aaa;;;", ";;bb;;", ";", ";;;;"]})

cols = ['other','other2']

df[cols] = df[cols].replace(r'^;{1,}$','', regex=True)

print(df)

     team  points      other     other2
0     Red      11  kk;aaa;;;  kk;aaa;;;
1    Blue       7     ;;bb;;     ;;bb;;
2  Yellow       8                      
3   Green      10                      
Answered By: ouroboros1

Another way is to call applymap on the dataframe to check whether each string value consists entirely of ';' characters, and then blank those values with mask(). This works on the entire dataframe; you can also apply it to a selection of columns instead.

df.mask(df.applymap(lambda x: isinstance(x, str) and set(x) == {';'}), '')
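
Note that DataFrame.applymap was deprecated in pandas 2.1 in favor of DataFrame.map. A minimal sketch of the same idea that works on either version, assuming the question's sample frame (the helper name only_semicolons and the elementwise fallback are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'team': ['Red', 'Blue', 'Yellow', 'Green'],
                   'points': [11, 7, 8, 10],
                   'other': ["kk;aaa;;;", ";;bb;;", ";", ";;;;"]})

def only_semicolons(x):
    # True only for strings made up entirely of ';' (non-strings and '' give False).
    return isinstance(x, str) and set(x) == {';'}

# Use DataFrame.map where it exists (pandas 2.1+), otherwise fall back to applymap.
elementwise = df.map if hasattr(pd.DataFrame, 'map') else df.applymap

# Replace the flagged cells with '' and keep all other cells.
res = df.mask(elementwise(only_semicolons), '')
print(res['other'].tolist())
# ['kk;aaa;;;', ';;bb;;', '', '']
```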

