Treat missing data inside multiple dataframe columns

Question:

The problem: add the missing closing bracket "]" at the end of the data in every column except ID and z, whenever the bracket does not exist at the end of a column's value, as shown below.

P.S. The dataframe contains multiple columns (x, y, a, b, c, d ……… etc. up to z), so the solution should deal with multiple columns.

import pandas as pd

dftest = pd.DataFrame({'ID': ['EF407412', 'KM043272']
                   , 'x': ['[2788, 3140, 4836', '[539, 906, 1494, 1932, 2029,7001']
                   , 'y': ['[1408, 1572, 2277', '[1,10000]']
                   , 'z': ['[1408, 1572, 2277]', '[1,10000]']
                   # the real dataframe contains N columns (x, y, z, a, b, c ... etc.), more than 100 columns
                   })


Asked By: Mohamed Elhefnawy


Answers:

Assuming it is only ID and z columns that you want to skip, try this:

dfcut = dftest[[col for col in dftest if col != 'ID' and col != 'z']]
dfcut = dfcut.applymap(lambda cell: cell + ']' if cell[-1] != ']' else cell)

You can then write the edited columns back into dftest as follows:

dftest[[col for col in dfcut.columns]] = dfcut
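
If you would rather avoid the element-wise lambda, a rough sketch of the same idea with the vectorized `.str` accessor might look like this. It assumes every cell in the target columns is a string, and `skip` is just an illustrative name for the columns to leave alone.

```python
import pandas as pd

dftest = pd.DataFrame({
    'ID': ['EF407412', 'KM043272'],
    'x': ['[2788, 3140, 4836', '[539, 906, 1494, 1932, 2029,7001'],
    'y': ['[1408, 1572, 2277', '[1,10000]'],
    'z': ['[1408, 1572, 2277]', '[1,10000]'],
})

# Hypothetical name: columns that must stay untouched.
skip = ['ID', 'z']

for col in dftest.columns.difference(skip):
    has_bracket = dftest[col].str.endswith(']')
    # Keep cells that already end with ']', append ']' to the rest.
    dftest[col] = dftest[col].where(has_bracket, dftest[col] + ']')

print(dftest)
```

Because the loop only visits columns outside `skip`, this should scale to a frame with 100+ columns without listing them by hand.
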
Answered By: Nuri Taş

With apply and str.replace, which IMO performs much better:

You basically look for a pattern that begins with '[' and contains digits, spaces and commas but doesn't end with ']' (checked with a negative lookahead), and replace it with the captured value + ']'.

dftest.apply(lambda x: x.str.replace(
    pat=r'^(\[[\d\s,]+(?!\])$)', repl=lambda m: m.group(1) + ']', regex=True))
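
To check what that pattern does on its own, here is a small sketch with Python's `re` module on three sample cells taken from the question. The pattern is spelled out with explicit escapes (`\[`, `\d`, `\s`, `\]`), which is my reading of the description above; `samples` and `fixed` are just illustrative names.

```python
import re

# '[' followed by digits/whitespace/commas up to the end of the string,
# with no closing ']' right before the end.
pattern = re.compile(r'^(\[[\d\s,]+(?!\])$)')

samples = ['[2788, 3140, 4836', '[1,10000]', 'EF407412']

# Append ']' only where the pattern matches; other strings pass through unchanged.
fixed = [pattern.sub(lambda m: m.group(1) + ']', s) for s in samples]

print(fixed)
# ['[2788, 3140, 4836]', '[1,10000]', 'EF407412']
```

The already-closed list and the ID value don't match, so they are left alone, which is why the ID and z columns don't need to be excluded explicitly here.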

With applymap (not so performant):

dftest.astype(str).applymap(
    lambda x: x + ']' if x.startswith('[') and not x.endswith(']') else x)
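
Note that `DataFrame.applymap` has been deprecated since pandas 2.1 in favour of `DataFrame.map`. A minimal sketch that should work on either version, assuming a helper named `close_bracket` (my own name, not part of pandas):

```python
import pandas as pd

def close_bracket(cell):
    # Only touch strings that open with '[' but lack the closing ']'.
    if isinstance(cell, str) and cell.startswith('[') and not cell.endswith(']'):
        return cell + ']'
    return cell

dftest = pd.DataFrame({
    'ID': ['EF407412', 'KM043272'],
    'x': ['[2788, 3140, 4836', '[539, 906, 1494, 1932, 2029,7001'],
    'y': ['[1408, 1572, 2277', '[1,10000]'],
    'z': ['[1408, 1572, 2277]', '[1,10000]'],
})

# DataFrame.map exists in pandas 2.1+; fall back to applymap on older versions.
elementwise = dftest.map if hasattr(pd.DataFrame, 'map') else dftest.applymap
result = elementwise(close_bracket)

print(result)
```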

output:

         ID                                  x                   y                   z
0  EF407412                 [2788, 3140, 4836]  [1408, 1572, 2277]  [1408, 1572, 2277]
1  KM043272  [539, 906, 1494, 1932, 2029,7001]           [1,10000]           [1,10000]
Answered By: SomeDude