Extract Frozenset items from Pandas Dataframe

Question:

I have the following dataframe:

(screenshot of the dataframe, not reproduced here)

I would like to convert the columns “antecedents” and “consequents” to strings, removing the “frozenset({ … })” wrapper, so that every row has:

“VENTOLIN S.INAL200D 100MCG”, instead of frozenset({ “VENTOLIN S.INAL200D 100MCG” }).

I managed to achieve the result with:

prod = []

for i in df["antecedents"]:
    prod.append(str(i))

new_set = {x.replace('frozenset', '')
            .replace('})', '')
            .replace('({', '')
            .replace("'", "") for x in prod}

Is there a more pythonic solution?

Answers:

First convert values to tuples or lists and then use DataFrame.explode:

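import pandas as pd

df = pd.DataFrame({
         'antecedents':[frozenset({'aaa', 'bbb'})] * 3 + [frozenset({'nbb'})] * 3,
         'consequents':[frozenset({'ccc'})] * 3 + [frozenset({'nbb', 'ddd'})] * 3,
         'C':[1,3,5,7,1,0],
})

cols = ['antecedents','consequents']
# applymap is deprecated since pandas 2.1; there, use df[cols].map(tuple) instead
df[cols] = df[cols].applymap(tuple)
# element order inside each tuple is arbitrary because sets are unordered
print (df)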
  antecedents consequents  C
0  (bbb, aaa)      (ccc,)  1
1  (bbb, aaa)      (ccc,)  3
2  (bbb, aaa)      (ccc,)  5
3      (nbb,)  (nbb, ddd)  7
4      (nbb,)  (nbb, ddd)  1
5      (nbb,)  (nbb, ddd)  0

df1 = (df.explode('antecedents')
         .reset_index(drop=True)
         .explode('consequents')
         .reset_index(drop=True))
print (df1)
   antecedents consequents  C
0          bbb         ccc  1
1          aaa         ccc  1
2          bbb         ccc  3
3          aaa         ccc  3
4          bbb         ccc  5
5          aaa         ccc  5
6          nbb         nbb  7
7          nbb         ddd  7
8          nbb         nbb  1
9          nbb         ddd  1
10         nbb         nbb  0
11         nbb         ddd  0
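
If one row per rule should be kept instead of one row per item, the set elements can be joined into a single string. This is a minimal sketch and not part of the answer's original code; the sample values and the ", " separator are assumptions chosen for illustration:

```python
import pandas as pd

# Small illustrative frame; the values are made up for this sketch.
df = pd.DataFrame({
    'antecedents': [frozenset({'aaa', 'bbb'}), frozenset({'nbb'})],
    'consequents': [frozenset({'ccc'}), frozenset({'nbb', 'ddd'})],
    'C': [1, 7],
})

cols = ['antecedents', 'consequents']
for col in cols:
    # sorted() makes the order deterministic, since frozensets are unordered
    df[col] = df[col].apply(lambda items: ', '.join(sorted(items)))

print(df)
```

For a single-item frozenset, the join simply returns that item as a plain string, which matches the format requested in the question.
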
Answered By: jezrael

First convert both columns, ‘antecedents’ and ‘consequents’, to strings:

df['antecedents'] = df['antecedents'].astype('string')
df['consequents'] = df['consequents'].astype('string')

Then remove the prefix "frozenset({" and the suffix "})" from both columns (Series.str.removeprefix and Series.str.removesuffix require pandas 1.4 or later):

df['antecedents'] = df['antecedents'].str.removeprefix("frozenset({")
df['antecedents'] = df['antecedents'].str.removesuffix("})")

df['consequents'] = df['consequents'].str.removeprefix("frozenset({")
df['consequents'] = df['consequents'].str.removesuffix("})")
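
Note that after these two steps each value still carries the quotes from the set's repr, for example 'ccc' with the quote characters included. Here is a minimal sketch of the whole chain on made-up sample data, with an extra quote-stripping step that is an addition to the answer above. It uses astype(str) rather than astype('string'), which is an assumption made to keep the sketch version-independent:

```python
import pandas as pd

# Illustrative single-item frozensets; the values are made up for this sketch.
df = pd.DataFrame({
    'antecedents': [frozenset({'aaa'}), frozenset({'nbb'})],
    'consequents': [frozenset({'ccc'}), frozenset({'ddd'})],
})

for col in ['antecedents', 'consequents']:
    df[col] = (df[col].astype(str)                       # "frozenset({'aaa'})"
                      .str.removeprefix("frozenset({")   # "'aaa'})"
                      .str.removesuffix("})")            # "'aaa'"
                      .str.replace("'", "", regex=False))  # "aaa"

print(df)
```

This relies on the exact text of the frozenset repr, so it is fragile: an item that itself contains an apostrophe would be altered by the final replace.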

Answered By: abdelilah mbarek