Conditionally Set Values Greater Than 0 To 1

Question:

I have a dataframe that looks like this, with many more date columns:

              AUTHOR        2022-07-01  2022-10-14      2022-10-15 .....
0            Kathrine          0.0         7.0              0.0
1            Catherine         0.0         13.0             17.0
2            Amanda Jane       0.0         0.0              0.0
3            Jaqueline         0.0         3.0              0.0
4            Christine         0.0         0.0              0.0

I would like to set values in each column after the AUTHOR to 1 when the value is greater than 0, so the resulting table would look like this:

              AUTHOR        2022-07-01  2022-10-14      2022-10-15 .....
0            Kathrine          0.0         1.0              0.0
1            Catherine         0.0         1.0              1.0
2            Amanda Jane       0.0         0.0              0.0
3            Jaqueline         0.0         1.0              0.0
4            Christine         0.0         0.0              0.0

I tried the following line of code but got an error, which makes sense: I need to figure out how to apply it only to the date columns while keeping the AUTHOR column in my table.

Counts[Counts != 0] = 1


TypeError: Cannot do inplace boolean setting on mixed-types with a non np.nan value
Asked By: Raven

||

Answers:

You can select the date columns first, then apply mask to only those columns:

cols = df.drop(columns='AUTHOR').columns
# or
cols = df.filter(regex=r'\d{4}-\d{2}-\d{2}').columns
# or
cols = df.select_dtypes(include='number').columns

df[cols] = df[cols].mask(df[cols] > 0, 1)
print(df)

        AUTHOR  2022-07-01  2022-10-14  2022-10-15
0     Kathrine         0.0         1.0         0.0
1    Catherine         0.0         1.0         1.0
2  Amanda Jane         0.0         0.0         0.0
3    Jaqueline         0.0         1.0         0.0
4    Christine         0.0         0.0         0.0
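
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample from the question and applies the mask. The frame construction is my own reconstruction of the posted table, and it assumes the date columns are stored as plain strings and that only values strictly greater than 0 should become 1.

```python
import pandas as pd

# Sample data reconstructed from the table in the question (assumed layout).
df = pd.DataFrame({
    "AUTHOR": ["Kathrine", "Catherine", "Amanda Jane", "Jaqueline", "Christine"],
    "2022-07-01": [0.0, 0.0, 0.0, 0.0, 0.0],
    "2022-10-14": [7.0, 13.0, 0.0, 3.0, 0.0],
    "2022-10-15": [0.0, 17.0, 0.0, 0.0, 0.0],
})

# Pick only the columns whose label looks like YYYY-MM-DD.
cols = df.filter(regex=r"^\d{4}-\d{2}-\d{2}$").columns

# Replace every value greater than 0 with 1; other values are kept as they are.
df[cols] = df[cols].mask(df[cols] > 0, 1)
print(df)
```

Restricting the assignment to cols is what avoids the TypeError from the question, because the boolean condition is never applied to the string AUTHOR column.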
Answered By: Ynjxsjmh

Since you only want to exclude the first column, you could set it as the index, apply the mask to the remaining columns, and then reset the index at the end.

# The chain returns a new DataFrame, so assign the result back
df = df.set_index('AUTHOR').pipe(lambda g: g.mask(g > 0, 1)).reset_index()
df

     AUTHOR  2022-10-14  2022-10-15
0  Kathrine         0.0         1.0
1  Cathrine         1.0         1.0
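
A variation on the same idea, as a rough sketch: compare against 0 and cast the booleans to float instead of using mask. This is a swapped-in technique, not the code above, and the three-row frame and the name flags are made up for illustration.

```python
import pandas as pd

# Small illustrative frame (hypothetical subset of the question's data).
df = pd.DataFrame({
    "AUTHOR": ["Kathrine", "Catherine", "Jaqueline"],
    "2022-10-14": [7.0, 13.0, 3.0],
    "2022-10-15": [0.0, 17.0, 0.0],
})

# The chain builds a new DataFrame; df itself is left untouched.
flags = (
    df.set_index("AUTHOR")   # move the text column out of the way
      .gt(0)                 # boolean frame: True where value > 0
      .astype(float)         # True/False -> 1.0/0.0
      .reset_index()         # bring AUTHOR back as the first column
)
print(flags)
```

One difference to keep in mind: mask leaves NaN cells as NaN, whereas the comparison turns them into 0.0, so pick whichever matches how missing values should be treated.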
Answered By: Anoushiravan R