Pandas DataFrame: replace every value except 0 with 1
Question:
I have a pandas DataFrame like the following:
3,0,1,0,0
11,0,0,0,0
1,0,0,0,0
0,0,0,0,4
13,1,1,5,0
I need to replace every value except 0 with 1, so my expected output is:
1,0,1,0,0
1,0,0,0,0
1,0,0,0,0
0,0,0,0,1
1,1,1,1,0
Answers:
Just use something like df[df != 0] to get at the nonzero parts of your DataFrame:
import pandas as pd
import numpy as np
np.random.seed(123)
df = pd.DataFrame(np.random.randint(0, 10, (5, 5)), columns=list('abcde'))
df
Out[11]:
a b c d e
0 2 2 6 1 3
1 9 6 1 0 1
2 9 0 0 9 3
3 4 0 0 4 1
4 7 3 2 4 7
df[df != 0] = 1
df
Out[13]:
a b c d e
0 1 1 1 1 1
1 1 1 1 0 1
2 1 0 0 1 1
3 1 0 0 1 1
4 1 1 1 1 1
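To tie this back to the data in the question, here is a minimal sketch that builds the asker's frame and applies the same mask. The column labels are left at pandas' integer defaults, since the question does not name them. The ne(0).astype(int) variant is not part of the answer above; it is shown only as one way to get the same result as a new frame instead of modifying df in place.

```python
import pandas as pd

# The frame from the question (default integer column labels assumed).
df = pd.DataFrame([
    [3, 0, 1, 0, 0],
    [11, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 4],
    [13, 1, 1, 5, 0],
])

# Non-mutating variant: True where the value is nonzero, then cast to 0/1.
flags = df.ne(0).astype(int)

# In-place masking as in the answer, applied to a copy so df stays untouched.
masked = df.copy()
masked[masked != 0] = 1

print(flags)
```

Both flags and masked should match the expected output in the question, while df keeps its original values.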
As an unorthodox alternative, consider
%timeit (df/df == 1).astype(int)
1000 loops, best of 3: 449 µs per loop
%timeit df[df != 0] = 1
1000 loops, best of 3: 801 µs per loop
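As a hint about what’s happening here: df/df gives you 1 for any value that is not 0. The zeros become NaN, because 0/0 is undefined (Inf would only appear when a nonzero value is divided by 0). Checking == 1 gives you the correct matrix, but in boolean form, hence the astype(int) conversion at the end.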
However, as the DataFrame grows, the advantage of operating on all elements instead of selecting some of them becomes irrelevant, and eventually this approach becomes less efficient.
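Running the division on a small frame shows what each step produces. This is only an illustrative sketch: the column names and values are made up, and the intermediate names ratio and as_bool are introduced here for readability.

```python
import numpy as np
import pandas as pd

# Small made-up frame containing both zero and nonzero entries.
df = pd.DataFrame({"a": [3, 0, 13], "b": [0, 4, 0]})

ratio = df / df           # nonzero / nonzero -> 1.0, 0 / 0 -> NaN
as_bool = ratio == 1      # NaN == 1 is False, so zeros map to False
result = as_bool.astype(int)

print(ratio)
print(result)
```

Note that this expression returns a new frame and leaves df unchanged, whereas df[df != 0] = 1 modifies df, so the two timings above do not measure quite the same operation.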
Thanks Marius. This also works on just one column, for example when you want to replace all values except 1. Just be careful: this modifies the data in place.
# create column 280 from 279 for class {1: Normal, 0: Arrhythmia}
df[280] = df[279]
# use .loc instead of chained indexing so the assignment reliably reaches df
df.loc[df[280] != 1, 280] = 0
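A variant that skips the copy-then-overwrite step builds the new column directly from a comparison. This is a sketch, not the commenter's code: the values in column 279 below are invented stand-ins, since the real data is not shown.

```python
import pandas as pd

# Hypothetical stand-in for the label column; the real data is not shown.
df = pd.DataFrame({279: [1, 2, 16, 1, 5]})

# 1 stays 1, every other value becomes 0; column 279 is left untouched.
df[280] = (df[279] == 1).astype(int)

print(df)
```

Because the comparison produces a fresh Series, nothing in the source column is overwritten.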