Understanding the logic of using the any() method
Question:
I have a dataframe that contains only 1s and 0s and looks like this (in reality I have several more columns):
Test A B C D E
0 1 1 0 1 0
0 0 1 0 1 0
0 0 0 0 1 0
1 1 0 1 1 0
I first look at just the first column and check each row to return True if there is a one and False if there is a zero. I then look at the first two columns and check each row to return True if at least one of the values is a 1 and False if not. I continue in this manner, adding a column each time.
My code is:
for i in range(0, len(df.columns)):
    print(df.iloc[:, 0:0+i].any(axis=1))
On the first iteration, it returns False, False, False, False. I don't understand why, considering that the final value in the Test column is a 1. Why is it not returning False, False, False, True?
Reproducible data:
import pandas as pd

data = {'Test': [0, 0, 0, 1],
        'A': [1, 0, 0, 1],
        'B': [1, 1, 0, 0],
        'C': [0, 0, 0, 1],
        'D': [1, 1, 1, 1],
        'E': [0, 0, 0, 0]}
df = pd.DataFrame(data)
Answers:
The problem is the end of your slice, not the start. On the first iteration, i is 0, so df.iloc[:, 0:0+i] is df.iloc[:, 0:0], which selects no columns at all; any(axis=1) over an empty selection returns False for every row. Shift the end of the slice by one so the first iteration selects just the Test column:

for i in range(0, len(df.columns)):
    print(df.iloc[:, 0:i+1].any(axis=1))

With this correction, the first iteration returns False, False, False, True as you expected.
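A minimal runnable sketch of the corrected loop on the reproducible data above. The cummax alternative at the end is a suggestion of mine, not part of the original question: a running row-wise maximum produces every cumulative "any" in one vectorized call.

```python
import pandas as pd

data = {'Test': [0, 0, 0, 1],
        'A': [1, 0, 0, 1],
        'B': [1, 1, 0, 0],
        'C': [0, 0, 0, 1],
        'D': [1, 1, 1, 1],
        'E': [0, 0, 0, 0]}
df = pd.DataFrame(data)

# Corrected loop: the slice end is i + 1, so the first
# iteration (i = 0) selects just the 'Test' column.
for i in range(len(df.columns)):
    print(df.iloc[:, 0:i+1].any(axis=1).tolist())
# First iteration prints [False, False, False, True]

# Vectorized alternative: the running maximum along each row,
# cast to bool, gives one column per cumulative "any" step.
cumulative = df.cummax(axis=1).astype(bool)
print(cumulative)
```

Each column of `cumulative` matches one iteration of the loop, so the loop's i-th printout equals `cumulative.iloc[:, i]`.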