Counting the number of pandas.DataFrame rows for each column
Question:
What I want to do
I would like to count the rows that satisfy a condition, separately for each column, so each column can end up with a different count.
import numpy as np
import pandas as pd
## Sample DataFrame
data = [[1, 2], [0, 3], [np.nan, np.nan], [1, -1]]
index = ['i1', 'i2', 'i3', 'i4']
columns = ['c1', 'c2']
df = pd.DataFrame(data, index=index, columns=columns)
print(df)
## Output
#      c1   c2
# i1  1.0  2.0
# i2  0.0  3.0
# i3  NaN  NaN
# i4  1.0 -1.0
## Question 1: Count non-NaN values
## Expected result
# [3, 3]
## Question 2: Count non-zero numerical values
## Expected result
# [2, 3]
Note: Data types of results are not important. They can be list, pandas.Series, pandas.DataFrame etc. (I can convert data types anyway.)
What I have checked
## For Question 1
print(df[df['c1'].apply(lambda x: not pd.isna(x))].count())
## For Question 2
print(df[df['c1'] != 0].count())
Obviously these two print calls only cover column c1. It's easy to check one column at a time, but I would like to know if there is a way to calculate the counts for all columns at once.
Environment
Python 3.10.5
pandas 1.4.3
Answers:
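You were close, I think! To answer your first question, the following counts the NaN values in each column; use notna() in place of isna() to get the non-NaN counts ([3, 3]):
>>> df.apply(lambda x: x.isna().sum(), axis=0)
c1    1
c2    1
dtype: int64
Change to axis=1 to apply this operation to each row instead.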
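For the non-NaN counts asked for in Question 1, here is a minimal sketch that assumes the sample df from the question. It shows the apply form with notna() next to the built-in DataFrame.count(), which counts non-NA cells per column; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample DataFrame from the question.
df = pd.DataFrame(
    [[1, 2], [0, 3], [np.nan, np.nan], [1, -1]],
    index=['i1', 'i2', 'i3', 'i4'],
    columns=['c1', 'c2'],
)

# Per-column count of non-NaN cells via apply.
non_nan_apply = df.apply(lambda col: col.notna().sum(), axis=0)

# DataFrame.count() counts non-NA cells per column directly.
non_nan_count = df.count()

print(non_nan_apply.tolist())  # [3, 3]
print(non_nan_count.tolist())  # [3, 3]

# axis=1 counts per row instead of per column.
per_row = df.count(axis=1)
print(per_row.tolist())  # [2, 2, 0, 2]
```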
To answer your second question, this is from here (an already answered question on SO):
>>> df.astype(bool).sum(axis=0)
c1    3
c2    4
dtype: int64
In the same way, you can change axis to 1 if you want …
Hope it helps!
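The counts of 3 and 4 differ from the expected [2, 3] because NaN converts to True under astype(bool). The sketch below, again assuming the sample df from the question, shows this and one possible way to mask the NaN cells out; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Sample DataFrame from the question.
df = pd.DataFrame(
    [[1, 2], [0, 3], [np.nan, np.nan], [1, -1]],
    index=['i1', 'i2', 'i3', 'i4'],
    columns=['c1', 'c2'],
)

as_bool = df.astype(bool)

# NaN is truthy, so the all-NaN row i3 becomes True in both columns.
print(as_bool.loc['i3'].tolist())  # [True, True]

# Counting the True cells therefore includes the NaN row.
with_nan = as_bool.sum(axis=0)
print(with_nan.tolist())  # [3, 4]

# Combining with notna() keeps only non-zero, non-NaN cells.
without_nan = (as_bool & df.notna()).sum(axis=0)
print(without_nan.tolist())  # [2, 3]
```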
You do not need to iterate over your data using apply. You can achieve your results in a vectorized fashion:
print(df.notna().sum().to_list()) # [3, 3]
print((df.ne(0) & df.notna()).sum().to_list()) # [2, 3]
Note that I have assumed that "Question 2: Count non-zero values" also excludes NaN values; otherwise you would get [3, 4].
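The same vectorized pattern should carry over to other conditions: build a boolean DataFrame and sum it per column. A small sketch, assuming the sample df from the question; count_where is a hypothetical helper written for illustration, not a pandas function.

```python
import numpy as np
import pandas as pd

# Sample DataFrame from the question.
df = pd.DataFrame(
    [[1, 2], [0, 3], [np.nan, np.nan], [1, -1]],
    index=['i1', 'i2', 'i3', 'i4'],
    columns=['c1', 'c2'],
)


def count_where(frame, condition):
    """Count, per column, the cells where condition(frame) is True.

    Hypothetical helper for illustration; condition must return a
    boolean DataFrame with the same shape as frame.
    """
    mask = condition(frame)
    return mask.sum().to_list()


non_nan = count_where(df, lambda f: f.notna())
non_zero = count_where(df, lambda f: f.ne(0) & f.notna())
# Comparisons against NaN evaluate to False, so NaN cells are not counted.
positive = count_where(df, lambda f: f.gt(0))

print(non_nan)   # [3, 3]
print(non_zero)  # [2, 3]
print(positive)  # [2, 2]
```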