Counting the number of pandas.DataFrame rows for each column

Question:

What I want to do

I would like to count the rows that satisfy a condition, separately for each column, so each column can have a different count.

import numpy as np
import pandas as pd

## Sample DataFrame
data = [[1, 2], [0, 3], [np.nan, np.nan], [1, -1]]
index = ['i1', 'i2', 'i3', 'i4']
columns = ['c1', 'c2']
df = pd.DataFrame(data, index=index, columns=columns)
print(df)

## Output
#      c1   c2
# i1  1.0  2.0
# i2  0.0  3.0
# i3  NaN  NaN
# i4  1.0 -1.0

## Question 1: Count non-NaN values
## Expected result
# [3, 3]

## Question 2: Count non-zero numerical values
## Expected result
# [2, 3]

Note: Data types of results are not important. They can be list, pandas.Series, pandas.DataFrame etc. (I can convert data types anyway.)

What I have checked

## For Question 1
print(df[df['c1'].apply(lambda x: not pd.isna(x))].count())

## For Question 2
print(df[df['c1'] != 0].count())

Obviously, these two print calls only filter on column c1. It's easy to check the columns one at a time, but I would like to know if there is a way to calculate the counts for all columns at once.

Environment

Python 3.10.5
pandas 1.4.3

Asked By: dmjy

||

Answers:

You were close, I think! To answer your first question:

>>> df.apply(lambda x: x.notna().sum(), axis=0)
c1    3
c2    3
dtype: int64

Change to axis=1 to apply this operation to each row instead.
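
As a small sketch of how the axis argument changes the result, assuming the sample DataFrame from the question (rebuilt here so the snippet runs on its own), the following counts non-NaN values, the quantity Question 1 asks for, once per column and once per row. The names per_column and per_row are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample DataFrame from the question
df = pd.DataFrame(
    [[1, 2], [0, 3], [np.nan, np.nan], [1, -1]],
    index=['i1', 'i2', 'i3', 'i4'],
    columns=['c1', 'c2'],
)

# axis=0: the lambda receives each column, so the result is indexed by column name
per_column = df.apply(lambda x: x.notna().sum(), axis=0)

# axis=1: the lambda receives each row, so the result is indexed by row label
per_row = df.apply(lambda x: x.notna().sum(), axis=1)

print(per_column.to_list())  # [3, 3]
print(per_row.to_list())     # [2, 2, 0, 2]
```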

For your second question, this comes from an already answered question on SO:


>>> df.astype(bool).sum(axis=0)
c1    3
c2    4
dtype: int64

In the same way, you can change axis to 1 if you want to count per row.
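
The counts above are [3, 4] rather than the [2, 3] the question expects because NaN is truthy when cast to bool. A minimal sketch, assuming the question's sample DataFrame, that shows this and masks out the NaN cells before summing (as_bool, nonzero_incl_nan and nonzero_excl_nan are illustrative names):

```python
import numpy as np
import pandas as pd

# Sample DataFrame from the question
df = pd.DataFrame(
    [[1, 2], [0, 3], [np.nan, np.nan], [1, -1]],
    index=['i1', 'i2', 'i3', 'i4'],
    columns=['c1', 'c2'],
)

# Casting to bool maps 0 to False and everything else, including NaN, to True
as_bool = df.astype(bool)
print(as_bool.loc['i3'].to_list())  # [True, True]

# Summing the booleans therefore counts the NaN row as "non-zero"
nonzero_incl_nan = as_bool.sum(axis=0)

# Combining with notna() drops the NaN cells from the count
nonzero_excl_nan = (as_bool & df.notna()).sum(axis=0)

print(nonzero_incl_nan.to_list())  # [3, 4]
print(nonzero_excl_nan.to_list())  # [2, 3]
```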

Hope it helps!

Answered By: bvittrant

You do not need to iterate over your data using apply. You can achieve your results in a vectorized fashion:

print(df.notna().sum().to_list()) # [3, 3]
print((df.ne(0) & df.notna()).sum().to_list()) # [2, 3]

Note that I have assumed that "Question 2: Count non-zero values" also excludes NaN values; otherwise you would get [3, 4].
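
For comparison, here is a sketch of two alternative spellings that should give the same results, assuming the question's sample DataFrame. DataFrame.count already skips NaN, and filling NaN with 0 before the comparison folds both conditions into one. The names non_nan and non_zero are illustrative.

```python
import numpy as np
import pandas as pd

# Sample DataFrame from the question
df = pd.DataFrame(
    [[1, 2], [0, 3], [np.nan, np.nan], [1, -1]],
    index=['i1', 'i2', 'i3', 'i4'],
    columns=['c1', 'c2'],
)

# Question 1: count() returns the number of non-NaN cells in each column
non_nan = df.count()

# Question 2: treat NaN as 0, then count the cells that differ from 0
non_zero = df.fillna(0).ne(0).sum()

print(non_nan.to_list())   # [3, 3]
print(non_zero.to_list())  # [2, 3]
```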

Answered By: ko3