How to find and list a specific value from each column

Question:

My professor gave me a dataset, and one of my questions is: "Find the number of missing values, 99999, in each column and list them." How would I do this in Python? I have multiple columns, all with numerical data.
The missing values in the dataset are denoted by ‘99999’ instead of the usual NA.

I don’t have much experience in Python and have tried many things to no avail.

Asked By: Katiebk

||

Answers:

Apply a lambda function to each column that compares its values to 99999, then call sum() on the result to count the matches in that column:

# import pandas package
import pandas as pd

# load dataset with pandas, for example if you have a csv:
df = pd.read_csv("YOUR_FILEPATH_HERE")

# print out the number of occurrences of 99999 in each column
print(df.apply(lambda x: (x == 99999).sum()))
Answered By: Nate
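
The same count can also be taken without a lambda by comparing the whole DataFrame at once. The sketch below is only an illustration: the sample DataFrame and its column names (a, b, c) are made up, not taken from the asker's dataset. It also prints the row labels where each column holds 99999, in case "list them" means showing where the values are.

```python
import pandas as pd

# Hypothetical sample data; 99999 marks a missing value.
df = pd.DataFrame({
    "a": [1, 99999, 3],
    "b": [99999, 99999, 6],
    "c": [7, 8, 9],
})

# Compare every cell to 99999, then sum the True values per column.
counts = (df == 99999).sum()
print(counts)

# List the row labels where each column holds 99999.
for column in df.columns:
    rows = df.index[df[column] == 99999].tolist()
    print(column, rows)
```

With real data, df would come from pd.read_csv or a similar loader instead of being built by hand.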

A non-pandas answer:

NA = 99999
data = [
  [  1, NA, 3 ],
  [ NA, NA, 6 ],
]

NAs = [0] * len(data[0])  # one counter per column, all starting at 0

for row in data:
  for col, value in enumerate(row):
    if value == NA:
      NAs[col] += 1

print(NAs)
Answered By: Mark
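
If the data lives in a CSV file rather than a hard-coded list, the same counting loop might be fed from the standard-library csv module. This is a sketch under assumptions: the in-memory sample text and the column names a, b, c are hypothetical stand-ins for the real file, and every cell is assumed to be numeric, as the question states.

```python
import csv
import io

MISSING = 99999

# Hypothetical CSV content standing in for the real file.
sample = io.StringIO("a,b,c\n1,99999,3\n99999,99999,6\n")

reader = csv.reader(sample)
header = next(reader)              # first row holds the column names
counts = dict.fromkeys(header, 0)  # one counter per column

for row in reader:
    for name, cell in zip(header, row):
        # csv yields strings, so convert before comparing (assumes numeric cells).
        if float(cell) == MISSING:
            counts[name] += 1

print(counts)
```

For a file on disk, replace the StringIO object with open(path, newline="").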

import numpy as np

# Replace the missing-value code 99999 with NaN, the default missing-value marker in pandas
df = df.replace(99999, np.nan)

# Flag the missing values in each column of the DataFrame (True where the value is NaN)
missing_values = df.isnull()

# Count the missing values in each column
print(missing_values.sum())

This assumes df is an existing pandas DataFrame; numpy is imported as np for np.nan.

Answered By: mnsosa
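
As a variation on the replace-then-isnull idea, the sentinel can be turned into NaN while the file is being read, using the na_values parameter of pd.read_csv. The sample text and column names below are made up for illustration. Note that this approach also counts any cells that were already blank, which may or may not be what the assignment wants.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the real file.
sample = io.StringIO("a,b,c\n1,99999,3\n99999,99999,6\n")

# Treat 99999 as missing while parsing, in addition to the default NA markers.
df = pd.read_csv(sample, na_values=[99999])

# Count the NaN values in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)
```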