Row and Column number of empty value present in dataframe in Python
Question:
I have a data frame like the table below:

row_id | name | email | age | state
---|---|---|---|---
1 | Sta | [email protected] | 29 | AK
2 | NaN | [email protected] | NaN | NB
3 | Elle | NaN | 32 | CA
4 | Elle | NaN | 44 | NaN
I want the row id and column name wherever my df is NaN/empty, in the format below:

row_id | col_id | value
---|---|---
2 | name | NaN
2 | age | NaN
3 | name | NaN
4 | name | NaN
4 | state | NaN
import pandas as pd
import numpy as np

tmp = pd.DataFrame(columns=['row_id', 'col_id', 'Value', 'Gap'], dtype=object)  # creating a new DF
for name, values in proc_df.iteritems():  # loop over columns
    for i, v in enumerate(values.tolist()):  # loop over values
        if np.isnan(v):
            tmp.loc[len(tmp)] = [proc_df.at[i, 'row_id'], name, v, len([v])]
tmp
but I am getting the error below:
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
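For context (this explanation is not part of the original question): `np.isnan` only accepts float inputs, so it raises this TypeError on object/string columns such as `name` or `email`, whereas `pd.isna` handles any dtype. A minimal sketch of the difference:

```python
import numpy as np
import pandas as pd

s = pd.Series(["Sta", None, "Elle"])  # object dtype, like the name column

# np.isnan raises TypeError on an object array of strings
try:
    np.isnan(s.to_numpy())
except TypeError as e:
    print("np.isnan failed:", e)

# pd.isna works element-wise on any dtype
print(pd.isna(s).tolist())  # [False, True, False]
```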
Answers:
The basic idea here is to read the DF column by column and, whenever a None/NaN value is found, build a list such as [2, A, NaN, 1] and append it as a new row to a new df.

Note: I don't know your logic for the Gap column, so I am just counting the None values in that specific cell.
Code:
import pandas as pd

tmp = pd.DataFrame(columns=['row_id', 'Col_id', 'Value'])  # creating a new DF
for name, values in df.items():  # loop over columns (iteritems was removed in pandas 2.0)
    for i, v in enumerate(values.tolist()):  # loop over values
        if pd.isna(v):
            tmp.loc[len(tmp)] = [df.at[i, 'row_id'], name, v]
tmp
Output:
row_id Col_id Value
0 2 name None
1 3 email None
2 4 email None
3 2 age None
4 4 state None
You can use pd.isnull and np.argwhere.

I referred to this answer to get the indices of the null values.

This method:
- requires less manual iteration on your part
- does not require additional dependencies (pandas is built on top of numpy)
import pandas as pd
import numpy as np

output_d = {  # save outputs here
    "row_id": [],
    "col_id": [],
    "value": [],
}

# use .values - np.argwhere only works on numpy arrays
null_array = pd.isnull(df.values)

# get an iterable of [row, col] arrays
row_col_ls = np.argwhere(null_array)

# iterate through indices and save values
for row_col_tuple in row_col_ls:
    # row_col_tuple[0] - the row
    output_d["row_id"].append(df["row_id"].iloc[row_col_tuple[0]])
    # row_col_tuple[1] - the col
    output_d["col_id"].append(df.columns[row_col_tuple[1]])
    # save the nan value from the original df
    output_d["value"].append(df.iloc[row_col_tuple[0], row_col_tuple[1]])

output_df = pd.DataFrame(output_d)
# output_df
# row_id col_id value
# 0 2 name NaN
# 1 2 age NaN
# 2 3 email NaN
# 3 4 email NaN
# 4 4 state NaN
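A further loop-free alternative (a sketch, not from either answer above): `DataFrame.melt` reshapes the frame into one (row_id, col_id, value) record per cell, after which a single `isna` filter produces the same table. The sample data below mirrors the question's table, with the redacted addresses kept as the literal string "[email protected]":

```python
import pandas as pd

df = pd.DataFrame({
    "row_id": [1, 2, 3, 4],
    "name": ["Sta", None, "Elle", "Elle"],
    "email": ["[email protected]", "[email protected]", None, None],
    "age": [29, None, 32, 44],
    "state": ["AK", "NB", "CA", None],
})

# reshape to long format: one row per (row_id, column, value) cell
long_df = df.melt(id_vars="row_id", var_name="col_id", value_name="value")

# keep only the cells that are NaN/None
out = long_df[long_df["value"].isna()].reset_index(drop=True)
print(out)
```

This finds the same five missing cells (rows 2/name, 2/age, 3/email, 4/email, 4/state) without any explicit Python-level loops.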