Iterate over a CSV header and create a Python list of dictionaries

Question:

I am trying to create a list of dictionaries by iterating over a CSV header.
For each column, I need to check for empty cells and set a field to True or False accordingly.

I have loaded the CSV into a pandas DataFrame and created a list of its columns.

Input CSV data:

id NAME location
1 Henry London
2 Joe Peru
3 NaN Germany
4 Smith NaN
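
For anyone who wants to reproduce this, here is a minimal sketch of loading the sample above. It assumes the data is stored whitespace-separated exactly as shown; a real comma-separated file would normally load with a plain pd.read_csv(path). The names raw and columns are only illustrative.

```python
import io

import pandas as pd

# Whitespace-separated copy of the sample table (an assumption about the
# file layout); "NaN" is parsed as a missing value by default.
raw = """id NAME location
1 Henry London
2 Joe Peru
3 NaN Germany
4 Smith NaN
"""

df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# List of column names taken from the header row
columns = list(df.columns)
print(columns)
print(df)
```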

Output:

  • item = column name
  • seq = i*2, where i is the 1-based position of the column
  • is_null = True if the column has any empty cell, else False


[{item : id
is_null: False
seq : 2
verson: done}
{item : NAME
new: name
is_null: True
seq : 4
verson: done}
{item : location
is_null: True
seq : 6
verson: done}]

UPDATED the output:
1. Added an extra field, verson: done, to every dict.
2. Uppercase column names are converted to lowercase and stored in an additional field, ‘new’.
I need help iterating over the header to get this output.
I am still learning, so please correct me if I have made any mistakes 🙂


Asked By: Pikun95


Answers:

Your question is unclear, but I imagine you might want:

out = (
 df.isna().any()                # check if any NaN per col
   .rename_axis('item')         # set index name
   .reset_index(name='is_null') # set flag name
   .assign(seq=lambda x: range(2, 2*len(x)+2, 2)) # assign counter * 2
   .to_dict('records')          # convert to a list of dictionaries
)

Or, using a list comprehension:

out = [{'item': col, 'is_null': df[col].isna().any(), 'seq': i*2}
       for i, col in enumerate(df, start=1)]

Or:

out = [{'item': col, 'is_null': flag, 'seq': i*2}
       for i, (col, flag) in enumerate(df.isna().any().items(), start=1)]

Output:

[{'item': 'id', 'is_null': False, 'seq': 2},
 {'item': 'NAME', 'is_null': True, 'seq': 4},
 {'item': 'location', 'is_null': True, 'seq': 6}]
Answered By: mozway

First, I think you should have included some code with the question to show what you have tried.

As for your query, here is what you can do:

  1. Get the columns of the DataFrame with df.columns and cast the result to a list.
  2. Iterate over this list. The item and seq values follow directly from the column name and its position; for is_null, use df[col].isnull().values.any(), which returns True if any value in the column is null and False otherwise.
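
The two steps above could be sketched roughly as follows. The DataFrame is rebuilt by hand from the sample in the question, and starting the counter at 1 is an assumption based on the expected seq values of 2, 4 and 6.

```python
import pandas as pd

# Sample data from the question, rebuilt by hand (None marks an empty cell)
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "NAME": ["Henry", "Joe", None, "Smith"],
    "location": ["London", "Peru", "Germany", None],
})

# Step 1: column names as a plain list
cols = list(df.columns)

# Step 2: build one dictionary per column
out = []
for i, col in enumerate(cols, start=1):
    out.append({
        "item": col,
        # bool() turns the NumPy boolean into a plain Python bool
        "is_null": bool(df[col].isnull().values.any()),
        "seq": i * 2,
    })

print(out)
```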

Let me know if this helps. You can then edit the question to post your attempt, and if you still don’t get it, we can look at the code.

EDIT: I have given a simple iterative way, but the first solution, answered by mozway, is definitely the best way to go.

Answered By: Umang Thusoo

You can use a list comprehension for this:

[{"item": c, "is_null": df[c].isnull().values.any(), "seq": i * 2} for i, c in enumerate(df.columns, start=1)]

It loops through each column, sets the column name as the value of the item key, checks whether any null values exist in that column and stores the result as is_null, and finally sets seq.
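
The updated question also asks for a verson field and a lowercase new field. Here is a rough sketch of one way to add them. It assumes new is wanted only when the column name differs from its lowercase form, which is how the expected output reads (only NAME has it); the DataFrame is rebuilt by hand from the sample data.

```python
import pandas as pd

# Sample data from the question, rebuilt by hand (None marks an empty cell)
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "NAME": ["Henry", "Joe", None, "Smith"],
    "location": ["London", "Peru", "Germany", None],
})

out = []
for i, c in enumerate(df.columns, start=1):
    record = {"item": c}
    # Assumption: add "new" only for names that are not already lowercase
    if c != c.lower():
        record["new"] = c.lower()
    record["is_null"] = bool(df[c].isnull().values.any())
    record["seq"] = i * 2
    record["verson"] = "done"  # key spelled as in the question
    out.append(record)

print(out)
```

A plain loop is used here instead of a comprehension because the new key is conditional, which keeps the key order the same as in the expected output.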

Answered By: Ashyam

Something like the following would work.

import pandas as pd

# Dataframe
df = pd.DataFrame({'col1': [1, 2, 3, None, 5], 
                   'col2': ['a', 'b', 'c', None, 'e'], 
                   'col3': [True, False, True, True, None]})

# create a list of dictionaries with the null status and sequence number of each column
# (start=1 so that seq begins at 2, as requested in the question)
null_list = [{'item': col, 'is_null': df[col].isnull().any(), 'seq': i*2}
             for i, col in enumerate(df.columns, start=1)]

print(null_list)

Output:

[
 {'item': 'col1', 'is_null': True, 'seq': 2},
 {'item': 'col2', 'is_null': True, 'seq': 4},
 {'item': 'col3', 'is_null': True, 'seq': 6}
]
]
Answered By: fzn