Iterate over a CSV header and create a Python list of dictionaries
Question:
I am trying to create a list of dictionaries by iterating over a CSV header.
For each column, we need to check for empty cells and set a field to True/False accordingly.
I have loaded the CSV into a pandas dataframe and created a list of its columns.
Input CSV data:
id | NAME | location |
---|---|---|
1 | Henry | London |
2 | Joe | Peru |
3 | NaN | Germany |
4 | Smith | NaN |
Output:
- item = column name
- seq = i*2
- is_null = True if the column has any empty cell, else False
[{item : id
is_null: False
seq : 2
verson: done}
{item : NAME
new: name
is_null: True
seq : 4
verson: done}
{item : location
is_null: True
seq : 6
verson: done}]
UPDATED the output:
1. Added an extra field, verson: done, to every dict.
2. Convert uppercase column names to lowercase, adding an additional field 'new'.
I need help iterating over the header to get this output.
I am still learning, so please correct me if I have made any mistakes 🙂
Answers:
Your question is unclear, but I imagine you might want:
out = (
    df.isna().any()                                  # check if any NaN per col
      .rename_axis('item')                           # set index name
      .reset_index(name='is_null')                   # set flag name
      .assign(seq=lambda x: range(2, 2*len(x)+2, 2)) # assign counter * 2
      .to_dict('records')                            # convert to list of dictionaries
)
Or, using a list comprehension:
out = [{'item': col, 'is_null': df[col].isna().any(), 'seq': i*2}
       for i, col in enumerate(df, start=1)]
Or:
out = [{'item': col, 'is_null': flag, 'seq': i*2}
       for i, (col, flag) in enumerate(df.isna().any().items(), start=1)]
Output:
[{'item': 'id', 'is_null': False, 'seq': 2},
 {'item': 'NAME', 'is_null': True, 'seq': 4},
 {'item': 'location', 'is_null': True, 'seq': 6}]
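The snippets above do not cover the two fields added in the question's update. Here is a minimal sketch of one way to include them, under a few assumptions: the CSV text is inlined through io.StringIO with empty cells standing in for NaN, the 'new' key is added only when the column name differs from its lowercase form, and the 'verson' key keeps the spelling used in the question.

```python
import io

import pandas as pd

# Sample data modelled on the question; inlined so the example is reproducible.
csv_text = """id,NAME,location
1,Henry,London
2,Joe,Peru
3,,Germany
4,Smith,
"""
df = pd.read_csv(io.StringIO(csv_text))

out = []
for i, col in enumerate(df.columns, start=1):
    record = {"item": col}
    # Assumption: 'new' is only needed when the header is not already lowercase.
    if col != col.lower():
        record["new"] = col.lower()
    record["is_null"] = bool(df[col].isna().any())  # True if any cell is empty
    record["seq"] = i * 2
    record["verson"] = "done"  # key spelled as in the question
    out.append(record)

print(out)
```

Because only NAME contains uppercase letters, it is the only record that gets the extra 'new' key.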
First, I think you should have included some code with the question to show what you tried.
As for your query, here is what you can do:
- Get the columns of the dataframe with df.columns and cast the result to a list.
- Iterate over this list. The column name gives you the item key and the loop counter gives you seq; for is_null, use df[col].isnull().values.any(), which returns True if any value in the column is null and False otherwise.
Let me know if this helps. You can then edit the question to post your attempt, and if you still don't get it, we can look at the code.
EDIT: I have given a simple iterative way, but the first solution, answered by mozway, is definitely the best way to go.
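The steps described in this answer could be sketched as a plain loop. The dataframe below is a hand-built stand-in for the question's CSV (None marks an empty cell), and the names columns, result and position are illustrative only.

```python
import pandas as pd

# Small frame mirroring the question's data; None marks an empty cell.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "NAME": ["Henry", "Joe", None, "Smith"],
    "location": ["London", "Peru", "Germany", None],
})

columns = list(df.columns)  # step 1: column names as a plain list

result = []
for position, col in enumerate(columns, start=1):  # step 2: walk the list
    has_null = df[col].isnull().values.any()       # any missing value in this column?
    result.append({"item": col, "is_null": has_null, "seq": position * 2})

print(result)
```

Starting enumerate at 1 is what makes seq come out as 2, 4, 6 rather than 0, 2, 4.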
You can use list comprehension for this:
[{"item": c, "is_null": df[c].isnull().values.any(), "seq": i * 2} for i, c in enumerate(df.columns, start=1)]
It loops through each column, sets the column name as the item key, checks whether any null values exist in that column and stores the result as is_null, and finally sets seq.
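One detail that may matter if the list is later written out as JSON: .values.any() returns a NumPy boolean rather than a built-in bool, and the standard json module does not serialize NumPy scalars. A hedged variant of the comprehension that casts the flag with bool() is sketched below; the sample dataframe is an assumption standing in for the question's data.

```python
import json

import pandas as pd

# Stand-in for the question's data; None marks an empty cell.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "NAME": ["Henry", "Joe", None, "Smith"],
    "location": ["London", "Peru", "Germany", None],
})

records = [
    # bool(...) converts the NumPy boolean into a built-in Python bool
    {"item": c, "is_null": bool(df[c].isnull().values.any()), "seq": i * 2}
    for i, c in enumerate(df.columns, start=1)
]

# With built-in bools the list serializes without a custom encoder.
print(json.dumps(records))
```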
Something like the following would work.
import pandas as pd
# Dataframe
df = pd.DataFrame({'col1': [1, 2, 3, None, 5],
                   'col2': ['a', 'b', 'c', None, 'e'],
                   'col3': [True, False, True, True, None]})
# create a list of dictionaries with NaN value status and column position * 2
# (start=1 so that seq begins at 2, as the question asks)
null_list = [{'item': col, 'is_null': df[col].isnull().any(), 'seq': i*2}
             for i, col in enumerate(df.columns, start=1)]
print(null_list)
Output:
[
{'item': 'col1', 'is_null': True, 'seq': 2},
{'item': 'col2', 'is_null': True, 'seq': 4},
{'item': 'col3', 'is_null': True, 'seq': 6}
]