How to add a list of irregular dictionaries to a DataFrame
Question:
I am trying to add the information in a list of dictionaries to a DataFrame, but I don’t know how.
For example, I have the DataFrame shown below.
    Name    City  Age
0   John      NY   25
1    Ken  London   32
2  Smith  Boston   29
3   Kate    York   21
4    Tom   Paris   42
At the same time, I have a list of dictionaries shown below.
[{'A': 15, 'B': 35, 'D': 10},
{'C': 124, 'E': 36},
{'A': 3, 'F': 10},
{},
{'B': 4, 'A': 8, 'C': 1}]
Each dictionary corresponds to one row of the DataFrame above.
For example, the first dictionary corresponds to the first row of the DataFrame.
Thus, I’d like to add the information in the list to the DataFrame to produce the modified DataFrame below, but I don’t know how. I would be grateful if anyone could tell me how to write code that combines the information. (I made the data below manually.)
    Name    City  Age   A   B    C   D   E   F
0   John      NY   25  15  35    0  10   0   0
1    Ken  London   32   0   0  124   0  36   0
2  Smith  Boston   29   3   0    0   0   0  10
3   Kate    York   21   0   0    0   0   0   0
4    Tom   Paris   42   8   4    1   0   0   0
The points that I find difficult are:
- Each dictionary has a different length. In the real data I’d like to analyse, the dictionaries contain a variety of keys, and some dictionaries are empty.
- Some keys appear in several dictionaries, like "A", "B", and "C" above. In these cases, I’d like a single "A", "B", or "C" column that collects the values from all rows.
- The example DataFrame has only five rows and the list has only five dictionaries, so I was able to combine the information manually. However, the real DataFrame and list have a huge number of rows and dictionaries, so it is impossible to organise the information without writing code.
I searched online for the same question and tried writing code myself, but I could not find a solution. I would like to know what code solves my problem.
Answers:
To convert the irregular list of dictionaries (let’s name it ild) to a dataframe on its own, use
df2 = pd.DataFrame(ild, dtype=object).fillna(0).astype(int)
After that, you only have to append the columns of df2 to the other dataframe.
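As a minimal sketch of that last step, the example below rebuilds the data from the question and appends the new columns with pd.concat. The name df for the original dataframe, the column sort, and the use of pd.concat are assumptions for illustration, not part of the answer above; it also assumes both dataframes share the same default row index and that the dictionaries are in row order.

```python
import pandas as pd

# Data from the question; "df" is a hypothetical name for the original dataframe.
df = pd.DataFrame({
    "Name": ["John", "Ken", "Smith", "Kate", "Tom"],
    "City": ["NY", "London", "Boston", "York", "Paris"],
    "Age": [25, 32, 29, 21, 42],
})
ild = [
    {"A": 15, "B": 35, "D": 10},
    {"C": 124, "E": 36},
    {"A": 3, "F": 10},
    {},
    {"B": 4, "A": 8, "C": 1},
]

# The line from the answer: one column per key, missing keys become 0.
# (Some pandas versions may emit a FutureWarning from fillna on object columns.)
df2 = pd.DataFrame(ild, dtype=object).fillna(0).astype(int)

# Columns come out in order of first appearance (A, B, D, C, E, F),
# so sort them to get A..F as in the desired table.
df2 = df2.sort_index(axis=1)

# Append the new columns side by side; rows are matched on the index.
result = pd.concat([df, df2], axis=1)
print(result)
```

If the original dataframe has a non-default index, one option is to pass index=df.index when building df2 so that the rows still line up.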
The code first creates a dataframe from ild. Pandas does most of the work on its own: keys missing from a dictionary are filled with NaN. Without dtype=object, pandas would automatically use floats for those columns (as int doesn’t have a NaN value), which could introduce rounding errors for very large integers.
The NaN values are then replaced by zeros with fillna, and the int objects are finally converted to an integer dtype with astype.
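To illustrate the dtype point, here is a small sketch with a made-up list (small_ild and the other names are hypothetical) that contains one integer too large for a float to hold exactly. It uses astype("int64") rather than astype(int) only to make the integer width explicit on every platform; that is a choice made for this example.

```python
import pandas as pd

# Hypothetical data: the second dictionary is empty, so column "A" gets a NaN.
big = 2**53 + 1  # not exactly representable as a 64-bit float
small_ild = [{"A": 15}, {}, {"A": big}]

# Default construction: the NaN forces the whole column to float.
as_float = pd.DataFrame(small_ild)
print(as_float["A"].dtype)

# With dtype=object the original Python ints are kept untouched.
as_object = pd.DataFrame(small_ild, dtype=object)
print(as_object["A"].dtype)

# Filling and converting the float column loses the last digit of the
# large value, while the object column keeps it intact.
from_float = as_float.fillna(0).astype("int64")
from_object = as_object.fillna(0).astype("int64")
print(from_float["A"].iloc[2] == big)
print(from_object["A"].iloc[2] == big)
```

For values well below 2**53, as in the question, both routes should give the same integers, so dtype=object mainly matters when very large values are possible.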