Python – array merge
Question:
Here’s the situation.
I have four arrays like the ones below. They all have the same length and matching "id" fields.
How can I merge their elements on this matching "id" field?
array_1 = [
    {
        "id": "111",
        "field_1": "some string variables here",
        ...
    },
    {
        "id": "222",
        "field_1": "some string variables here",
        ...
    },
    ...
]
array_2 = [
    {
        "id": "111",
        "field_2": "other string variables here",
        ...
    },
    {
        "id": "222",
        "field_2": "other string variables here",
        ...
    },
    ...
]
...
Expected result:
result_array_after_merge = [
    {
        "id": "111",
        "field_1": "some string variables here",   # from array_1
        "field_2": "other string variables here",  # from array_2
        ...
    },
    {
        "id": "222",
        "field_1": "some string variables here",   # from array_1
        "field_2": "other string variables here",  # from array_2
        ...
    },
    ...
]
Answers:
I would convert the first list to a dict where the "id"s are the keys and the original list’s elements are the values. Then I would iterate over the second list and update the dict values where the "id" matches. Finally, I would convert the dict back to a list by taking just the values:
array_1 = [
{
"id": "111",
"field_1": "some string variables here",
},
{
"id": "222",
"field_1": "some string variables here",
},
]
array_2 = [
{
"id": "111",
"field_2": "other string variables here",
},
{
"id": "222",
"field_2": "other string variables here",
},
]
dict_1 = {item["id"]: item for item in array_1}

for d in array_2:
    dict_1[d["id"]].update(d)

array_3 = list(dict_1.values())

from pprint import pprint
pprint(array_3)
Result:
[{'field_1': 'some string variables here',
'field_2': 'other string variables here',
'id': '111'},
{'field_1': 'some string variables here',
'field_2': 'other string variables here',
'id': '222'}]
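The same idea extends to any number of lists. Using `dict.setdefault` also avoids a `KeyError` when an id shows up in a later list but not in the first one. A minimal sketch (the function name `merge_on_id` is chosen here for illustration, not from the answer above):

```python
def merge_on_id(*arrays):
    # Accumulate dicts keyed by "id"; setdefault creates the entry
    # the first time an id is seen, so no list has to be "first".
    merged = {}
    for array in arrays:
        for item in array:
            merged.setdefault(item["id"], {}).update(item)
    return list(merged.values())

array_1 = [{"id": "111", "field_1": "a"}, {"id": "222", "field_1": "b"}]
array_2 = [{"id": "111", "field_2": "c"}, {"id": "333", "field_2": "d"}]

result = merge_on_id(array_1, array_2)
# "111" gets both fields; "333" (missing from array_1) is still kept
```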
You can create a third list: iterate the second list for each id from the first one, and when you find a matching id, update the third list with the items from both lists.
A better solution, if it’s possible to change the data format, would be to create a dictionary with the id as the key and the fields as the value, like so:
{"111": ["some string variables here", "other string variables here"]}
That would improve lookup performance dramatically.
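Building that id-keyed dictionary from the original lists could look like the following sketch (it assumes each record carries its values in fields other than "id"):

```python
array_1 = [{"id": "111", "field_1": "some string variables here"},
           {"id": "222", "field_1": "some string variables here"}]
array_2 = [{"id": "111", "field_2": "other string variables here"},
           {"id": "222", "field_2": "other string variables here"}]

by_id = {}
for item in array_1 + array_2:
    # collect every non-id value under its id
    values = [v for k, v in item.items() if k != "id"]
    by_id.setdefault(item["id"], []).extend(values)

# by_id["111"] now holds the values from both arrays for id "111"
```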
Use pandas! For your data:
array_1 = [
    {
        "id": "111",
        "field_1": "some string variables here"
    },
    {
        "id": "222",
        "field_1": "some string variables here"
    }
]
array_2 = [
    {
        "id": "111",
        "field_2": "other string variables here"
    },
    {
        "id": "222",
        "field_2": "other string variables here"
    }
]
import pandas as pd

# convert the arrays to DataFrames
df1 = pd.DataFrame(array_1)
df2 = pd.DataFrame(array_2)

# merge them on id
df_merged = pd.merge(df1, df2, on='id', how='left')

# export back to a JSON-friendly format
print(df_merged.to_dict('records'))
gives output:
[{'id': '111',
'field_1': 'some string variables here',
'field_2': 'other string variables here'},
{'id': '222',
'field_1': 'some string variables here',
'field_2': 'other string variables here'}]
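One caveat worth noting: `how='left'` drops ids that appear only in `array_2`. If ids might be missing from either array, `how='outer'` keeps every id and fills the gaps with `NaN`. A sketch with deliberately mismatched ids:

```python
import pandas as pd

array_1 = [{"id": "111", "field_1": "some string variables here"},
           {"id": "222", "field_1": "some string variables here"}]
array_2 = [{"id": "111", "field_2": "other string variables here"},
           {"id": "333", "field_2": "other string variables here"}]

# outer join keeps "222" (only in array_1) and "333" (only in array_2)
df_merged = pd.merge(pd.DataFrame(array_1), pd.DataFrame(array_2),
                     on='id', how='outer')
records = df_merged.to_dict('records')
```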
Another scalable approach (it doesn’t matter how many arrays you have), based on @yulGM’s answer:
import pandas as pd
list_of_arrays = [array_1, array_2] # list all of your arrays
dfs = map(pd.DataFrame, list_of_arrays)
pd.concat(dfs, axis=1).loc[:, lambda x: ~x.columns.duplicated()].to_dict('records')
This will only work if the set of fields is identical across arrays and the rows appear in the same order, since concat aligns rows by index rather than by "id".
Alternatively, using merge:
from functools import reduce
dfs = map(pd.DataFrame, list_of_arrays)
reduce(lambda left, right: pd.merge(left, right, on='id', how='left'), dfs).to_dict('records')
Both of them result in:
[{'id': '111',
'field_1': 'some string variables here',
'field_2': 'other string variables here'},
{'id': '222',
'field_1': 'some string variables here',
'field_2': 'other string variables here'}]