Python – array merge

Question:

Here’s the situation.

I have four arrays like the ones below. They all have the same length, and their elements have matching "id" fields.

How can I merge elements using this matching "id" field?

array_1 = [
  {
    "id": "111",
    "field_1": "some string variables here",
    ...
  },
  {
    "id": "222",
    "field_1": "some string variables here",
    ...
  },
  ...
]

array_2 = [
  {
    "id": "111",
    "field_2": "other string variables here",
    ...
  },
  {
    "id": "222",
    "field_2": "other string variables here",
    ...
  },
  ...
]

...

Expected result:

result_array_after_merge = [
  {
    "id": "111",
    "field_1": "some string variables here",   # from array_1
    "field_2": "other string variables here",  # from array_2
    ...
  },
  {
    "id": "222",
    "field_1": "some string variables here",   # from array_1
    "field_2": "other string variables here",  # from array_2
    ...
  },
  ...
]
Asked By: brucelin

Answers:

I would convert the first list to a dict whose keys are the "id" values and whose values are the original list’s elements. Then I would iterate over the second list and update the dict’s values wherever the "id" matches. Finally, I would convert the dict back to a list by taking its values:

array_1 = [
  {
    "id": "111",
    "field_1": "some string variables here",
  },
  {
    "id": "222",
    "field_1": "some string variables here",
  },
]

array_2 = [
  {
    "id": "111",
    "field_2": "other string variables here",
  },
  {
    "id": "222",
    "field_2": "other string variables here",
  },
]

dict_1 = {item["id"]: item for item in array_1}
for d in array_2:
    dict_1[d["id"]].update(d)

array_3 = list(dict_1.values())

from pprint import pprint
pprint(array_3)

Result:

[{'field_1': 'some string variables here',
  'field_2': 'other string variables here',
  'id': '111'},
 {'field_1': 'some string variables here',
  'field_2': 'other string variables here',
  'id': '222'}]
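
The same idea extends to any number of lists, which may help with the four arrays in the question. Below is a minimal sketch, assuming every record carries an "id" key; the helper name merge_by_id and the short sample values are hypothetical. Unlike the snippet above, it copies fields into fresh dicts, so the input lists are left unchanged, and an id that is missing from the first list does not raise a KeyError.

```python
def merge_by_id(*arrays, key="id"):
    """Merge lists of dicts, combining records that share the same `key` value."""
    merged = {}  # dicts keep insertion order, so ids appear in first-seen order
    for array in arrays:
        for record in array:
            # setdefault creates a fresh dict the first time an id is seen,
            # so the input records themselves are never modified.
            merged.setdefault(record[key], {}).update(record)
    return list(merged.values())


# Hypothetical sample data; the lists need not be in the same id order.
array_1 = [{"id": "111", "field_1": "a"}, {"id": "222", "field_1": "b"}]
array_2 = [{"id": "222", "field_2": "d"}, {"id": "111", "field_2": "c"}]
array_3 = [{"id": "111", "field_3": "e"}, {"id": "222", "field_3": "f"}]

result = merge_by_id(array_1, array_2, array_3)
print(result)
```

If two lists define the same field for one id, the value from the later list wins, because dict.update overwrites existing keys.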
Answered By: Czaporka

You can create a third list, then for each element of the first list iterate over the second list; when you find a matching id, append the combined items from both lists to the third list.
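
The nested-loop idea could look roughly like this. It is a sketch only, assuming ids are unique within each list and reusing the sample data from the question. Note that it scans array_2 once per element of array_1, so the work grows with len(array_1) * len(array_2).

```python
array_1 = [
    {"id": "111", "field_1": "some string variables here"},
    {"id": "222", "field_1": "some string variables here"},
]
array_2 = [
    {"id": "111", "field_2": "other string variables here"},
    {"id": "222", "field_2": "other string variables here"},
]

array_3 = []
for first in array_1:
    for second in array_2:
        if first["id"] == second["id"]:
            # {**a, **b} builds a new dict holding the fields of both records.
            array_3.append({**first, **second})
            break  # ids are assumed unique, so stop at the first match

print(array_3)
```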

A better solution, if it is possible to change the data format, would be to create a dictionary with the id as the key and the fields as the value, like so:

{"111": ["some string variables here", "other string variables here"]}

That would improve performance considerably, because looking up an id becomes a single dictionary access instead of a scan through a list.
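
One way to build such an id-keyed dictionary from the existing lists is sketched below. The name fields_by_id is hypothetical, and the sketch assumes every record has an "id" key and that the field names can be dropped, as in the example above.

```python
array_1 = [
    {"id": "111", "field_1": "some string variables here"},
    {"id": "222", "field_1": "some string variables here"},
]
array_2 = [
    {"id": "111", "field_2": "other string variables here"},
    {"id": "222", "field_2": "other string variables here"},
]

fields_by_id = {}
for array in (array_1, array_2):
    for record in array:
        # Collect every value except the id itself.
        values = [value for name, value in record.items() if name != "id"]
        fields_by_id.setdefault(record["id"], []).extend(values)

# Looking up a record by id is now a single dictionary access.
print(fields_by_id["111"])
```

Keeping a dict of fields per id instead of a list would preserve the field names, at the cost of a slightly larger structure.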

Answered By: Kroustou

Use pandas!

For your data:

array_1 = [
  {
    "id": "111",
    "field_1": "some string variables here"
  },
  {
    "id": "222",
    "field_1": "some string variables here"
  }
]

array_2 = [
  {
    "id": "111",
    "field_2": "other string variables here"
  },
  {
    "id": "222",
    "field_2": "other string variables here"
  }
]

import pandas as pd

## convert the arrays to DataFrames
df1 = pd.DataFrame(array_1)
df2 = pd.DataFrame(array_2)

## merge them on ids:
df_merged = pd.merge(df1, df2, on='id', how='left')

## export back to json friendly format
print(df_merged.to_dict('records'))

This gives the output:

[{'id': '111',
  'field_1': 'some string variables here',
  'field_2': 'other string variables here'},
 {'id': '222',
  'field_1': 'some string variables here',
  'field_2': 'other string variables here'}]
Answered By: yulGM

Another scalable approach (it doesn’t matter how many arrays you have), based on @yulGM’s answer:

import pandas as pd

list_of_arrays = [array_1, array_2] # list all of your arrays
dfs = map(pd.DataFrame, list_of_arrays)
pd.concat(dfs, axis=1).loc[:, lambda x: ~x.columns.duplicated()].to_dict('records')

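This will only work if every array has the same number of records in the same id order, because concat with axis=1 aligns rows by their index (here, their position) rather than by id.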

Alternatively, for using merge:

from functools import reduce

dfs = map(pd.DataFrame, list_of_arrays)
reduce(lambda left, right: pd.merge(left, right, on='id', how='left'), dfs).to_dict('records')

Both of them result in:

[{'id': '111',
  'field_1': 'some string variables here',
  'field_2': 'other string variables here'},
 {'id': '222',
  'field_1': 'some string variables here',
  'field_2': 'other string variables here'}]