How to use python to merge two lists using common elements?

Question:

I have two lists in Python. In the first, each element is a tuple of a string and an integer:

delta[0:10]
[('conhecimento', 17),
 ('ciência', 14),
 ('interdisciplinaridade', 13),
 ('saber', 10),
 ('objeto', 10),
 ('pode', 10),
 ('processo', 9),
 ('conceito', 9),
 ('assim', 8),
 ('mundo', 8)]

In the second, each element is a tuple of a string and a list of integers:

echo[0:10]
[('mundo', [2024]),
 ('assim', [2022]),
 ('conceito', [1599, 1602, 1862, 1865]),
 ('processo', [1949, 1963, 1972]),
 ('pode', [2018]),
 ('objeto', [1566, 1605]),
 ('saber', [2016]),
 ('interdisciplinaridade', [2014]),
 ('ciência', [2013, 756]),
 ('conhecimento', [2011, 2223])]

Both lists have the same length, because they were made with the same dataset, so they all share the same string elements.

len(echo)
1398

len(delta)
1398

All string elements are present in both lists, but in a different order. I need to build a third list in which each element starts with the common string, followed by the integer from the first list and then the list associated with that string in the second list. The final merged list should look like this:

 final[0:4]
 [('conhecimento', 17, [2011, 2223]),
  ('ciência', 14, [2013,756]),
  ('interdisciplinaridade', 13, [2014]),
  ('saber', 10, [2016])]

If possible, I would also like one way to sort the final list by its second element (the integer) and another way to sort it by the highest value in its third element (the list).

Thanks in advance!

Answers:

You can do this by converting one list of tuples into a dict for lookups, then iterating over the other list and appending the combined tuples, like below:

delta = [
    ("conhecimento", 17),
    ("ciência", 14),
    ("interdisciplinaridade", 13),
    ("saber", 10),
    ("objeto", 10),
    ("pode", 10),
    ("processo", 9),
    ("conceito", 9),
    ("assim", 8),
    ("mundo", 8),
]

echo = [
    ("mundo", [2024]),
    ("assim", [2022]),
    ("conceito", [1599, 1602, 1862, 1865]),
    ("processo", [1949, 1963, 1972]),
    ("pode", [2018]),
    ("objeto", [1566, 1605]),
    ("saber", [2016]),
    ("interdisciplinaridade", [2014]),
    ("ciência", [2013, 756]),
    ("conhecimento", [2011, 2223]),
]


final = []
lookup = dict(echo)
for a, b in delta:
    final.append((a, b, lookup.get(a)))
print(final)

Output:

[
    ("conhecimento", 17, [2011, 2223]),
    ("ciência", 14, [2013, 756]),
    ("interdisciplinaridade", 13, [2014]),
    ("saber", 10, [2016]),
    ("objeto", 10, [1566, 1605]),
    ("pode", 10, [2018]),
    ("processo", 9, [1949, 1963, 1972]),
    ("conceito", 9, [1599, 1602, 1862, 1865]),
    ("assim", 8, [2022]),
    ("mundo", 8, [2024]),
]
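
The question also asks for two sort orders, which the merge above does not cover. A minimal sketch, assuming every third element is a list of integers; the names `by_count` and `by_max_value` are illustrative, and `default=0` is an assumption for entries whose list might be empty:

```python
# Small sample in the same shape as the merged list (illustrative subset).
final = [
    ("conhecimento", 17, [2011, 2223]),
    ("ciência", 14, [2013, 756]),
    ("interdisciplinaridade", 13, [2014]),
    ("saber", 10, [2016]),
    ("conceito", 9, [1599, 1602, 1862, 1865]),
]

# Sort by the integer (second element), largest first.
by_count = sorted(final, key=lambda item: item[1], reverse=True)

# Sort by the highest value in the list (third element), largest first.
# default=0 is an assumption so that an empty list does not raise ValueError.
by_max_value = sorted(final, key=lambda item: max(item[2], default=0), reverse=True)

print([word for word, _, _ in by_count])
print([word for word, _, _ in by_max_value])
```

`sorted` returns a new list and is stable, so entries with equal keys keep their original relative order. Use `final.sort(key=...)` instead to sort in place.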
Answered By: Always Sunny

The pandas package handles this easily:

import pandas as pd

# create both dataframes
df1 = pd.DataFrame(
    {'id': ['conhecimento', 'ciência', 'interdisciplinaridade', 'saber'], 'delta': [17, 14, 13, 10]}
)

df2 = pd.DataFrame(
    {'id': ['ciência', 'interdisciplinaridade', 'saber', 'conhecimento'], 'echo': [[2013, 756], [2014], [2016], [2011, 2223]]}
)

# merge both dataframes on "id", keeping the row order of df1
df_merged = df1.merge(df2, how="left", on="id")

# sort the merged dataframe by descending delta value
df_merged = df_merged.sort_values(by="delta", ascending=False)

# display the final dataframe (in a notebook)
df_merged

# output in list form
output = df_merged.values.tolist()
print(output)

Output:

dataframe
                      id  delta          echo
0           conhecimento     17  [2011, 2223]
1                ciência     14   [2013, 756]
2  interdisciplinaridade     13        [2014]
3                  saber     10        [2016]

list
    [['conhecimento', 17, [2011, 2223]],
     ['ciência', 14, [2013, 756]],
     ['interdisciplinaridade', 13, [2014]],
     ['saber', 10, [2016]]]
Answered By: hannez

As mentioned above, pandas is the obvious tool for this kind of task. Here is another approach:

import pandas as pd

res = (pd.concat([pd.Series(dict(delta)),pd.Series(dict(echo))],axis=1)
       .reset_index().values.tolist())

>>> res
[['conhecimento', 17, [2011, 2223]],
 ['ciência', 14, [2013, 756]],
 ['interdisciplinaridade', 13, [2014]],
 ['saber', 10, [2016]],
 ['objeto', 10, [1566, 1605]],
 ['pode', 10, [2018]],
 ['processo', 9, [1949, 1963, 1972]],
 ['conceito', 9, [1599, 1602, 1862, 1865]],
 ['assim', 8, [2022]],
 ['mundo', 8, [2024]]]

This works as long as all string elements are present in both lists.
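
If that condition might not hold, it can be checked before merging. A minimal plain-Python sketch, assuming the same `(string, value)` tuple shape as in the question; the sample data and the names `counts`, `positions`, and `unmatched` are illustrative:

```python
# Illustrative data where the two lists do not share every string.
delta = [("conhecimento", 17), ("ciência", 14), ("saber", 10)]
echo = [("saber", [2016]), ("conhecimento", [2011, 2223]), ("mundo", [2024])]

counts = dict(delta)
positions = dict(echo)

# Strings that appear in only one of the two lists.
unmatched = set(counts) ^ set(positions)

# Inner join: keep only strings found in both lists, in delta's order.
final = [(word, n, positions[word]) for word, n in delta if word in positions]

print(sorted(unmatched))
print(final)
```

An empty `unmatched` set confirms that both lists cover the same strings.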

Answered By: SergFSM