String Compression in Python

Question:

I have the following input :

 my_list = ["x d1","y d1","z d2","t d2"]

And would like to transform it into :

Expected_result = ["d1(x,y)","d2(z,t)"]

I had to use brute force, and also had to call on pandas to rescue me, since I didn't find any way to do it in plain (vanilla) Python. Is there another way to solve this?

import pandas as pd 

my_list = ["x d1","y d1","z d2","t d2"]

df = pd.DataFrame(my_list,columns=["col1"])

df2 = df["col1"].str.split(" ",expand = True)
df2.columns = ["col1","col2"]
grp = df2.groupby("col2")  # a plain string key: grouping by a one-element list yields tuple keys in pandas 2.x

result = []
for grp_name, data in grp:
  res =  grp_name +"(" + ",".join(list(data["col1"])) + ")"
  result.append(res)
print(result)
Asked By: Shck Tchamna


Answers:

  1. The code defines an empty dictionary.
  2. It then iterates over each item in your list and uses the split() method to split the item into a key and a value.
  3. It then uses the setdefault() method to group the keys by value: setdefault(value, []) returns the list already stored under value, or inserts and returns a new empty list if value is not yet in the dictionary. The key is then appended to that list.
  4. Finally, a list comprehension iterates over the items in the dictionary and builds one string per key-value pair, using the join() method to concatenate the keys in the value list into a single string.
my_list = ["x d1","y d1","z d2","t d2"]
result = {}

for item in my_list:
    key, value = item.split()
    result.setdefault(value, []).append(key)

output = [f"{k}({','.join(v)})" for k, v in result.items()]
print(output)

['d1(x,y)', 'd2(z,t)']
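A minimal sketch of the same grouping with collections.defaultdict in place of setdefault(). This is a swapped-in standard-library alternative, not part of the original answer; the variable names are illustrative.

```python
from collections import defaultdict

my_list = ["x d1", "y d1", "z d2", "t d2"]

# defaultdict(list) creates an empty list the first time a missing key is
# accessed, which replaces the explicit setdefault(value, []) call.
groups = defaultdict(list)
for item in my_list:
    name, group = item.split()
    groups[group].append(name)

# Dicts keep insertion order (Python 3.7+), so the groups come out in the
# order in which they first appear in my_list.
output = [f"{k}({','.join(v)})" for k, v in groups.items()]
print(output)  # ['d1(x,y)', 'd2(z,t)']
```

Like the setdefault() version, this does not require the input to be sorted or grouped.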
Answered By: Jamiu S.
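my_list = ["x d1","y d1","z d2","t d2"]
res = []

for item in my_list:
    a, b, *_ = item.split()

    # extend the previous entry only if it belongs to the same group;
    # startswith avoids matching d1 inside d10, which a substring test would do
    if res and res[-1].startswith(f'{b}('):
        res[-1] = res[-1].replace(')', f',{a})')
    else:
        res.append(f'{b}({a})')

print(res)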
['d1(x,y)', 'd2(z,t)']

Let N be the number that follows d. This code works for any number of elements per dN, as long as the items that share the same dN are consecutive, for example when they are ordered so that d1 comes before d2, which comes before d3, … It works with any value of N, and the values do not have to be single letters, as long as each item has the form "val_in_dN dN", keeping that order.

If you need something that works even when the dN are not in sequence, just say the word, but it will cost a little more.
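To illustrate the ordering requirement, here is a minimal sketch that wraps the loop in a function. The name compress_adjacent is hypothetical, and the trailing ")" is stripped by slicing rather than with replace(); the logic is otherwise the same, in that only the most recent entry is ever extended.

```python
def compress_adjacent(items):
    """Hypothetical wrapper: merge runs of consecutive items that share
    the same group label, e.g. "x d1", "y d1" -> "d1(x,y)"."""
    res = []
    for item in items:
        a, b, *_ = item.split()
        # Only res[-1] is checked, so a group can only be extended while
        # its items are consecutive in the input.
        if res and res[-1].startswith(f"{b}("):
            # drop the closing ")" and append the new value
            res[-1] = res[-1][:-1] + f",{a})"
        else:
            res.append(f"{b}({a})")
    return res

print(compress_adjacent(["x d1", "y d1", "z d2", "t d2"]))  # ['d1(x,y)', 'd2(z,t)']
# Out of order: d1 is split into two separate entries.
print(compress_adjacent(["x d1", "z d2", "y d1"]))          # ['d1(x)', 'd2(z)', 'd1(y)']
```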

Answered By: tavo

If your values are already sorted by key (d1, d2), you can use itertools.groupby:

from itertools import groupby

out = [f"{k}({','.join(x[0] for x in g)})"
       for k, g in groupby(map(str.split, my_list), lambda x: x[1])]

Output:

['d1(x,y)', 'd2(z,t)']

Otherwise you should use a dictionary as shown by @Jamiu.
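If you would still like to use groupby on unsorted input, a minimal sketch is to sort by the label first, since groupby only merges consecutive items with equal keys. The unsorted list below is a hypothetical example, and operator.itemgetter is used in place of the lambda.

```python
from itertools import groupby
from operator import itemgetter

my_list = ["z d2", "x d1", "t d2", "y d1"]  # hypothetical unsorted input

# sorted() is stable, so names keep their original relative order inside
# each group. Labels are compared as strings, so "d10" sorts before "d2".
pairs = sorted(map(str.split, my_list), key=itemgetter(1))
out = [f"{k}({','.join(name for name, _ in g)})"
       for k, g in groupby(pairs, key=itemgetter(1))]
print(out)  # ['d1(x,y)', 'd2(z,t)']
```

Sorting costs O(n log n); the dictionary approach avoids the sort and keeps the order of first appearance instead.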

A variant of your pandas solution:

out = (df['col1'].str.split(n=1, expand=True)
       .groupby(1)[0]
       .apply(lambda g: f"{g.name}({','.join(g)})")
       .tolist()
      )
Answered By: mozway

Another possible solution, which is based on pandas:

import numpy as np
import pandas as pd

(pd.DataFrame(np.array([str.split(x, ' ') for x in my_list]), columns=['b', 'a'])
 .groupby('a')['b'].apply(lambda x: f'({x.values[0]}, {x.values[1]})')
 .reset_index().sum(axis=1).tolist())

Output:

['d1(x, y)', 'd2(z, t)']

EDIT

The OP, @ShckTchamna, asked for a more general version of the above solution, since it only handles exactly two values per group. This edit provides one that works with the example the OP gave in his comment below:

my_list = ["x d1","y d1","z d2","t d2","kk d2","m d3", "n d3", "s d4"] 

(pd.DataFrame(np.array([str.split(x, ' ') for x in my_list]), columns=['b', 'a'])
 .groupby('a')['b'].apply(lambda x: f'({",".join(x.values)})')
 .reset_index().sum(axis=1).tolist())

Output:

['d1(x,y)', 'd2(z,t,kk)', 'd3(m,n)', 'd4(s)']
Answered By: PaulS
import pandas as pd

df = pd.DataFrame(data=[e.split(' ') for e in ["x d1","y d1","z d2","t d2"]])
r = (df.groupby(1)
       .apply(lambda r:"{0}({1},{2})".format(r.iloc[0,1], r.iloc[0,0], r.iloc[1,0]))
       .reset_index()
       .rename({1:"points", 0:"coordinates"}, axis=1)
         )

print(r.coordinates.tolist())
# ['d1(x,y)', 'd2(z,t)']

print(r)
#   points coordinates
# 0    d1     d1(x,y)
# 1    d2     d2(z,t)

As a replacement for my previous solution (which also works):

import itertools as it

my_list = [e.split(' ') for e in ["x d1","y d1","z d2","t d2"]]

r=[]
for key, group in it.groupby(my_list, lambda x: x[1]):
    names = [e[0] for e in group]
    # join all names, so groups of any size work (not just exactly two)
    r.append("{0}({1})".format(key, ",".join(names)))

print(r)
Output:

['d1(x,y)', 'd2(z,t)']
Answered By: Laurent B.