# String Compression in Python

## Question:

I have the following input:

```
my_list = ["x d1","y d1","z d2","t d2"]
```

And I would like to transform it into:

```
Expected_result = ["d1(x,y)","d2(z,t)"]
```

I had to use brute force and call pandas to my rescue, since I couldn't find a way to do it in plain/vanilla Python. Is there another way to solve this?

```
import pandas as pd

my_list = ["x d1","y d1","z d2","t d2"]
df = pd.DataFrame(my_list,columns=["col1"])
df2 = df["col1"].str.split(" ",expand = True)
df2.columns = ["col1","col2"]
# group by a plain string key so that grp_name is a string, not a tuple
grp = df2.groupby("col2")
result = []
for grp_name, data in grp:
    res = grp_name +"(" + ",".join(list(data["col1"])) + ")"
    result.append(res)
print(result)
```

## Answers:

- The code defines an empty dictionary.
- It then iterates over each item in your list and uses the `split()` method to split the item into a `key` and a `value`.
- Then it uses the `setdefault()` method to add the `key` and the `value` to the dictionary. If the `value` already exists as a key in the dictionary, it appends the `key` to that value's existing list of keys. If the `value` does not exist as a key in the dictionary, it creates a new key-value pair with the value as the key and the key as the first element in the new list.
- Finally, the list comprehension iterates over the items in the dictionary and creates a string for each key-value pair, using the `join()` method to concatenate the keys in the value list into a single string.

```
result = {}
for item in my_list:
    key, value = item.split()
    # collect each first token under its second token (e.g. "d1")
    result.setdefault(value, []).append(key)
output = [f"{k}({', '.join(v)})" for k, v in result.items()]
print(output)
```

```
['d1(x, y)', 'd2(z, t)']
```

```
my_list = ["x d1","y d1","z d2","t d2"]
res = []
for item in my_list:
    a, b, *_ = item.split()
    # extend the last group if it has the same key, otherwise start a new one
    if res and res[-1].startswith(f'{b}('):
        res[-1] = res[-1].replace(')', f',{a})')
    else:
        res.append(f'{b}({a})')
print(res)
# ['d1(x,y)', 'd2(z,t)']
```

Let N be the number that follows d. This code works for any number of elements per dN group, as long as items that share the same dN are next to each other in the list (for example all d1 items, then all d2 items, and so on). N can be any value, and the first token can be anything, provided each item has the form "value dN".

If you need something that works even when the dN are not in sequence, just say the word, but it will cost a little more.
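For the case where the dN are not in sequence, here is a minimal sketch, assuming each item still has the form "value dN". It collects values in a `collections.defaultdict` instead of looking only at the last group. The function name `group_by_suffix` and the sample list `unordered` are made up for illustration.

```python
from collections import defaultdict

def group_by_suffix(items):
    """Turn "value key" strings into "key(v1,v2,...)" strings."""
    groups = defaultdict(list)  # key -> list of values, keys kept in first-seen order
    for item in items:
        value, key = item.split(maxsplit=1)
        groups[key].append(value)
    return [f"{key}({','.join(values)})" for key, values in groups.items()]

# hypothetical input where items of the same group are not adjacent
unordered = ["x d1", "z d2", "y d1", "t d2"]
print(group_by_suffix(unordered))
# ['d1(x,y)', 'd2(z,t)']
```

The extra cost is the dictionary itself: every group is kept in memory until the end, whereas the loop above only touches the last element of `res`.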

**If your values are already sorted by key** (d1, d2), you can use `itertools.groupby`:

```
from itertools import groupby
out = [f"{k}({','.join(x[0] for x in g)})"
       for k, g in groupby(map(str.split, my_list), lambda x: x[1])]
```

Output:

```
['d1(x,y)', 'd2(z,t)']
```

Otherwise you should use a dictionary as shown by @Jamiu.
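If you would rather keep `itertools.groupby` for unsorted input, one option is to sort the split pairs by key first. This is only a sketch: the helper name `group_sorted` and the sample list `shuffled` are assumptions for illustration, and the sort is lexicographic, so a key like `d10` would come before `d2`.

```python
from itertools import groupby

def group_sorted(items):
    """Sort "value key" pairs by key, then group adjacent equal keys."""
    # sorted() is stable, so values keep their original relative order per key
    pairs = sorted((item.split() for item in items), key=lambda p: p[1])
    return [f"{key}({','.join(p[0] for p in grp)})"
            for key, grp in groupby(pairs, key=lambda p: p[1])]

# hypothetical input that is not sorted by key
shuffled = ["z d2", "x d1", "t d2", "y d1"]
print(group_sorted(shuffled))
# ['d1(x,y)', 'd2(z,t)']
```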

A variant of your pandas solution:

```
out = (df['col1'].str.split(n=1, expand=True)
       .groupby(1)[0]
       .apply(lambda g: f"{g.name}({','.join(g)})")
       .tolist()
      )
```

Another possible solution, based on `pandas`:

```
import numpy as np
import pandas as pd

(pd.DataFrame(np.array([str.split(x, ' ') for x in my_list]), columns=['b', 'a'])
 .groupby('a')['b'].apply(lambda x: f'({x.values[0]}, {x.values[1]})')
 .reset_index().sum(axis=1).tolist())
```

Output:

```
['d1(x, y)', 'd2(z, t)']
```

**EDIT**

The OP, @ShckTchamna, asked for the above solution to be made more general. This edit provides a version that works with the example the OP gives in his comment below.

```
my_list = ["x d1","y d1","z d2","t d2","kk d2","m d3", "n d3", "s d4"]
(pd.DataFrame(np.array([str.split(x, ' ') for x in my_list]), columns=['b', 'a'])
 .groupby('a')['b'].apply(lambda x: f'({",".join(x.values)})')
 .reset_index().sum(axis=1).tolist())
```

Output:

```
['d1(x,y)', 'd2(z,t,kk)', 'd3(m,n)', 'd4(s)']
```

```
import pandas as pd
df = pd.DataFrame(data=[e.split(' ') for e in ["x d1","y d1","z d2","t d2"]])
r = (df.groupby(1)
       .apply(lambda r:"{0}({1},{2})".format(r.iloc[0,1], r.iloc[0,0], r.iloc[1,0]))
       .reset_index()
       .rename({1:"points", 0:"coordinates"}, axis=1)
    )
print(r.coordinates.tolist())
# ['d1(x,y)', 'd2(z,t)']
print(r)
#   points coordinates
# 0     d1     d1(x,y)
# 1     d2     d2(z,t)
```

As a replacement for my previous answer (which also works):

```
import itertools as it
my_list = [e.split(' ') for e in ["x d1","y d1","z d2","t d2"]]
r=[]
for key, group in it.groupby(my_list, lambda x: x[1]):
    l=[e[0] for e in list(group)]
    r.append("{0}({1},{2})".format(key, l[0], l[1]))
print(r)
# Output:
# ['d1(x,y)', 'd2(z,t)']
```