Concatenate the elements of each tuple in a list in Python

Question:

I have a list of tuples that has strings in it
For instance:

[('this', 'is', 'a', 'foo', 'bar', 'sentences'),
 ('is', 'a', 'foo', 'bar', 'sentences', 'and'),
 ('a', 'foo', 'bar', 'sentences', 'and', 'i'),
 ('foo', 'bar', 'sentences', 'and', 'i', 'want'),
 ('bar', 'sentences', 'and', 'i', 'want', 'to'),
 ('sentences', 'and', 'i', 'want', 'to', 'ngramize'),
 ('and', 'i', 'want', 'to', 'ngramize', 'it')]

Now I want to join the strings in each tuple to create a list of space-separated strings.
I used the following method:

NewData = []
for grams in sixgrams:
    NewData.append((''.join([w + ' ' for w in grams])).strip())

which is working perfectly fine.

However, the list I have contains over a million tuples. So my question is: is this method efficient enough, or is there a better way to do it?
Thanks.

Asked By: alphacentauri


Answers:

The list comprehension builds a temporary string for every word (each with a trailing space) only to join and strip them again. Just pass each tuple straight to ' '.join instead.

>>> words_list = [('this', 'is', 'a', 'foo', 'bar', 'sentences'),
...               ('is', 'a', 'foo', 'bar', 'sentences', 'and'),
...               ('a', 'foo', 'bar', 'sentences', 'and', 'i'),
...               ('foo', 'bar', 'sentences', 'and', 'i', 'want'),
...               ('bar', 'sentences', 'and', 'i', 'want', 'to'),
...               ('sentences', 'and', 'i', 'want', 'to', 'ngramize'),
...               ('and', 'i', 'want', 'to', 'ngramize', 'it')]
>>> new_list = []
>>> for words in words_list:
...     new_list.append(' '.join(words)) # <---------------
... 
>>> new_list
['this is a foo bar sentences', 
 'is a foo bar sentences and', 
 'a foo bar sentences and i', 
 'foo bar sentences and i want', 
 'bar sentences and i want to', 
 'sentences and i want to ngramize', 
 'and i want to ngramize it']

The above for loop can be expressed as the following list comprehension:

new_list = [' '.join(words) for words in words_list] 
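If you want to measure the difference against the original strip-based approach yourself, a rough timeit comparison could look like this (a minimal sketch; the sixgrams list here is just dummy data standing in for your million tuples):

from timeit import timeit

sixgrams = [('this', 'is', 'a', 'foo', 'bar', 'sentences')] * 1000000

def original():
    # build padded words, join them, then strip the trailing space
    return [(''.join([w + ' ' for w in grams])).strip() for grams in sixgrams]

def joined():
    # join each tuple directly with a space
    return [' '.join(grams) for grams in sixgrams]

print(timeit(original, number=1))
print(timeit(joined, number=1))

The ' '.join version skips the intermediate list of padded words and the final strip call, so it should come out ahead.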
Answered By: falsetru

You can do this efficiently like this:

joiner = " ".join
print(list(map(joiner, sixgrams)))

We can still improve the performance by using a list comprehension with the pre-bound joiner:

joiner = " ".join
print([joiner(words) for words in sixgrams])

The performance comparison below shows that the list comprehension with the pre-bound joiner is slightly faster than the other two solutions.

from timeit import timeit

joiner = " ".join

def mapSolution():
    return list(map(joiner, sixgrams))  # materialise the result so all three solutions do the same work

def comprehensionSolution1():
    return [" ".join(words) for words in sixgrams]

def comprehensionSolution2():
    return [joiner(words) for words in sixgrams]

print(timeit("mapSolution()", "from __main__ import joiner, mapSolution, sixgrams"))
print(timeit("comprehensionSolution1()", "from __main__ import sixgrams, comprehensionSolution1, joiner"))
print(timeit("comprehensionSolution2()", "from __main__ import sixgrams, comprehensionSolution2, joiner"))

Output on my machine

1.5691678524
1.66710209846
1.47555398941

The performance gain is most likely because we bind " ".join to joiner once, instead of creating a new bound join method from the string on every iteration.
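To see why, note that (at least in CPython) every attribute lookup on the string creates a fresh bound method object; a quick interactive sketch, using the same joiner name as above:

>>> " ".join is " ".join   # each lookup builds a new bound method object
False
>>> joiner = " ".join      # bind it once and reuse it
>>> joiner(('a', 'b', 'c'))
'a b c'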

Edit: Though we can improve the performance like this, the most Pythonic way is to use generators, as in lvc's answer below.

Answered By: thefourtheye

For a lot of data, you should consider whether you need to keep it all in a list. If you process the joined strings one at a time, you can create a generator that will yield each joined string without keeping them all around taking up memory:

new_data = (' '.join(w) for w in sixgrams)

If you can get the original tuples from a generator as well, then you can avoid ever having the full sixgrams list in memory.
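For example, here is a minimal sketch of such a fully lazy pipeline (the tokens list is just the example sentence from the question, and the print call stands in for whatever per-item processing you do):

tokens = ['this', 'is', 'a', 'foo', 'bar', 'sentences',
          'and', 'i', 'want', 'to', 'ngramize', 'it']

# zip over six offset slices; in Python 3 this yields the 6-gram tuples lazily
sixgrams = zip(*(tokens[i:] for i in range(6)))

new_data = (' '.join(w) for w in sixgrams)

for sentence in new_data:
    print(sentence)   # replace with your actual per-item processing

For a token source that is itself a stream, you would build the sliding window with something like collections.deque instead of slicing, but the shape of the pipeline stays the same.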

Answered By: lvc