Add only unique values to a list in python

Question:

I’m trying to learn Python. Here is the relevant part of the exercise:

For each word, check to see if the word is already in a list. If the
word is not in the list, add it to the list.

Here is what I’ve got.

fhand = open('romeo.txt')
output = []

for line in fhand:
    words = line.split()
    for word in words:
        if word is not output:
            output.append(word)

print(sorted(output))

Here is what I get.

['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'and', 'and',
 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'is', 'is',
 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'sun',
 'the', 'the', 'the', 'through', 'what', 'window', 'with', 'yonder']

Note the duplication (and, is, sun, etc.).

How do I get only unique values?

Asked By: Tim Elhajj


Answers:

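Instead of the `is not` operator, you should use the `not in` operator to check whether the item is in the list. The condition `word is not output` only asks whether the string and the list are different objects, which is always true, so every word gets appended. The correct check is: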

if word not in output:
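A quick sketch comparing the two conditions side by side. It uses a small made-up list of words as a stand-in for the lines of romeo.txt, so it runs without the file:

```python
words = ['and', 'is', 'and', 'sun', 'is']  # hypothetical sample words

buggy = []
for word in words:
    # Identity check: a str is never the same object as the list, so this is always True
    if word is not buggy:
        buggy.append(word)

fixed = []
for word in words:
    # Membership check: True only if no equal item is already in the list
    if word not in fixed:
        fixed.append(word)

print(buggy)  # ['and', 'is', 'and', 'sun', 'is']
print(fixed)  # ['and', 'is', 'sun']
```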

BTW, using a set is a lot more efficient (see Time complexity): a membership test on a set is O(1) on average, versus O(n) for a list.

with open('romeo.txt') as fhand:
    output = set()
    for line in fhand:
        words = line.split()
        output.update(words)
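Because a set silently drops duplicates on insertion, sorting it gives exactly the list the question was after. A runnable sketch, assuming io.StringIO with a few made-up lines in place of romeo.txt:

```python
import io

# Hypothetical stand-in for romeo.txt so the example runs without the file
fhand = io.StringIO("the sun is the sun\nand the moon is pale\n")

output = set()
for line in fhand:
    # update() adds every word from the line; words already present are ignored
    output.update(line.split())

print(sorted(output))  # ['and', 'is', 'moon', 'pale', 'sun', 'the']
```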

UPDATE: A set does not preserve the original order. To preserve the order, use the set as an auxiliary data structure alongside the list:

output = []
seen = set()
with open('romeo.txt') as fhand:
    for line in fhand:
        words = line.split()
        for word in words:
            if word not in seen:  # faster than `word not in output`
                seen.add(word)
                output.append(word)
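On Python 3.7 and later, where dictionaries are guaranteed to keep insertion order, dict.fromkeys() gives the same order-preserving result without an explicit seen set. This is a swapped-in alternative rather than part of the answer above, and the sample text is made up:

```python
import io

# Hypothetical stand-in for romeo.txt
fhand = io.StringIO("But soft what light\nthrough yonder window breaks\nsoft light\n")

# Flatten the file into words, then keep the first occurrence of each:
# dict keys are unique and (since Python 3.7) kept in insertion order.
words = [word for line in fhand for word in line.split()]
output = list(dict.fromkeys(words))

print(output)
# ['But', 'soft', 'what', 'light', 'through', 'yonder', 'window', 'breaks']
```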
Answered By: falsetru

To eliminate duplicates from a list, you can maintain an auxiliary list and check each word against it.

myList = ['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'and', 'and', 
     'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'is', 'is', 'kill', 'light', 
     'moon', 'pale', 'sick', 'soft', 'sun', 'sun', 'the', 'the', 'the', 
     'through', 'what', 'window', 'with', 'yonder']

auxiliaryList = []
for word in myList:
    if word not in auxiliaryList:
        auxiliaryList.append(word)

output:

['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'breaks', 'east', 
  'envious', 'fair', 'grief', 'is', 'kill', 'light', 'moon', 'pale', 'sick',
  'soft', 'sun', 'the', 'through', 'what', 'window', 'with', 'yonder']

This is very simple to comprehend, and the code is self-explanatory. However, the simplicity comes at the expense of efficiency: each `not in` check is a linear scan over a growing list, so an algorithm that should be linear degrades to quadratic.
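To make the quadratic claim concrete, the sketch below spells out roughly the scan that `word not in auxiliaryList` performs and counts the equality comparisons. For n distinct words, the i-th word is compared against the i words already kept, so the total is 0 + 1 + ... + (n - 1) = n(n - 1)/2. The helper name dedupe_counting is hypothetical:

```python
def dedupe_counting(words):
    """Keep first occurrences using a list, counting equality comparisons."""
    kept = []
    comparisons = 0
    for word in words:
        found = False
        # Explicit version of the linear scan behind `word in kept`
        for existing in kept:
            comparisons += 1
            if existing == word:
                found = True
                break
        if not found:
            kept.append(word)
    return kept, comparisons


for n in (10, 100, 1000):
    distinct = [str(i) for i in range(n)]  # worst case: no duplicates at all
    _, count = dedupe_counting(distinct)
    print(n, count, n * (n - 1) // 2)  # the last two columns match
```

Each tenfold increase in n multiplies the count by roughly a hundred, which is the quadratic growth described above.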


If order is not important, you could use set(). As the Python documentation puts it:

A set object is an unordered collection of distinct hashable objects.

Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.

Since membership checking in a hash table is O(1) in the average case, using a set is more efficient.
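Because membership is decided by hash value, only hashable objects can go into a set. A short sketch with made-up values: strings (immutable) work, a list (mutable, unhashable) raises TypeError, and converting it to a tuple is one common workaround:

```python
words = {'sun', 'moon'}
words.add('sun')           # already present, so the set is unchanged
print(len(words))          # 2

rejected = False
try:
    words.add(['east', 'west'])  # lists are mutable and therefore unhashable
except TypeError as exc:
    rejected = True
    print('TypeError:', exc)

words.add(('east', 'west'))       # a tuple of strings is hashable
print(rejected)                   # True
print(('east', 'west') in words)  # True
```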

auxiliaryList = list(set(myList))

output:

['and', 'envious', 'already', 'fair', 'is', 'through', 'pale', 'yonder', 
 'what', 'sun', 'Who', 'But', 'moon', 'window', 'sick', 'east', 'breaks', 
 'grief', 'with', 'light', 'It', 'Arise', 'kill', 'the', 'soft', 'Juliet']
Answered By: Tony Tannous

Here’s a “one-liner” which uses this implementation of removing duplicates while preserving order:

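def unique(seq):
    seen = set()
    seen_add = seen.add  # cache the bound method to avoid repeated attribute lookups
    # seen_add(x) returns None (falsy), so each new x is recorded and kept once
    return [x for x in seq if not (x in seen or seen_add(x))]

with open('romeo.txt') as fhand:
    output = unique([word for line in fhand for word in line.split()])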

The last line flattens fhand into a list of words, and then calls unique() on the resulting list.
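The trick in that comprehension is that seen_add(x) returns None, so `x in seen or seen_add(x)` is false exactly once per new value, and that first occurrence is kept. A lazy generator version of the same idea (similar in spirit to the unique_everseen recipe in the itertools documentation; the name iter_unique is hypothetical) avoids building the intermediate list of words. The sample text is made up:

```python
import io


def iter_unique(seq):
    """Yield each item the first time it appears, preserving order."""
    seen = set()
    for x in seq:
        if x not in seen:
            seen.add(x)
            yield x


# Hypothetical stand-in for romeo.txt
fhand = io.StringIO("It is the east\nand Juliet is the sun\n")
words = (word for line in fhand for word in line.split())  # lazy flatten
output = list(iter_unique(words))
print(output)  # ['It', 'is', 'the', 'east', 'and', 'Juliet', 'sun']
```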

Answered By: Mateen Ulhaq

One method is to see if it’s in the list prior to adding, which is what Tony’s answer does. If you want to delete duplicate values after the list has been created, you can use set() to convert the existing list into a set of unique values, and then use list() to convert it into a list again. All in just one line:

list(set(output))

If you want the result sorted, wrap the set in sorted() instead: sorted(set(output)). Because sorted() already returns a list, the list() call is no longer needed. Here’s the result:

['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'breaks', 'east', 
 'envious', 'fair', 'grief', 'is', 'kill', 'light', 'moon', 'pale', 'sick', 
 'soft', 'sun', 'the', 'through', 'what', 'window', 'with', 'yonder']
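sorted() compares strings by code point, which is why capitalized words such as 'Arise' and 'Who' come before 'already' in the result above. For a case-insensitive order, pass a key; the sketch below uses (w.lower(), w) so that words differing only in case still sort deterministically. The word list is made up:

```python
output = ['sun', 'Arise', 'and', 'sun', 'But', 'and', 'But']  # hypothetical words

# Default: uppercase letters sort before lowercase ones
print(sorted(set(output)))  # ['Arise', 'But', 'and', 'sun']

# Case-insensitive, with the original string as a tie-breaker
print(sorted(set(output), key=lambda w: (w.lower(), w)))
# ['and', 'Arise', 'But', 'sun']
```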
Answered By: Advait Saravade

fh = open('romeo.txt')
content = fh.read()
words = content.split()

mylist = list()
for word in words:
    if word not in mylist:
        mylist.append(word)

mylist.sort()
print(mylist)

fh.close()

Answered By: 7ud02