Removing duplicate characters from a string

Question

How can I remove duplicate characters from a string using Python? For example, let’s say I have a string:

foo = 'mppmt'

How can I make the string:

foo = 'mpt'

NOTE: Order is not important

Asked By: JSW189

Answer 1

If order does not matter, you can use

"".join(set(foo))

set() will create a set of unique letters in the string, and "".join() will join the letters back to a string in arbitrary order.
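Because the order produced by a set is arbitrary, the joined string can differ between runs. If a reproducible result is wanted and the original order does not matter, one option (an assumption about what the reader needs, not part of the original answer) is to sort the unique characters before joining:

```python
foo = "mppmt"

unique_chars = set(foo)                        # the characters m, p, t in no particular order
arbitrary = "".join(unique_chars)              # e.g. 'pmt' or 'tmp'; may vary between runs
deterministic = "".join(sorted(unique_chars))  # alphabetical, so always the same

print(deterministic)  # mpt
```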

If order does matter, you can use a dict instead of a set, which since Python 3.7 preserves the insertion order of the keys. (In the CPython implementation, this is already supported in Python 3.6 as an implementation detail.)

foo = "mppmt"
result = "".join(dict.fromkeys(foo))

resulting in the string "mpt". In earlier versions of Python, you can use collections.OrderedDict, which has been available starting from Python 2.7.
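As a rough illustration of why this works, the sketch below prints the intermediate dict and then shows the OrderedDict variant side by side. The variable name keys is only illustrative.

```python
from collections import OrderedDict

foo = "mppmt"

# dict.fromkeys() builds a dict whose keys are the characters of foo;
# a repeated character maps to the key that already exists, so it adds nothing new.
keys = dict.fromkeys(foo)
print(keys)           # {'m': None, 'p': None, 't': None} on Python 3.7+
print("".join(keys))  # mpt

# OrderedDict gives the same first-occurrence order on older interpreters.
print("".join(OrderedDict.fromkeys(foo)))  # mpt
```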

Answered By: Sven Marnach

Answer 2

If order does not matter:

>>> foo='mppmt'
>>> ''.join(set(foo))
'pmt'

To keep the order:

>>> foo='mppmt'
>>> ''.join([j for i,j in enumerate(foo) if j not in foo[:i]])
'mpt'

Answered By: kev

Answer 3

If order is important,

seen = set()
result = []
for c in foo:
    if c not in seen:
        result.append(c)
        seen.add(c)
result = ''.join(result)

Or to do it without sets:

result = []
for c in foo:
    if c not in result:
        result.append(c)
result = ''.join(result)

Answered By: Kevin Coffey

Answer 4

If order does matter, how about:

>>> foo = 'mppmt'
>>> ''.join(sorted(set(foo), key=foo.index))
'mpt'
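A short sketch of why this keeps the original order: foo.index(c) returns the position of the first occurrence of c, so sorting the unique characters by that position restores the order in which they first appear. The dict first_positions below exists only to make those positions visible.

```python
foo = "mppmt"

# Position of the first occurrence of each unique character.
first_positions = {c: foo.index(c) for c in set(foo)}
print(sorted(first_positions.items()))  # [('m', 0), ('p', 1), ('t', 4)]

# Sorting the unique characters by that position restores first-seen order.
result = "".join(sorted(set(foo), key=foo.index))
print(result)  # mpt
```

Note that foo.index scans the string for each unique character, so this is likely slower than the dict-based approaches on long strings with many distinct characters.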

Answered By: DSM

Answer 5

As was mentioned, "".join(set(foo)) and collections.OrderedDict will do.
I added foo = foo.lower() in case the string contains both upper- and lowercase characters and you need to remove all duplicates regardless of case.

from collections import OrderedDict
foo = "EugeneEhGhsnaWW"
foo = foo.lower()
print("".join(OrderedDict.fromkeys(foo)))

prints eugnhsaw
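Lowercasing the whole string also changes the case of the characters that are kept. If the original casing of each first occurrence should survive, a possible variation is to compare on the lowercased character only. This is a sketch, and dedupe_ignore_case is a hypothetical helper name, not something from the answer above:

```python
def dedupe_ignore_case(text):
    """Drop repeated characters, treating upper and lower case as equal."""
    seen = set()   # lowercased characters already emitted
    kept = []      # characters to keep, in their original casing
    for ch in text:
        key = ch.lower()
        if key not in seen:
            seen.add(key)
            kept.append(ch)
    return "".join(kept)

print(dedupe_ignore_case("EugeneEhGhsnaWW"))  # EugnhsaW
```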

Answered By: Eugene Berezin

Answer 6

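Use a list to collect the result and a set, which does not allow duplicates, to track the characters already seen.
Method 1:

def fix(string):
    seen = set()
    chars = []  # avoid naming this "list", which would shadow the built-in
    for ch in string:
        if ch not in seen:
            seen.add(ch)
            chars.append(ch)

    return ''.join(chars)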

string = "Protiijaayiiii"
print(fix(string))

Method 2:

s = "Protijayi"

aa = [ch for i, ch in enumerate(s) if ch not in s[:i]]
print(''.join(aa))

Method 3:

dd = ''.join(dict.fromkeys(s))
print(dd)

Answered By: Soudipta Dutta

Answer 7

Check this code and apply it in your program:

s = 'ppppmm'
s = ''.join(set(s))
print(s)
# Output: 'pm' or 'mp' (set order is arbitrary)

Answered By: hp_elite

Answer 8

def dupe(str1):
    s = set(str1)
    return "".join(s)

str1 = 'geeksforgeeks'
a = dupe(str1)
print(a)

This works well if order is not important.

Answered By: ravi tanwar

Answer 9

d = {}
s = "YOUR_DESIRED_STRING"
res = []
for c in s:
    if c not in d:
        res.append(c)
        d[c] = 1
print("".join(res))

The variable c iterates over the string s in the for loop. For each character, the code checks whether c is already a key in the dict d (which is initially empty). If it is not, c is appended to the list res and d[c] is set to 1 to mark the character as seen. After the loop has traversed the whole string, res holds each unique character in order of first appearance, and the joined result is printed.

Answered By: Tarish

Answer 10

def remove_duplicates(value):
    var = ""
    for i in value:
        # Append the character only the first time it is seen.
        if i not in var:
            var = var + i
    return var

print(remove_duplicates("11223445566666ababzzz@@@123#*#*"))

Answered By: Abhisek Meshram

Answer 11

Since a string is a sequence of characters, passing it to dict.fromkeys() removes all duplicates and retains the order (guaranteed since Python 3.7).

"".join(list(dict.fromkeys(foo)))

Answered By: hrnjan

Answer 12

from collections import OrderedDict

def remove_duplicates(value):
    m = list(OrderedDict.fromkeys(value))
    s = ''
    for i in m:
        s += i
    return s

print(remove_duplicates("11223445566666ababzzz@@@123#*#*"))

Answered By: swamy_teja7

Answer 13

mylist = ["ABA", "CAA", "ADA"]
results = []
for item in mylist:
    buffer = []
    for char in item:
        if char not in buffer:
            buffer.append(char)
    results.append("".join(buffer))

print(results)

Output:

['AB', 'CA', 'AD']

Answered By: Golden Lion

Answer 14

Functional programming style while keeping order:

import functools

def get_unique_char(a, b):
    if b not in a:
        return a + b
    else:
        return a

if __name__ == '__main__':
    foo = 'mppmt'

    # reduce() already returns a str; the '' initial value also handles empty input
    result = functools.reduce(get_unique_char, foo, '')
    print(result)

Answered By: Olivier_s_j

Answer 15

Using regular expressions (this collapses only consecutive repeats):

import re
pattern = r'(.)\1+'  # any character (.) followed by one or more repeats of itself
repl = r'\1'         # replace the whole run with a single copy
text = 'shhhhh!!!'
print(re.sub(pattern, repl, text))

output:

sh!
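To make the scope of this pattern concrete, here is a small sketch comparing it on two inputs. The backreference only matches immediately repeated characters, so a duplicate that is separated by other characters is left alone. The name collapse_runs is just illustrative.

```python
import re

# (.) captures one character; \1+ matches one or more immediate repeats of it.
collapse_runs = re.compile(r"(.)\1+")

print(collapse_runs.sub(r"\1", "shhhhh!!!"))  # sh!
print(collapse_runs.sub(r"\1", "mppmt"))      # mpmt (the second 'm' is not adjacent, so it stays)
```

For the string in the question this yields 'mpmt' rather than 'mpt', so it fits cases where only runs of repeated characters should be squeezed.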

Answered By: IndPythCoder

Answer 16

You can replace matches of

rgx = r'(.)(?=.*\1)'

with empty strings.

import re

print(re.sub(rgx, '', 'abbcabdeeeafgfh'))
  #=> "cbdeagfh"

The regular expression matches any character (.), saves it to capture group 1 ((.)) and requires (by the use of the positive lookahead (?=.*\1)) that the same character (\1) appears later in the string.

In the example, the first and second 'a's are matched, and therefore converted to empty strings, because in each case there is another 'a' later in the string. The third 'a' in the string is not matched because there are no 'a's later in the string.
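One consequence worth seeing side by side: this lookahead approach keeps the last occurrence of each character, whereas the dict-based approaches keep the first. The sketch below compares the two on the same input; the variable names are illustrative only.

```python
import re

text = "abbcabdeeeafgfh"

# Remove every character that appears again later: the LAST occurrence survives.
keep_last = re.sub(r"(.)(?=.*\1)", "", text)

# dict.fromkeys keeps the FIRST occurrence of each character (Python 3.7+).
keep_first = "".join(dict.fromkeys(text))

print(keep_last)   # cbdeagfh
print(keep_first)  # abcdefgh
```

Both results contain the same set of characters; only the surviving positions, and therefore the order, differ.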

Answered By: Cary Swoveland
