Removing duplicate characters from a string

Question:

How can I remove duplicate characters from a string using Python? For example, let’s say I have a string:

foo = 'mppmt'

How can I make the string:

foo = 'mpt'

NOTE: Order is not important

Asked By: JSW189


Answers:

If order does not matter, you can use

"".join(set(foo))

set() will create a set of unique letters in the string, and "".join() will join the letters back to a string in arbitrary order.

If order does matter, you can use a dict instead of a set, which since Python 3.7 preserves the insertion order of the keys. (In the CPython implementation, this is already supported in Python 3.6 as an implementation detail.)

foo = "mppmt"
result = "".join(dict.fromkeys(foo))

resulting in the string "mpt". In earlier versions of Python, you can use collections.OrderedDict, which has been available starting from Python 2.7.
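For code that has to run on both older and newer interpreters, the two options above can be wrapped in one helper. This is only a minimal sketch: the name unique_chars is hypothetical (it does not come from the answer), and the version check assumes the ordering guarantee that applies from Python 3.7 onward.

```python
import sys
from collections import OrderedDict


# Hypothetical helper name, used for illustration only.
def unique_chars(text):
    """Return text with repeated characters removed, keeping first occurrences."""
    if sys.version_info >= (3, 7):
        # Plain dicts preserve insertion order from Python 3.7 on.
        return "".join(dict.fromkeys(text))
    # Fallback for older interpreters (OrderedDict exists since Python 2.7).
    return "".join(OrderedDict.fromkeys(text))


print(unique_chars("mppmt"))  # mpt
```

Both branches visit each character once, so the work grows linearly with the length of the string.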

Answered By: Sven Marnach

If order does not matter:

>>> foo='mppmt'
>>> ''.join(set(foo))
'pmt'

To keep the order:

>>> foo='mppmt'
>>> ''.join([j for i,j in enumerate(foo) if j not in foo[:i]])
'mpt'
Answered By: kev

If order is important,

seen = set()
result = []
for c in foo:
    if c not in seen:
        result.append(c)
        seen.add(c)
result = ''.join(result)

Or to do it without sets:

result = []
for c in foo:
    if c not in result:
        result.append(c)
result = ''.join(result)
Answered By: Kevin Coffey

If order does matter, how about:

>>> foo = 'mppmt'
>>> ''.join(sorted(set(foo), key=foo.index))
'mpt'
Answered By: DSM

As was mentioned, "".join(set(foo)) and collections.OrderedDict will do.
I added foo = foo.lower() in case the string has both upper- and lowercase characters and you need to remove all duplicates regardless of case.

from collections import OrderedDict
foo = "EugeneEhGhsnaWW"
foo = foo.lower()
print("".join(OrderedDict.fromkeys(foo)))

prints eugnhsaw
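Note that lowercasing the whole string also lowercases the result. If the original casing of the first occurrence should survive, one possible variation is to compare case-insensitively while appending the untouched character. This is a sketch only; dedupe_ignore_case is a hypothetical name, and using str.casefold() (Python 3.3+) for the comparison is an assumption, not something the answer prescribes.

```python
# Hypothetical helper: case-insensitive de-duplication that keeps the
# original casing of each character's first occurrence.
def dedupe_ignore_case(text):
    seen = set()   # case-folded characters already emitted
    kept = []      # characters to keep, in their original casing
    for ch in text:
        key = ch.casefold()
        if key not in seen:
            seen.add(key)
            kept.append(ch)
    return "".join(kept)


print(dedupe_ignore_case("EugeneEhGhsnaWW"))  # EugnhsaW
```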

Answered By: Eugene Berezin

Build a list for the result and a set to track the characters already seen (a set doesn’t allow any duplicates).
Method 1 :

def fix(string):
    seen = set()   # characters already added to the result
    chars = []     # avoids shadowing the built-in name "list"
    for ch in string:
        if ch not in seen:
            seen.add(ch)
            chars.append(ch)

    return ''.join(chars)

string = "Protiijaayiiii"
print(fix(string))

Method 2 :

s = "Protijayi"

aa = [ ch  for i, ch in enumerate(s) if ch not in s[:i]]
print(''.join(aa))

Method 3 :

dd = ''.join(dict.fromkeys(s))
print(dd)
Answered By: Soudipta Dutta
# Check the code and apply it in your program:

# Input: 'ppppmm'
s = 'ppppmm'
s = ''.join(set(s))
print(s)
# Output: 'pm' or 'mp' (a set has no guaranteed order)
Answered By: hp_elite
def dupe(str1):
    s=set(str1)

    return "".join(s)
str1='geeksforgeeks'
a=dupe(str1)
print(a)

works well if order is not important.

Answered By: ravi tanwar
d = {}
s="YOUR_DESIRED_STRING"
res=[]
for c in s:
    if c not in d:
      res.append(c)
      d[c]=1
print ("".join(res))

The variable c iterates over the string s in the for loop. For each character, the code checks whether c is already a key in the dict d (which is initially empty). If it is not, c is appended to the list res and d[c] is set to 1 to mark the character as seen. Once the loop has gone through the whole string, res holds every unique character in order of first appearance, and the joined result is printed.

Answered By: Tarish
def remove_duplicates(value):
    var = ""
    for i in value:
        # Append each character only the first time it is seen.
        if i not in var:
            var = var + i
    return var

print(remove_duplicates("11223445566666ababzzz@@@123#*#*"))
Answered By: Abhisek Meshram

Since a string is a sequence of characters, passing it to dict.fromkeys() removes all duplicates and (in Python 3.7+) retains the order.

"".join(list(dict.fromkeys(foo)))

Answered By: hrnjan
from collections import OrderedDict
def remove_duplicates(value):
        m=list(OrderedDict.fromkeys(value))
        s=''
        for i in m:
            s+=i
        return s
print(remove_duplicates("11223445566666ababzzz@@@123#*#*"))

Answered By: swamy_teja7
mylist = ["ABA", "CAA", "ADA"]
results = []
for item in mylist:
    buffer = []
    for char in item:
        # Keep only the first occurrence of each character in this item.
        if char not in buffer:
            buffer.append(char)
    results.append("".join(buffer))

print(results)

output:
['AB', 'CA', 'AD']
Answered By: Golden Lion

Functional programming style while keeping order:

import functools

def get_unique_char(a, b):
    if b not in a:
        return a + b
    else:
        return a

if __name__ == '__main__':
    foo = 'mppmt'

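    # reduce() returns the accumulated string; the '' initializer also
    # makes an empty input work instead of raising TypeError.
    result = functools.reduce(get_unique_char, foo, '')
    print(result)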
Answered By: Olivier_s_j

Using regular expressions:

import re
pattern = r'(.)\1+'  # (.) any character followed by one or more repeats of itself
repl = r'\1'         # replace the whole run with a single copy
text = 'shhhhh!!!'
print(re.sub(pattern, repl, text))

output:

sh!
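Note that a pattern of this kind only collapses runs of adjacent repeats; a character that reappears later, after other characters, is kept. As a rough sketch of the same behavior without a regular expression, itertools.groupby can be used. The helper name collapse_runs is hypothetical and is not part of the answer.

```python
from itertools import groupby


# Hypothetical helper: collapse each run of identical adjacent characters
# into a single character. Non-adjacent repeats are left alone.
def collapse_runs(text):
    return "".join(key for key, _group in groupby(text))


print(collapse_runs("shhhhh!!!"))  # sh!
print(collapse_runs("mppmt"))      # mpmt (the second "m" is not adjacent)
```

For the string from the question this yields "mpmt" rather than "mpt", so this approach fits only when collapsing adjacent repeats is what is wanted.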
Answered By: IndPythCoder

You can replace matches of

rgx = r'(.)(?=.*\1)'

with empty strings.

import re

print(re.sub(rgx, '', 'abbcabdeeeafgfh'))
  #=> "cbdeagfh"


The regular expression matches any character (.), saves it to capture group 1 ((.)) and requires (by the use of the positive lookahead (?=.*\1)) that the same character (\1) appears later in the string.

In the example, the first and second 'a's are matched, and therefore converted to empty strings, because in each case there is another 'a' later in the string. The third 'a' in the string is not matched because there are no 'a's later in the string.
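A consequence of the lookahead is that the last occurrence of each character survives, not the first, so the result is ordered differently from the dict.fromkeys() approach. The sketch below puts the two side by side. The names keep_last and keep_first are hypothetical, and passing re.DOTALL (so that "." also matches newlines) is an added assumption that is not in the answer.

```python
import re


# Hypothetical helper: drop every character that appears again later,
# which keeps the LAST occurrence of each character.
def keep_last(text):
    # Assumption: re.DOTALL lets "." match newline characters as well.
    return re.sub(r"(.)(?=.*\1)", "", text, flags=re.DOTALL)


# Hypothetical helper: keeps the FIRST occurrence of each character
# (relies on dict insertion order, Python 3.7+).
def keep_first(text):
    return "".join(dict.fromkeys(text))


print(keep_last("abbcabdeeeafgfh"))   # cbdeagfh
print(keep_first("abbcabdeeeafgfh"))  # abcdefgh
```

The lookahead rescans the rest of the string for each character, so on long inputs it will likely be slower than the dict-based version.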

Answered By: Cary Swoveland