Python Duplicate words

Question:

I have a question where I have to count the duplicate words in Python (v3.4.1) and put them in a sentence. I used Counter, but I don't know how to get the output in the following order. The input is:

mysentence = "As far as the laws of mathematics refer to reality they are not certain as far as they are certain they do not refer to reality"

I made this into a list and sorted it.

The output is supposed to be this:

"As" is repeated 1 time.
"are" is repeated 2 times.
"as" is repeated 3 times.
"certain" is repeated 2 times.
"do" is repeated 1 time.
"far" is repeated 2 times.
"laws" is repeated 1 time.
"mathematics" is repeated 1 time.
"not" is repeated 2 times.
"of" is repeated 1 time.
"reality" is repeated 2 times.
"refer" is repeated 2 times.
"the" is repeated 1 time.
"they" is repeated 3 times.
"to" is repeated 2 times.

This is what I have so far:

x = input('Enter your sentence: ')
words = x.split()
# sorted() returns a new sorted list, so a separate words.sort() call is not needed
for word in sorted(words):
    print(word)
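Because the list is sorted, equal words end up next to each other, so the attempt above can be completed without any imports by counting each run of adjacent equal words. This is a minimal sketch added for illustration, not part of the original question; the helper name count_sorted_runs is hypothetical.

```python
def count_sorted_runs(words):
    """Return (word, count) pairs from a sorted list by counting runs of equal neighbours."""
    runs = []
    for word in words:
        if runs and runs[-1][0] == word:
            # Same word as the previous one: extend the current run.
            runs[-1][1] += 1
        else:
            # A different word starts a new run with a count of 1.
            runs.append([word, 1])
    return [(word, count) for word, count in runs]


sentence = ("As far as the laws of mathematics refer to reality they are not "
            "certain as far as they are certain they do not refer to reality")
for word, count in count_sorted_runs(sorted(sentence.split())):
    print('"%s" is repeated %d time%s.' % (word, count, "s" if count > 1 else ""))
```

This only works on sorted input; on an unsorted list the same word would produce several separate runs.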
Asked By: Erwy Lionel


Answers:

I can see where you are going with sort: once the list is sorted, you can reliably tell when you have hit a new word and keep a count for each unique word. However, what you really want here is a hash (dictionary) to keep track of the counts, since dictionary keys are unique. For example:

words = sentence.split()
counts = {}
for word in words:
    if word not in counts:
        counts[word] = 0
    counts[word] += 1

This gives you a dictionary where each key is a word and each value is the number of times it appears. You can simplify it with collections.defaultdict(int), which starts every missing key at 0, so you can increment the value directly:

import collections

# Missing keys are created with int(), i.e. 0, on first access
counts = collections.defaultdict(int)
for word in words:
    counts[word] += 1

But there is something even better: collections.Counter, which takes your list of words and turns it into a dictionary (actually a dict subclass) containing the counts.

counts = collections.Counter(words)
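As a quick sanity check (a sketch added for illustration, not from the original answer), the plain dictionary, defaultdict(int) and Counter versions all end up with the same counts. Counter also returns 0 for a word it has never seen, without adding that word as a key.

```python
import collections

sentence = ("As far as the laws of mathematics refer to reality they are not "
            "certain as far as they are certain they do not refer to reality")
words = sentence.split()

# 1. Plain dict: initialise missing keys by hand.
plain = {}
for word in words:
    if word not in plain:
        plain[word] = 0
    plain[word] += 1

# 2. defaultdict(int): missing keys start at int() == 0.
default = collections.defaultdict(int)
for word in words:
    default[word] += 1

# 3. Counter: counts the whole iterable in one call.
counter = collections.Counter(words)

print(plain == default == counter)          # True: all three hold identical counts
print(counter["they"], counter["missing"])  # 3 0
```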

From there, you want the words in sorted order with their counts so you can print them. items() gives you the (word, count) pairs, and sorted() compares those tuples element by element, so they end up ordered by their first item (the word in this case), which is exactly what you want.

import collections
sentence = """As far as the laws of mathematics refer to reality they are not certain as far as they are certain they do not refer to reality"""
words = sentence.split()
word_counts = collections.Counter(words)
for word, count in sorted(word_counts.items()):
    print('"%s" is repeated %d time%s.' % (word, count, "s" if count > 1 else ""))

OUTPUT

"As" is repeated 1 time.
"are" is repeated 2 times.
"as" is repeated 3 times.
"certain" is repeated 2 times.
"do" is repeated 1 time.
"far" is repeated 2 times.
"laws" is repeated 1 time.
"mathematics" is repeated 1 time.
"not" is repeated 2 times.
"of" is repeated 1 time.
"reality" is repeated 2 times.
"refer" is repeated 2 times.
"the" is repeated 1 time.
"they" is repeated 3 times.
"to" is repeated 2 times.
Answered By: sberry

Here is a very bad (inefficient) example of doing this with nothing but lists:

x = "As far as the laws of mathematics refer to reality they are not certain as far as they are certain they do not refer to reality"
words = x.split(" ")
words.sort()

# A second copy from which matched words are deleted as they are counted
words_copied = x.split(" ")

for word in words:
    count = 0
    while True:
        try:
            index = words_copied.index(word)
            count += 1
            del words_copied[index]
        except ValueError:
            # No copies are left; count is 0 for every repeat after the first one
            if count != 0:
                print('"' + word + '" is repeated ' + str(count) + ' time' + ('s' if count > 1 else '') + '.')
            break

EDIT: Here is a much better way:

x = "As far as the laws of mathematics refer to reality they are not certain as far as they are certain they do not refer to reality"
words = x.split(" ")
words.sort()

last_word = ""
for word in words:
    # The list is sorted, so a word different from the previous one is a new word
    if word != last_word:
        positions = [i for i, w in enumerate(words) if w == word]
        count = len(positions)
        print('"' + word + '" is repeated ' + str(count) + ' time' + ('s' if count > 1 else '') + '.')
    last_word = word
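The list comprehension in that loop collects the index of every match only to take its length; list.count returns that number directly. This is a small sketch of that variant (not from the original answer; results is a hypothetical name used only to collect the pairs). Like the version above, it rescans the whole list for every distinct word, so it is quadratic and best kept for short inputs.

```python
x = ("As far as the laws of mathematics refer to reality they are not "
     "certain as far as they are certain they do not refer to reality")
words = sorted(x.split())

results = []
last_word = None
for word in words:
    if word != last_word:
        # list.count scans the whole list and returns how many items equal word.
        count = words.count(word)
        results.append((word, count))
        print('"%s" is repeated %d time%s.' % (word, count, "s" if count > 1 else ""))
    last_word = word
```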
Answered By: isamert

To print word duplicates from a string in sorted order:

from itertools import groupby 

mysentence = ("As far as the laws of mathematics refer to reality "
              "they are not certain as far as they are certain "
              "they do not refer to reality")
words = mysentence.split() # get a list of whitespace-separated words
for word, duplicates in groupby(sorted(words)): # sort and group duplicates
    count = len(list(duplicates)) # count how many times the word occurs
    print('"{word}" is repeated {count} time{s}'.format(
            word=word, count=count,  s='s'*(count > 1)))

Output

"As" is repeated 1 time
"are" is repeated 2 times
"as" is repeated 3 times
"certain" is repeated 2 times
"do" is repeated 1 time
"far" is repeated 2 times
"laws" is repeated 1 time
"mathematics" is repeated 1 time
"not" is repeated 2 times
"of" is repeated 1 time
"reality" is repeated 2 times
"refer" is repeated 2 times
"the" is repeated 1 time
"they" is repeated 3 times
"to" is repeated 2 times
Answered By: jfs

I tried this on Python 2.7 (Mac), since that is the version I have, so focus on the logic:

from collections import Counter

mysentence = """As far as the laws of mathematics refer to reality they are not certain as far as they are certain they do not refer to reality"""

# Counter is already a dict subclass, so there is no need to convert it with dict()
counts = Counter(mysentence.split())
for word in sorted(counts):
    count = counts[word]
    print('"' + word + '" is repeated ' + str(count) + ' time' + ('s' if count > 1 else '') + '.')

I hope this is what you are looking for. If not, ping me; I'm happy to learn something new.

"As" is repeated 1 time.
"are" is repeated 2 times.
"as" is repeated 3 times.
"certain" is repeated 2 times.
"do" is repeated 1 time.
"far" is repeated 2 times.
"laws" is repeated 1 time.
"mathematics" is repeated 1 time.
"not" is repeated 2 times.
"of" is repeated 1 time.
"reality" is repeated 2 times.
"refer" is repeated 2 times.
"the" is repeated 1 time.
"they" is repeated 3 times.
"to" is repeated 2 times.
Answered By: HimanshuGahlot

A solution based on a NumPy array, following the post "How do I count the occurrence of a certain item in an ndarray?":

import numpy as np

mysentence = """As far as the laws of mathematics refer to reality they are not certain as far as they are certain they do not refer to reality"""
# np.unique returns the sorted unique words and, with return_counts=True, how often each one occurs
words, frq = np.unique(np.array(mysentence.split(" ")), return_counts=True)

for word, count in zip(words, frq):
    print('"{}" is repeated {} time{}.'.format(word, count, 's' if count > 1 else ''))

Output:

"As" is repeated 1 time.
"are" is repeated 2 times.
"as" is repeated 3 times.
"certain" is repeated 2 times.
"do" is repeated 1 time.
"far" is repeated 2 times.
"laws" is repeated 1 time.
"mathematics" is repeated 1 time.
"not" is repeated 2 times.
"of" is repeated 1 time.
"reality" is repeated 2 times.
"refer" is repeated 2 times.
"the" is repeated 1 time.
"they" is repeated 3 times.
"to" is repeated 2 times.
Answered By: Sam S.

If the string is a single word repeated with no separators, such as "miamimiamimiamimiamimiamimiamimiamimiami" or "San FranciscoSan FranciscoSan FranciscoSan FranciscoSan FranciscoSan FranciscoSan FranciscoSan FranciscoSan Francisco", you can find the repeated word like this:

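import re

String = "San FranciscoSan FranciscoSan FranciscoSan FranciscoSan FranciscoSan FranciscoSan FranciscoSan FranciscoSan Francisco"
word = ""
for i in String:
    # Grow the candidate prefix one character at a time
    word += i
    # If the non-overlapping matches of the prefix rebuild the whole string, it is the repeated unit
    if String == "".join(re.findall(re.escape(word), String)):
        print(word)
        break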
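A regex-free variant of the same idea (a sketch; the function name smallest_repeating_unit is hypothetical) tries each prefix length that divides the string length and checks whether repeating that prefix rebuilds the whole string. Because it never builds a pattern, characters such as "." or "(" in the input need no escaping.

```python
def smallest_repeating_unit(text):
    """Return the shortest prefix that, repeated, reproduces text exactly."""
    n = len(text)
    for size in range(1, n + 1):
        # Only prefixes whose length divides n can tile the whole string.
        if n % size == 0 and text[:size] * (n // size) == text:
            return text[:size]
    return text  # only reached for the empty string


print(smallest_repeating_unit("miami" * 8))          # miami
print(smallest_repeating_unit("San Francisco" * 9))  # San Francisco
```

If the string is not a repetition at all, the whole string is returned, matching the behaviour of the loop above.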
Answered By: Sk337