Duplicate word Python

Question:

I want to write a program that detects duplicated words, as in the following example:
"At least one value must be entered
entered in order to compute the average"
Here "entered" is repeated, and I want a way to detect this kind of case.

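# Ask for the file path ("Enter the file location: ")
archivo = str(input("Ingrese la ubicación del archivo: "))
inf = open(archivo, "r")

lineas = inf.readlines()
lin = []

# Remove leading/trailing whitespace (including the newline) from each line
for a in lineas:
    lin.append(a.strip())

# Join all lines into one string, then split it into a flat list of words
cadena = ' '.join([str(item) for item in lin])
list_cadena = cadena.split()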

I have done this, but I don't know how to detect the repeated words, because they can be on the same line, or one can be at the end of a line and the other at the beginning of the next line, as in the example.

Asked By: Tacos al pastor


Answers:

text = 'i like donkey donkey'
words = text.split(' ')

for i in range(len(words)):
    count = 1
    for j in range(i + 1, len(words)):
        if words[i] == words[j]:
            count += 1
            words[j] = None  # mark later copies so they are not reported again

    if count > 1 and words[i] is not None:
        print(words[i])

# output -> donkey

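This code splits the string on every space and uses nested for loops to compare each word with all the words after it. Each word that occurs more than once is printed once; obviously you can change it to do whatever you need. Note that it finds words repeated anywhere in the text, not only consecutive repeats.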
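The nested loops compare every word with every later word, so the work grows quadratically with the number of words. A minimal sketch of the same "repeated anywhere" check using `collections.Counter` from the standard library (a swapped-in technique, not part of the answer above), which counts all words in a single pass:

```python
from collections import Counter

text = 'i like donkey donkey'

# Count how often every word occurs, in one pass over the list of words.
counts = Counter(text.split())

# Keep the words seen more than once; Counter keeps first-occurrence order.
duplicates = [word for word, n in counts.items() if n > 1]
print(duplicates)  # ['donkey']
```

Like the loop version, this reports words that repeat anywhere in the text, not only side by side.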

Answered By: muty02

string = 'At least one value must be entered entered in order to compute the average'

string_list = string.split()

for i, word in enumerate(string_list):
    duplicate = string_list.count(word)

    if duplicate > 1:  # 2 or more occurrences
        print(f'Duplicate word {word} at position {i}')

Output:

Duplicate word entered at position 6
Duplicate word entered at position 7
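Because `list.count()` rescans the whole list for every word, this approach is quadratic and prints one line per occurrence. A minimal sketch of an alternative (not the answer's own code) that collects all positions of each word in one pass with `collections.defaultdict`:

```python
from collections import defaultdict

string = 'At least one value must be entered entered in order to compute the average'

# Map each word to every index where it occurs, in a single pass.
positions = defaultdict(list)
for i, word in enumerate(string.split()):
    positions[word].append(i)

# Only words with two or more recorded positions are duplicates.
for word, idxs in positions.items():
    if len(idxs) > 1:
        print(f'Duplicate word {word} at positions {idxs}')
```

For the example sentence this prints a single line, `Duplicate word entered at positions [6, 7]`.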

Answered By: Marcel Suleiman

Using itertools.pairwise (Python ≥ 3.10):

from itertools import pairwise

[a for a, b in pairwise(text.split()) if a == b]

NB. For Python below 3.10, you can define pairwise yourself using the recipe from the itertools documentation.

Input:

text = """At least one one value must be entered
entered in order to compute the average"""

Output: ['one', 'entered']
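For Python versions below 3.10, a minimal sketch of the pairwise recipe as given in older versions of the itertools documentation, defined locally so the one-liner runs unchanged:

```python
from itertools import tee

def pairwise(iterable):
    # Recipe: yield overlapping pairs (s0, s1), (s1, s2), (s2, s3), ...
    a, b = tee(iterable)
    next(b, None)  # advance the second iterator by one element
    return zip(a, b)

text = """At least one one value must be entered
entered in order to compute the average"""

# split() with no argument splits on any whitespace, including newlines,
# so the last word of one line is paired with the first word of the next.
repeated = [a for a, b in pairwise(text.split()) if a == b]
print(repeated)  # ['one', 'entered']
```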

Answered By: mozway

str.strip() only removes leading and trailing whitespace. To separate the words of each line into a list, you need str.split() instead. To get a flat list of all the words across all the lines, use extend() instead of append() when you build the list (otherwise you would get a list of lists). A with statement is also useful here, so that you don't have to close the file manually.

When you have the list of words, you can just iterate over it and compare each word to the previous one, triggering some action (e.g. a print output) if they are the same:

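archivo = input("Ingrese la ubicación del archivo: ")  # "Enter the file location: "

with open(archivo, "r") as inf:
    lineas = inf.readlines()
    lin = []
    for a in lineas:
        # split() breaks each line into words; extend() keeps the list flat
        lin.extend(a.split())

# Compare each word with the previous one; this also catches repeats across line breaks
for i in range(1, len(lin)):
    if lin[i - 1] == lin[i]:
        print(f'Duplicated word: "{lin[i]}" at index {i}.')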

When I save your example

At least one value must be entered
entered in order to compute the average

as a text file, run the code above and enter the file name as input, the output is:

Duplicated word: "entered" at index 7.
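The code above reads the whole file into memory before comparing. A minimal sketch of the same neighbour comparison done while iterating over the lines, assuming a hypothetical helper named `find_repeated` (not part of the answer). It keeps only the previous word, carried over from one line to the next, so a repeat that spans a line break is still caught:

```python
import io

def find_repeated(lines):
    """Hypothetical helper: yield (index, word) for each word equal to the one before it.

    `lines` can be any iterable of strings, such as an open file object.
    """
    previous = None
    index = 0
    for line in lines:
        for word in line.split():
            if word == previous:
                yield index, word
            previous = word  # carried across line boundaries
            index += 1

# io.StringIO stands in for a real file here; with a file you would
# write `with open(archivo) as inf:` and pass `inf` instead.
sample = io.StringIO("At least one value must be entered\n"
                     "entered in order to compute the average\n")
for i, word in find_repeated(sample):
    print(f'Duplicated word: "{word}" at index {i}.')
```

With the example text this prints the same line as above, `Duplicated word: "entered" at index 7.`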

Answered By: Arne