Duplicate word Python
Question:
I want to make a program that detects duplicate words as in the following example:
"At least one value must be entered
entered in order to compute the average"
We can see that "entered" is repeated, and I want to find a way to detect this kind of case.
# The prompt reads "Enter the file location: "
archivo = str(input("Ingrese la ubicación del archivo: "))
inf = open(archivo, "r")
lineas = inf.readlines()
lin = []
for a in lineas:
    lin.append(a.strip())
cadena = ' '.join([str(item) for item in lin])
list_cadena = cadena.split()
I have done this, but I don't know how to detect the repeated words: they can be on the same line, or one can be at the end of a line of text and the other at the beginning of the next, as in the example.
Answers:
text = 'i like donkey donkey'
words = text.split(' ')
for i in range(len(words)):
    count = 1
    # Compare words[i] with every later word
    for j in range(i + 1, len(words)):
        if words[i] == words[j]:
            count += 1
            words[j] = '0'  # mark as already counted
    if count > 1 and words[i] != '0':
        print(words[i])
# output -> donkey
This code splits the string on spaces and uses nested for loops to compare each word against every later word, then prints the words that occur more than once. Obviously you can change the print to do whatever you need.
string = 'At least one value must be entered entered in order to compute the average'
string_list = string.split(' ')
for i in range(len(string_list)):
    duplicate = string_list.count(string_list[i])
    if duplicate > 1:  # 2 or more
        # heureka = duplicate
        print(f'Duplicate word {string_list[i]} at position {i}')
output:
Duplicate word entered at position 6
Duplicate word entered at position 7
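Note that the two snippets above flag a word that occurs more than once anywhere in the text, not only back-to-back repeats. If that is the behaviour you want, here is a minimal sketch of the same idea with collections.Counter, which counts each word once instead of rescanning the list for every position. The function name repeated_words is just made up for the illustration.

```python
from collections import Counter

def repeated_words(text):
    """Return the words that occur more than once anywhere in text,
    in the order they first appear."""
    counts = Counter(text.split())
    return [word for word, n in counts.items() if n > 1]

string = 'At least one value must be entered entered in order to compute the average'
print(repeated_words(string))  # ['entered']
```

For a text such as 'the cat saw the dog' this would also report 'the', so it may be too broad if only adjacent repeats matter.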
Using itertools.pairwise (Python ≥ 3.10):
from itertools import pairwise
[a for a, b in pairwise(text.split()) if a == b]
NB. For Python below 3.10, you can use the pairwise recipe from the itertools documentation.
Input:
text = """At least one one value must be entered
entered in order to compute the average"""
Output: ['one', 'entered']
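If you would rather not depend on itertools.pairwise at all, the same consecutive pairs can be produced with zip over the word list and a copy of it shifted by one. This is only a sketch of that alternative; adjacent_duplicates is a name chosen for the example.

```python
def adjacent_duplicates(text):
    """Return each word that is immediately followed by an identical word."""
    words = text.split()  # split() with no argument also splits on newlines
    # zip(words, words[1:]) pairs every word with its successor
    return [a for a, b in zip(words, words[1:]) if a == b]

text = """At least one one value must be entered
entered in order to compute the average"""
print(adjacent_duplicates(text))  # ['one', 'entered']
```

Because split() with no argument treats the line break like any other whitespace, the repeat that straddles the two lines is found without special handling.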
str.strip() only removes leading and trailing whitespace. You need str.split() instead, to separate the words into a list. To get a flat list of all the words, across all the lines, use extend() instead of append() when you build the list (otherwise you would get a list of lists). A with statement is useful here, so that you don't have to close the file manually.
When you have the list of words, you can just iterate over it and compare each word to the previous one, triggering some action (e.g. a print output) if they are the same:
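# The prompt reads "Enter the file location: "
archivo = input("Ingrese la ubicación del archivo: ")
with open(archivo, "r") as inf:
    lineas = inf.readlines()
# Flat list of all words, across all lines
lin = []
for a in lineas:
    lin.extend(a.split())
# Compare each word with the one before it
for i in range(1, len(lin)):
    if lin[i - 1] == lin[i]:
        print(f'Duplicated word: "{lin[i]}" at index {i}.')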
When I save your example
At least one value must be entered
entered in order to compute the average
as a text file, run the code above and enter the file name as input, the output is:
Duplicated word: "entered" at index 7.
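The comparison above is exact, so "Entered," followed by "entered" would not count as a repeat. If that matters for your files, one possible variation is to normalise each word before comparing. The sketch below assumes you want to ignore case and surrounding punctuation; that is my assumption, not something the question asks for, and find_adjacent_duplicates is a made-up name.

```python
import string

def find_adjacent_duplicates(lines):
    """Yield (index, word) for every word equal to the one before it,
    ignoring case and leading/trailing punctuation."""
    previous = None
    index = 0
    for line in lines:
        for raw in line.split():
            # Assumption: 'Entered,' and 'entered' should be treated as the same word
            word = raw.strip(string.punctuation).lower()
            if word and word == previous:
                yield index, word
            previous = word
            index += 1

sample = ["At least one value must be Entered,\n",
          "entered in order to compute the average\n"]
print(list(find_adjacent_duplicates(sample)))  # [(7, 'entered')]
```

The function takes any iterable of lines, so an open file object can be passed in directly instead of the sample list.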