Why is one part of my code breaking the other part?

Question:

Hello guys, I’m trying to write a program that counts the total words and the total unique words in a file. When I run both parts of the code together, only the unique-word counter works. When I delete the unique-word counter, the total word counter works normally.
Here is the full code:

f = open('icuhistory.txt','r')

    wordCount = 0
        text = f.read()
        for line in f:
            lin = line.rstrip()
            wds = line.split()
            wordCount += len(wds) #this section alone works fine
        text = text.lower() #when I start writing this one the first one will stop working
        words = text.split()
        words = [word.strip('.,!;()[]') for word in words]
        words = [word.replace("'s", '') for word in words]
        unique = []
        for word in words:
            if word not in unique:
                unique.append(word)
        unique.sort()
        print("number of words: ",wordCount)
        print("number of unique words: ",len(unique))

Here are the contents of the file:

in the fall of 1945 just weeks after the end
of world war ii a group of japanese christian educators
initiated a move to establish a university based on christian
principles the foreign missions conference of north america and the
us education mission both visiting japan at the time
gave their wholehearted support conveying this plan to people in
the us amidst the post-war yearning for reconciliation and
world peace americans supported this project with great enthusiasm in
1948 the japan international christian university foundation jicuf was
established in new york to coordinate fund-raising efforts in the
us people in japan also found hope in a
cause dedicated toworld peace organizations firms and individuals made donations
to this ambitious undertaking regardless of their religious orientation anddespite
the often destitute circumstances in the immediate post-war years bank
of japan governor hisato ichimada headed the supporting organization to
lead the national fund raising drive icu has been unique from
its inception with its endowment procured through good will transcending
national borders 
on june 15 1949 japanese and north american christian leaders
convened at the gotemba ymca camp to establish international christian
university with the inauguration of the board of trustees and
the board of councillors the founding principles and a fundamental
educational plan were laid down establishing an interdenominational christian university
had been a dream of japanese and american christians for
half a century the gotemba conference had finally realized their
aspirations 
in 1950 icu purchased a spacious site in mitaka city
on the outskirts of tokyo with the donations it received
within japan the campus was dedicated on april 29 1952
with the language institute set up in the first year
in march 1953 the japanese ministry of education authorized icu
as an incorporated educational institution the college of liberal arts
opening on april 1 as the first four-year liberal arts
college in japan 
the university celebrated its 50th anniversary in 1999 with diverse
events and projects during the commemorative five year period leading to
march 2004 in 2003 the ministry of education culture sports
science and technology selected icu s research and education
for peace security and conviviality for the 21st century center
of excellence program and its liberal arts to nurture
responsible global citizens for the distinctive university education support program
good practice 
in 2008 an academic reform was enforced in the college
of liberal arts which replaced the system of six divisions
with a new organization of the division of arts
and sciences and a system of academic majors as of
april 2008 all new students simply start as college of
liberal arts students making their choice of major from 31
areas by the end of their sophomore year students now
have more time to make a decision while they study
diverse subjects through general education and foundation courses mext chose
icu for its fiscal year 2007 distinctive university education support
program educational support for liberal arts to nurture international
learning from academic advising to academic planning in acknowledgement of
the university s efforts for educational improvement in 2010 the
graduate school also conducted a reform and integrated the four
divisions into a new school of arts and sciences
icu is continually working to reconfirm its responsibilities and fulfill
its mission for the changing times
Asked By: HAZEM


Answers:

Take a look at the text = f.read() line. Is it at the right place?

Also, the Python script you pasted does not have consistent indenting. Are you able to clean it up so that it looks just like the original?

I’m also curious whether you have explored the set type in Python. It is a little like a list, but it stores each value only once, so you may find it applicable in your scenario.
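As a rough illustration of the set suggestion, here is a minimal sketch. The word list is made up for the example and is not taken from icuhistory.txt.

```python
# Hypothetical sample words, not taken from icuhistory.txt.
words = ["peace", "world", "peace", "japan", "world", "peace"]

# A set keeps only one copy of each distinct value.
unique_words = set(words)

total_count = len(words)
unique_count = len(unique_words)
print("number of words: ", total_count)
print("number of unique words: ", unique_count)
```

Membership checks and deduplication on a set do not need the manual "if word not in unique" loop that a list requires.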

Answered By: jdbow75

Explanation:

Files and open are built on the concept of streaming. If you are more familiar with iterators, think of f = open('icuhistory.txt','r') as an iterator.
You can go through it only once (unless you tell it to reset).

text = f.read()

This goes through the file once; afterwards, f is positioned at the end of the file.

for line in f:

This now tries to continue from where f currently is: at the end of the file.
So the loop iterates over the 0 lines that are left.
As there is nothing left to iterate over, the body of the for loop never runs, and wordCount stays 0.
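The behaviour described above can be reproduced with a small sketch. It assumes a throwaway two-line file created with tempfile as a stand-in for icuhistory.txt; the names sample_path and lines_after_read are only for illustration.

```python
import os
import tempfile

# Hypothetical two-line sample file standing in for icuhistory.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("one two three\nfour five\n")
    sample_path = tmp.name

with open(sample_path, "r") as f:
    text = f.read()                           # consumes the whole file
    lines_after_read = [line for line in f]   # nothing is left to iterate over

os.remove(sample_path)  # clean up the temporary file

print(len(text.split()))      # words found by read()
print(len(lines_after_read))  # lines seen by the loop after read()
```

The second number is 0 because the read position is already at the end of the file when the loop starts.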


Solutions:

You could reset it with f.seek(0). This tells the file object to go back to the start of the file.
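A minimal sketch of the f.seek(0) approach, again assuming a small temporary file in place of icuhistory.txt (sample_path and the sample text are made up for the example):

```python
import os
import tempfile

# Hypothetical two-line sample file standing in for icuhistory.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("one two three\nfour five one\n")
    sample_path = tmp.name

word_count = 0
with open(sample_path, "r") as f:
    text = f.read()   # first pass: the position is now at the end of the file
    f.seek(0)         # rewind to the start of the file
    for line in f:    # second pass: the loop sees every line again
        word_count += len(line.split())

os.remove(sample_path)  # clean up the temporary file

unique_count = len(set(text.split()))
print("number of words: ", word_count)
print("number of unique words: ", unique_count)
```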


But it would be more efficient to either do both actions inside the loop (more memory friendly) or work only with the text from text = f.read().
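One possible way to do both actions in a single loop is sketched below. The helper count_words is a hypothetical name, and the sample lines are invented; the cleanup step mirrors the strip and replace calls from the question.

```python
def count_words(lines):
    """Return (total, unique) word counts for an iterable of text lines."""
    total = 0
    unique = set()
    for line in lines:
        for word in line.lower().split():
            # Same cleanup as in the question's code.
            word = word.strip('.,!;()[]').replace("'s", '')
            total += 1
            unique.add(word)
    return total, len(unique)


# Hypothetical sample lines; an open file object can be passed the same way.
sample_lines = ["The war ended.\n", "The world's hope\n"]
total, unique_count = count_words(sample_lines)
print("number of words: ", total)
print("number of unique words: ", unique_count)
```

Because the function only needs something it can loop over line by line, an open file can be passed directly, for example count_words(f) inside a with open('icuhistory.txt') as f: block, so the whole file never has to be held in memory at once.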

Answered By: Daraan

The entire file content appears to be lowercase so it’s as easy as this:

result = {}

with open('icuhistory.txt') as icu:
    for word in icu.read().split():
        word = word.strip('.,!;()[]').replace("'s", "")
        result[word] = result.get(word, 0) + 1

print(f'Number of words = {sum(result.values())}')
print(f'Number of unique words = {len(result)}')

Output:

Number of words = 547
Number of unique words = 273
Answered By: OldBill

There’s no need to read line by line since you are counting words. Also avoid sorting unless it’s needed, as it can be expensive. Converting a list to a set removes duplicates, and you can chain string methods.

with open('icuhistory.txt','r') as f:
    text = f.read().lower()
words = [word.strip('.,!;()[]').replace("'s", '') for word in text.split()]
unique_words = set(words)
print("number of words: ", len(words))
print("number of unique words: ", len(unique_words))
Answered By: bn_ln