readline only prints half of the rows in a CSV file

Question:

As the title says, I have a CSV file with 6 columns. For NLP processing I need to extract the 6th column (a review comment column) and transform it into a list of lists of words. The code below was given by the instructor:

def read_twitter(fname):
    """ Read the given dataset into list and clean stop words. 
    
    Args: 
        fname (string): filename of Twitter Dataset
        
    Returns:
        list of list of words: we view each document as a list, including a list of all words 
    """
    twitter = []
    with open(fname,encoding="utf-8") as f:
        for line in f:
            tweet = f.readline().split(",")[5]
            
            # YOUR CLEANING CODE HERE
            #    - Clean tweet
            #    - Split into list words
            #    - Store list in twitter
            
    return twitter

Then we call the function read_twitter:

twitter = read_twitter('twitter.csv')

It should return a list of lists as required. However, with no code added to the part above, I expected it to return an empty list. Instead, it gives the following error:

IndexError Traceback (most recent call last)
in

~AppDataLocalTempipykernel_157842512851317.py in read_twitter(fname)
     12         for line in f:
     13
---> 14             tweet = f.readline().split(",")[5]
     15
     16

IndexError: list index out of range

But when I changed the code to:

def read_twitter(fname):
    """ Read the given dataset into list and clean stop words. 
    
    Args: 
        fname (string): filename of Twitter Dataset
        
    Returns:
        list of list of words: we view each document as a list, including a list of all words 
    """
    twitter = []
    with open(fname,encoding="utf-8") as f:
        for line in f:
            print(f.readline().split(",")[5])
            
    return twitter

twitter = read_twitter('twitter.csv')

This does print results, but only for half of the rows in the dataset. I am confused about what readline() is doing here and why it keeps reporting an index out of range. Any help would be appreciated.

Asked By: Jeffrey


Answers:

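You are skipping lines by combining file iteration with readline(). The loop "for line in f:" reads one line, then f.readline() inside the loop body reads the next one, so every other row is thrown away. That is why you only see half of the dataset. The IndexError probably has the same cause: once the file is exhausted, readline() returns an empty string, and "".split(",") gives a one-element list, so index 5 is out of range. Just remove the readline() call and split line instead.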
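To see the skipping in isolation, here is a minimal sketch that runs both patterns over an in-memory file. The five-row sample data and the helper names read_mixed and read_plain are made up for illustration and are not part of the assignment; the buggy version also checks the field count so that it records None instead of raising when readline() comes back empty.

```python
import io

# Hypothetical five-row sample standing in for twitter.csv (six columns each).
sample = "\n".join(f"id{i},a,b,c,d,tweet {i}" for i in range(5)) + "\n"

def read_mixed(f):
    """Buggy pattern: the for loop consumes one line, readline() consumes the next."""
    seen = []
    for line in f:                  # reads row 0, 2, 4, ... and ignores it
        fields = f.readline().split(",")
        if len(fields) < 6:         # readline() returned "" at end of file
            seen.append(None)       # this is where [5] would raise IndexError
            break
        seen.append(fields[5].strip())
    return seen

def read_plain(f):
    """Fixed pattern: use the line the loop already read."""
    return [line.split(",")[5].strip() for line in f]

mixed = read_mixed(io.StringIO(sample))
plain = read_plain(io.StringIO(sample))
print(mixed)  # only the odd-numbered rows, then None for the empty read
print(plain)  # every row
```

With an odd number of rows, the last loop iteration has nothing left for readline() to read, which matches the error in the question. The corrected function below applies the same change to the instructor's template.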

def read_twitter(fname):
    """ Read the given dataset into list and clean stop words. 
    
    Args: 
        fname (string): filename of Twitter Dataset
        
    Returns:
        list of list of words: we view each document as a list, including a list of all words 
    """
    twitter = []
    with open(fname,encoding="utf-8") as f:
        for line in f:
            tweet = line.split(",")[5]
            
            # YOUR CLEANING CODE HERE
            #    - Clean tweet
            #    - Split into list words
            #    - Store list in twitter
            
    return twitter
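
As a side note, splitting on "," can still pick the wrong field if a tweet itself contains commas. Assuming the file uses standard CSV quoting (an assumption, since the data is not shown in the question), the standard library csv module handles that case. The function name read_tweets and the sample rows below are hypothetical, and this is only a sketch of the idea, not the instructor's intended solution.

```python
import csv
import io

def read_tweets(f):
    """Return the sixth column of every row, skipping rows with fewer than six fields."""
    tweets = []
    for row in csv.reader(f):       # csv.reader respects quoted commas
        if len(row) > 5:
            tweets.append(row[5])
    return tweets

# Hypothetical sample: one quoted tweet containing a comma, one plain, one short row.
sample = io.StringIO('1,a,b,c,d,"hello, world"\n2,a,b,c,d,plain tweet\nshort,row\n')
tweets = read_tweets(sample)
print(tweets)
```

For a real file, the csv documentation recommends opening it with newline="" in addition to the encoding, for example open(fname, encoding="utf-8", newline="").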
Answered By: tdelaney