How to solve "IndexError: list index out of range" when reading a CSV file

Question:

I'm not sure how best to ask this. I am new to Python and to programming as a whole, but here is my question. I hope it makes sense.

I am currently working on populating a PostgreSQL database. I have a loop that reads a CSV file to get a certain output and then inserts/queries that output into a table (basically creating a handle).

But I get an error, which I assume is due to empty values in the CSV files. I have used Pandas in the past to clean up data, but the GitHub code I took this from does not use Pandas for that.

In the video by YouTuber/engineer Part Time Larry, "Tracking ARK Invest ETFs with Python and PostgreSQL", he uses the exact same code but does not get the IndexError. I'm not sure how to get around this error. I have read articles and watched videos, but none of them explain this specific scenario.

Here is the header list:

['date', 'fund', 'company', 'ticker', 'cusip', 'shares', 'market value ($)', 'weight (%)']

PS: I do understand that, generally, the index range of a list is 0 to n-1, where n is the total number of values in the list.

Here the range is 0 to 7, which makes 8 values in the header list.
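To make that index arithmetic concrete, here is a minimal sketch using the header list above: a list of 8 items has valid indices 0 through 7, and index 3 is the ticker column. Indexing a shorter row, such as the empty list that `csv.reader` yields for a blank line, is what raises the IndexError.

```python
header = ['date', 'fund', 'company', 'ticker', 'cusip', 'shares', 'market value ($)', 'weight (%)']

n = len(header)                 # 8 items
valid_indices = list(range(n))  # 0 through n - 1, i.e. 0 through 7
print(n, valid_indices[0], valid_indices[-1])  # 8 0 7
print(header[3])                # ticker

blank_row = []                  # what csv.reader yields for an empty line
try:
    blank_row[3]
except IndexError as e:
    print("IndexError:", e)     # IndexError: list index out of range
```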

Here is the GitHub link, in case you want to check it out, followed by the YouTube link:
https://github.com/hackingthemarkets/ark-funds-tracker

YouTube link:
https://www.youtube.com/watch?v=5uW0TLHQg9w&t=1093s
Timestamp: 11:56

Below are three images: the code, the error, and the CSV file.

[Image 1: the code, which creates a handle from the Postgres loop along with the CSV file]

[Image 2: the error; I get a portion of the rows, then the IndexError, which I believe is due to empty values]

[Image 3: the CSV file]

Asked By: Meru


Answers:

If you're sure that every valid row has 8 elements, you could add a length check before printing, for example in this section:

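import csv

# current_date and etf are defined by the surrounding loop
with open(f"Resources/{current_date}/{etf['symbol']}.csv", newline="") as f:
    reader = csv.reader(f)
    for row in reader:
        if len(row) == 8:  # add this check: skip rows without all 8 columns
            ticker = row[3]
            if ticker:  # skip rows with an empty ticker
                print(row)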
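Because that loop depends on files and variables from the original project, here is a self-contained sketch of the same length check run against a small in-memory CSV. The sample rows are made up for illustration, and `io.StringIO` stands in for the opened file. Unlike the snippet above, it also skips the header row with `next()`.

```python
import csv
import io

# Hypothetical sample standing in for one downloaded CSV file:
# a header row, two holdings (one with an empty ticker), a blank line,
# and a one-column footer line of the kind that breaks the original loop.
sample_csv = io.StringIO(
    "date,fund,company,ticker,cusip,shares,market value ($),weight (%)\n"
    "01/04/2021,ARKK,EXAMPLE CORP,EXMP,000000000,100,1000.00,10.00\n"
    "01/04/2021,ARKK,CASH PLACEHOLDER,,000000001,50,50.00,0.50\n"
    "\n"
    "This footer line is a note and not a holding\n"
)

reader = csv.reader(sample_csv)
next(reader)  # skip the header row

kept = []
for row in reader:
    if len(row) != 8:  # blank line -> [], footer -> 1 field: skip both
        continue
    ticker = row[3]
    if ticker:         # skip holdings with an empty ticker cell
        kept.append(row)

print(kept)
```

Only the EXAMPLE CORP row survives: the blank line and the footer fail the length check, and the cash row fails the ticker check.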

Edit: To answer your question about how to fix your code, I'd suggest starting by understanding how list[index] works. For example, run the following, see where the error happens, and work out why it happens:

lis = ['date', 'fund', 'company']
for i in [0, 1, 2]:
    print(i)
    print(lis[i])

for i in [0, 1, 2, 3, 4, 5, 6, 7, 8]:
    print(i)
    print(lis[i])    #this will result in "index out of range" because there is no item in `lis[3]`
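Here is a sketch of two common ways to avoid the error in that example: loop over the list itself with `enumerate`, so every index comes from the list, or check the index against `len()` before using it. `get_or_none` is a hypothetical helper written only for this illustration.

```python
lis = ['date', 'fund', 'company']

# Iterating over the list itself never goes out of range.
for i, item in enumerate(lis):
    print(i, item)

# Hypothetical helper: check the bounds before indexing.
def get_or_none(items, index):
    """Return items[index] if 0 <= index < len(items), else None."""
    if 0 <= index < len(items):
        return items[index]
    return None

print(get_or_none(lis, 2))  # company
print(get_or_none(lis, 3))  # None
```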
Answered By: perpetualstudent

Line 33 in the CSV is the issue:
Energy or investment product

You can either remove that line and rerun the code, or use the approach mentioned by @perpetualstudent, which will also work. However, if a stray line happens to contain a list of comma-separated values with the same count, it will pass the length check and your data will be polluted. It is always good practice to sanitize your data and apply reasonable checks, such as verifying that the first column is a date.
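As a sketch of the "first column is a date" check, assuming the date column uses a MM/DD/YYYY format (check your own files and change `DATE_FORMAT` if they differ). The sample line `a,b,c,d,e,f,g,h` is the kind of row a length check alone would let through. `is_holding_row` is a hypothetical name, and the sample data is made up.

```python
import csv
import io
from datetime import datetime

DATE_FORMAT = "%m/%d/%Y"  # assumed format of the date column; adjust to your files

def is_holding_row(row):
    """Hypothetical filter: exactly 8 fields and a first column that parses as a date."""
    if len(row) != 8:
        return False
    try:
        datetime.strptime(row[0], DATE_FORMAT)
    except ValueError:
        return False
    return True

# Made-up sample: header, one real-looking holding, a junk 8-field line, a blank line.
sample_csv = io.StringIO(
    "date,fund,company,ticker,cusip,shares,market value ($),weight (%)\n"
    "01/04/2021,ARKK,EXAMPLE CORP,EXMP,000000000,100,1000.00,10.00\n"
    "a,b,c,d,e,f,g,h\n"
    "\n"
)

rows = [row for row in csv.reader(sample_csv) if is_holding_row(row)]
print(rows)
```

The header fails because "date" does not parse as a date, which also removes the need for a separate `next()` call.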

[Image: GitHub flagging the issue]

Answered By: code_aash